fix(spec-intake): treat commit/push/open-PR as the normal outcome, not a risky side effect by kjgbot · Pull Request #135 · AgentWorkforce/ricky

kjgbot · 2026-05-27T21:49:40Z

Problem

The riskySideEffectQuestion clarification gate (src/product/spec-intake/clarifications.ts) flagged any spec mentioning commit / push / open pr / create pr and returned a blocking side-effect-approval question unless the spec contained an approval guard word (pause|ask|confirm|approval|...).

But opening a pull request is Ricky's normal, always-intended outcome, so requiring spec authors to add approval boilerplate just to get a PR is backwards. In practice, it also meant that ricky --mode local --spec-file X --run --best-judgement did not fail fast on the gate. Instead, it re-spawned the persona-writer repeatedly (a generation-retry loop of roughly 5 minutes per attempt) rather than producing the artifact and opening the PR.

Fix

Only genuinely destructive / irreversible verbs now require an explicit approval boundary:

deletes? | removes? | drops? | destroys? | resets? | migrates? | deploys? | publishes? | merges? | force-push

commit / push / open-PR / create-PR are dropped from the risky pattern. force-push is added so the destructive variant of push is still gated.
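For illustration, here is a minimal TypeScript sketch of the before/after patterns. It is reconstructed from this description, not copied from clarifications.ts. The OLD pattern is an assumption (the destructive verbs plus the four PR-flow verbs that were dropped), and the names `DESTRUCTIVE`, `OLD_RISKY`, and `NEW_RISKY` are hypothetical.

```typescript
// Sketch only: both patterns are reconstructed from this PR description.
// OLD_RISKY is inferred as the destructive verbs plus the dropped PR-flow verbs.
const DESTRUCTIVE =
  "deletes?|removes?|drops?|destroys?|resets?|migrates?|deploys?|publishes?|merges?";

// Before: ordinary PR-flow verbs were treated as risky side effects.
const OLD_RISKY = new RegExp(`\\b(${DESTRUCTIVE}|commits?|push|open pr|create pr)\\b`, "i");

// After: only destructive verbs, plus force-push as the destructive form of push.
const NEW_RISKY = new RegExp(`\\b(${DESTRUCTIVE}|force-push)\\b`, "i");

const samples = [
  "Commit the change, push the branch, and open PR against main.",
  "Delete obsolete files under legacy/.",
  "Force-push the rebased branch.",
];

for (const s of samples) {
  // Prints whether each pattern flags the sample, as "old -> new".
  console.log(`${OLD_RISKY.test(s)} -> ${NEW_RISKY.test(s)}: ${s}`);
}
```

Under these assumptions, the commit/push/open-PR sample is flagged by the old pattern but not by the new one, while the delete and force-push samples stay flagged.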

Updated defaultAssumption to state PR creation is the allowed normal outcome.
Existing destructive-spec test (deletes obsolete files…) still asserts the gate fires.
New test asserts a commit/push/open-PR-only spec raises no side-effect-approval question.
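A hedged sketch of how the narrowed gate and the two test scenarios fit together. The `ClarificationQuestion` shape, the question text, and `sideEffectApprovalQuestion` are hypothetical. The guard-word list is only the partial list quoted in the Problem section, and the `defaultAssumption` wording paraphrases this PR rather than quoting the source.

```typescript
// Hypothetical question shape; the real type in clarifications.ts may differ.
interface ClarificationQuestion {
  id: "side-effect-approval";
  question: string;
  defaultAssumption: string;
}

// Destructive verbs from this PR; commit/push/open-PR/create-PR are absent.
const RISKY =
  /\b(deletes?|removes?|drops?|destroys?|resets?|migrates?|deploys?|publishes?|merges?|force-push)\b/i;

// Partial guard list: the PR elides the remaining guard words with "...".
const APPROVAL_GUARD = /\b(pause|ask|confirm|approval)\b/i;

// Returns a blocking question only for destructive specs that lack an explicit
// approval boundary; specs that only commit, push, or open a PR return null.
function sideEffectApprovalQuestion(spec: string): ClarificationQuestion | null {
  if (!RISKY.test(spec) || APPROVAL_GUARD.test(spec)) return null;
  return {
    id: "side-effect-approval",
    question: "Where should the workflow pause for approval before destructive steps?",
    defaultAssumption:
      "Committing, pushing, and opening a PR are the allowed normal outcome; " +
      "pause for approval only before destructive or irreversible operations.",
  };
}

console.log(sideEffectApprovalQuestion("Commit, push, and open a PR.")); // null
console.log(sideEffectApprovalQuestion("Deletes obsolete files.")?.id); // side-effect-approval
```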

Verification

npx vitest run src/product/spec-intake/parser.test.ts → 39/39 pass.
npm run typecheck clean.
npm run build clean.

🤖 Generated with Claude Code

…t a risky side effect

The risky-side-effect clarification gate flagged any spec mentioning commit/push/open-PR and blocked generation with a `side-effect-approval` question unless the spec contained an approval guard word. But opening a pull request is Ricky's normal, always-intended outcome: requiring spec authors to add approval boilerplate just to get a PR is backwards, and it silently sent `--run --best-judgement` batches into a generation-retry loop.

Now only genuinely destructive/irreversible verbs (delete, drop, reset, migrate, deploy, publish, merge, force-push) require an explicit approval boundary. commit/push/create-PR no longer trip the gate.

- Drop commits/push/open-pr/create-pr from the risky pattern; add force-push.
- Update defaultAssumption to state PR creation is the allowed normal outcome.
- Add a test asserting a commit/push/open-PR spec raises no side-effect question.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

gemini-code-assist · 2026-05-27T21:49:44Z


coderabbitai · 2026-05-27T21:50:02Z

No actionable comments were generated in the recent review. 🎉


📥 Commits

Reviewing files that changed from the base of the PR and between 0308319 and d538038.

📒 Files selected for processing (2)

src/product/spec-intake/clarifications.ts
src/product/spec-intake/parser.test.ts

📝 Walkthrough

Walkthrough

The PR narrows the definition of "risky side effects" so that committing, pushing, and opening pull requests are no longer treated as actions requiring explicit approval boundaries. The risk-detection regex is trimmed to exclude these normal workflow verbs, and test coverage is updated to verify only destructive actions (deletes, drops, migrations, deploys, etc.) trigger approval gates.

Changes

Side-Effect Approval Gating

Layer / File(s)	Summary
Risk detection logic update `src/product/spec-intake/clarifications.ts`	The `riskySideEffectQuestion` regex is narrowed to exclude commit/push/PR-opening verbs; the `defaultAssumption` text states that PR creation is the normal outcome and pausing/approval is only needed before destructive operations.
Test validation for new behavior `src/product/spec-intake/parser.test.ts`	Test title renamed to reference "destructive workflows"; new test case asserts that workflows whose only side effects are commit/push/open-PR do not trigger the `side-effect-approval` clarification question.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

AgentWorkforce/ricky#61: Both PRs modify spec-intake clarification logic around "risky side effects" (the retrieved PR adds the side-effect approval clarification gate, while this PR narrows the risky-side-effect regex to stop treating commit/push/open-PR as risky).

Poem

🐰 Commit and push, no fence in sight,
PRs now flow through day and night,
Risky only when you truly break,
Deletes and drops—now pause to take!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title directly and clearly summarizes the main change: treating commit/push/open-PR as normal outcomes rather than risky side effects.
Description check	✅ Passed	The description is comprehensive and directly related to the changeset, explaining the problem, fix, and verification of the side-effect gating behavior.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.




🔧 ESLint


ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


cubic-dev-ai

No issues found across 2 files


github-actions · 2026-05-27T22:32:10Z

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-27T21-50-18-989Z-HEAD-provider
Mode: provider
Git SHA: 16374fc

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Scope**  
- Update the **Ricky** repository README and module doc‑strings to state that Ricky now leverages the **Agent Assistant** (AA) for deeper orchestration, validation, and self‑repair.  
- Add a thin **AA integration shim** (`ricky/aa_integration.py`) that wraps the existing `WorkflowEngine` with AA‑provided “intent‑extraction” and “deterministic‑gate” utilities.  
- Adjust the CI pipeline to run the new AA‑based verification gate in the `verify` stage.  
- Provide a PR template and a review checklist that capture the new AA contract.

All changes are confined to a single feature branch `feat/aa‑deep‑integration` and a single PR; no other modules are touched.

---

## 1. Concrete Workflow Contract

| Phase | Responsibility | Input | Output | Deterministic Gate |
|-------|----------------|-------|--------|--------------------|
| **Intent Extraction** | `AAIntentExtractor` (AA) | Raw user request (string) | Structured `Intent` object (`action`, `scope`, `constraints`) | `AAIntentValidator` – must return `valid=True` |
| **Plan Generation** | `RickyPlanner` (Ricky) | `Intent` | `Plan` (ordered list of `TaskSpec`) | `PlanSanityCheck` – no cycles, all tasks reachable |
| **Exec
...[truncated]

Must

Ground claims in real package imports and runtime paths.
Distinguish current implementation from target architecture.
Identify which Agent Assistant primitive is actually exercised.

Must Not

Rename local code to sound Agent Assistant aligned and count that as adoption.
Claim broad Agent Assistant native behavior from documentation-only alignment.
Blur target architecture with landed behavior.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Evaluation (Q1 2026)**  

| Metric | Current State | Target (Q3 2026) | Gap | Action |
|--------|---------------|------------------|-----|--------|
| **Repository Coverage** | 68 % of active agents import `@agent‑assistant/turn‑context` (45/66 agents) | 90 % | 22 % | • Identify the 21 agents missing the import.<br>• Open a scoped branch `adopt/turn‑context‑coverage` and create a PR template that adds the import and updates the `turnContext` usage stub. |
| **Turn‑Context Utilization** | Average of 1.2 turn‑context calls per agent (mostly for logging) | ≥ 3 calls per agent (state‑propagation, error‑context, user‑feedback) | 1.8 calls | • Add a deterministic verification gate in the CI pipeline: `npm run lint:turn‑context && npm test:turn‑context`. <br>• Provide a reusable utility `withTurnContext(fn)` that wraps agent handlers; require its use in the gate. |
| **Unit‑Test Coverage** | 54 % of turn‑context‑related code covered (jest) | 85 % | 31 % | • Introduce a test‑generation script (`npm run gen:turn‑context-tests`) that creates 5 baseline scenarios per agent (success, validation error, external‑API failure, timeout, unknown‑state). 
...[truncated]

Must

Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
Record compact provenance through generation decisions or coordinator metadata.
Keep the shared turn context internal to the adapter boundary.

Must Not

Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
Drop Ricky-specific workflow metadata during envelope assembly.
Treat turn context as a product decision engine.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation:** Move **all** of the listed capabilities—workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording—into the **Agent Assistant** component.

---

## Rationale (aligned with AGENTS.md and workflow standards)

| Capability | Why it belongs in Agent Assistant | Impact on reliability & coordination |
|------------|-----------------------------------|--------------------------------------|
| **Workflow generation** | Agent Assistant already owns the *authoring contract* (input schema → workflow DSL). Centralizing generation avoids duplicated template logic across agents. | Guarantees a single source‑of‑truth for DSL syntax, enabling deterministic verification gates. |
| **Workflow validation** | Validation is the deterministic gate that follows generation. Embedding it in the same service ensures the contract is enforced immediately, before any downstream execution. | Reduces false‑positive passes, tightens the 80‑100 validation loop, and provides instant feedback to the author. |
| **Debugging (runtime & static)** | Debug utilities (trace‑injection, sandbox execution) require intimate knowledge of the DSL and its execution model—knowledge
...[truncated]

Must

Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
Make extraction follow typed, tested, live product proof.

Must Not

Generalize workflow-specific behavior prematurely.
Adopt moving shared seams merely for architectural purity.
Lose the precise local-first staged workflow UX.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target window: Q3 2024 (July 1 – Sept 30)*  

---

## 1. Scope & Objectives  

| Goal | Success Metric |
|------|----------------|
| Deploy the **Ricky v2.0** workflow engine to **3 pilot teams** (Product Ops, Customer Success, Platform Engineering). | ≥ 90 % of pilot tickets processed via Ricky without manual fallback. |
| Establish **deterministic verification gates** for every new workflow contract. | 0 critical defects in production after 2 weeks of live traffic. |
| Capture **review artifacts** and **validation loops** to feed the next adoption slice. | ≥ 80 % of identified edge‑cases covered in automated tests. |
| Surface **blockers** early and report them in the “Adoption‑Blocker Dashboard”. | All blockers resolved within 5 business days or escalated. |

---

## 2. Deliverables & Branching Plan  

| Artifact | Repo/Branch | PR Owner | Reviewers (≥2) | Merge Gate |
|----------|-------------|----------|----------------|------------|
| **Workflow Contract Library** (`contracts/`) | `feature/ricky-v2-contracts` | Lead Workflow Engineer | QA Lead, Product Owner | All contract tests pass (unit + contract‑spec). |
| **D
...[truncated]

Must

Pick exactly one real shared seam to evaluate or adopt.
Define a live Ricky product path that will prove the adoption.
Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
Skip the comparison/evaluation step for mature Ricky-local seams.
Treat adoption as successful without a live product-path proof.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky × Slack / Web Support – Agent‑Assistant Package Design  

Below is a **complete, production‑ready package definition** that lets Ricky expose its workflow‑reliability, coordination, and authoring capabilities through two front‑ends:

| Front‑end | Package name | Entry point | Primary intent |
|-----------|--------------|-------------|----------------|
| **Slack** | `ricky-slack‑agent` | `src/slack/index.ts` | Interactive slash‑commands & message actions that trigger Ricky’s workflow services. |
| **Web**   | `ricky-web‑assistant` | `src/web/index.tsx` | Embedded React widget (modal + chat pane) for on‑demand workflow creation, validation, and repair. |

Both packages share a **core contract library** (`ricky‑core‑contract`) that defines the deterministic API surface Ricky expects from any consumer.

---

## 1. Core Contract (`ricky-core-contract`)

```ts
// src/contracts/v1.ts
export type WorkflowID = string & { __brand: 'WorkflowID' };
export type BranchName = string & { __brand: 'BranchName' };
export type PRNumber   = number & { __brand: 'PRNumber' };

export interface RickyRequest {
  /** Unique workflow identifier */
  workflowId: WorkflowID;
  /** Human‑readable desc
```
...[truncated]

Must

Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
Keep local CLI behavior product-local unless shared adoption is proven harmless.
Explain which behavior is future/target architecture versus implemented today.

Must Not

Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
Let future surface abstractions distort the current CLI contract.
Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What happens when you run**

```bash
ricky --mode local --spec "generate a workflow for package checks"
```

without the --run flag

Step	Action	Result
1️⃣	Parse arguments – Ricky reads `--mode local` and the free‑form `--spec` string.	The request is interpreted as “create a workflow definition for package checks” in local‑execution mode.
2️⃣	Generate workflow contract – Ricky builds a deterministic workflow contract (YAML/JSON) that follows the Ricky repository conventions (see AGENTS.md). The contract includes: • Inputs – e.g., list of packages, optional linting flags. • Stages – `checkout`, `install-deps`, `run-tests`, `run-lint`, `publish‑results`. • Outputs – test report, lint summary, overall status. • Verification gates – each stage returns a success/failure flag that downstream stages can gate on.	A workflow specification file (e.g., `package-checks.workflow.yaml`) is written to the current directory (or to `./ricky/workflows/` if that folder exists).
3️⃣	Skip execution – Because `--run` is omitted, Ricky does not launch any agents or containers. It only p
...[truncated]


**Must**

- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.

**Must Not**

- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>

**User message**

> Render Ricky's first-run CLI onboarding for a new user.

**Ricky output**

```text
**Ricky – First‑Run CLI Onboarding**

Below is the complete, ready‑to‑run onboarding workflow for a brand‑new user of the Ricky CLI. It follows the **Ricky repository conventions** (AGENTS.md), the **workflow standards** (deterministic gates, review artifacts, 80‑100 validation loops), and the **shared authoring rules** (scoped branches/PRs, honest blocker reporting).

---

## 1️⃣ Repository Setup (one‑time)

```bash
# Clone the Ricky repo (main branch)
git clone https://github.com/AgentWorkforce/ricky.git
cd ricky

# Create a scoped onboarding branch
git checkout -b onboarding/first‑run‑<your‑username>

All subsequent changes live on this branch; a PR will be opened at the end of onboarding.

## 2️⃣ Environment Bootstrap

# 2.1 Install the Ricky CLI (via pipx for isolation)
pipx install ricky-cli

# 2.2 Verify installation
ricky --version
# Expected output: ricky-cli v0.9.3 (or later)

# 2.3 Initialise a local config directory
ricky config init

Verification gate: ricky config list must show a non‑empty JSON file at ~/.ricky/config.json. If the file is missing or empty, abort and file a blocker issue (see Section 7).

## 3️⃣ Authentication & I
```

...[truncated]


**Must**

- Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
- End every branch with a concrete next step.
- Advertise only commands that are currently implemented.

**Must Not**

- Sound like a launch page or documentation dump.
- Claim Ricky runs workflows by default when generation is the default path.
- Require web or Slack onboarding before CLI use.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.recovery-guidance-no-stack-traces</code> (cli-behavior/manual)</summary>

**User message**

> A user gives Ricky an empty spec or a missing spec file.

**Ricky output**

```text
**Missing or empty specification – next steps**

1. **Acknowledge the gap**  
   *“I see that the specification file is missing/empty. I can’t generate a reliable workflow without concrete requirements.”*

2. **Request the required artifacts**  
   - **Functional description** – a short paragraph or bullet list of the intended behavior.  
   - **Inputs & outputs** – data schema (JSON, CSV, etc.) for each entry point and the expected result format.  
   - **Non‑functional constraints** – latency, throughput, security, compliance, or resource limits.  
   - **Success criteria** – measurable acceptance tests (e.g., “process 10 k records < 2 s”, “no data loss > 99.99%”).  

3. **Provide a minimal template** the user can fill in:

   ```yaml
   # workflow-spec.yaml
   name: <Descriptive workflow name>
   description: |
     <One‑sentence purpose of the workflow>

   inputs:
     - name: <input1>
       type: <json|csv|avro|...>
       schema: <link or inline JSON schema>
     - name: <input2>
       ...

   outputs:
     - name: <output1>
       type: <json|csv|...>
       schema: <link or inline JSON schema>

   non_functional:
     latency: "<max latency, e.g., 2s>"
     throughput: "
```
...[truncated]

Must

Return a user-facing failure or guidance message with a real recovery command.
Distinguish generation failure from execution failure.
Show stack traces only when verbose diagnostic mode is requested.

Must Not

Crash with an uncaught exception in normal mode.
Suggest commands that do not exist.
Pretend a missing spec was accepted.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
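One way to meet this rubric, sketched in TypeScript under stated assumptions: `formatSpecFailure` and `SpecProblem` are hypothetical, and the recovery command reuses only the `--mode local` and `--spec-file` flags that appear in this PR's own description. The point is the split between normal output (generation failure plus a next step) and verbose output (which appends the stack).

```typescript
type SpecProblem = "missing" | "empty";

// Builds the normal-mode message: it states that generation (not execution)
// failed and gives a recovery command. The stack trace is appended only when
// verbose diagnostics are requested.
function formatSpecFailure(
  problem: SpecProblem,
  specPath: string,
  cause: Error,
  verbose: boolean,
): string {
  const what =
    problem === "missing"
      ? `spec file not found: ${specPath}`
      : `spec file is empty: ${specPath}`;
  const lines = [
    `Generation failed before anything ran (${what}).`,
    `Next step: add your requirements to ${specPath}, then run:`,
    `  ricky --mode local --spec-file ${specPath}`,
  ];
  if (verbose && cause.stack) lines.push("", cause.stack);
  return lines.join("\n");
}

console.log(formatSpecFailure("missing", "spec.md", new Error("ENOENT"), false));
```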

REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render ricky status when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks have proven Google or GitHub are connected)**

=== AgentWorkforce Status Report ===

🟡 Provider Connections
• Google : NOT CONNECTED (no successful health check)
• GitHub : NOT CONNECTED (no successful health check)

✅ Core Services
• Scheduler : RUNNING
• Dispatcher : RUNNING
• Storage : RUNNING

⚙️ Configuration
• Active Branches : main, develop
• Last Deploy : 2026‑05‑27 09:12 UTC
• Feature Flags : all defaults

🔍 Recent Checks
• google_provider.check – FAILED (timeout / auth error)
• github_provider.check – FAILED (no token / unreachable)

🛑 Blockers
• Google integration unavailable – actions requiring Google APIs will be skipped.
• GitHub integration unavailable – repository sync, PR automation, and webhook handling are disabled.

📌 Next Steps

Verify credentials / OAuth tokens for Google and GitHub.
Ensure network access to accounts.google.com and api.github.com.
Re‑run ricky provider check google and ricky provider check github.
Once checks pass, re‑issue ricky status to confirm CONNECTED state.

--- End of Report ---

Must

Report unknown or not-connected provider state honestly.
Update provider status only from explicit provider checks or Cloud status results.
Give concrete setup guidance for Cloud when relevant.

Must Not

Mark Google or GitHub connected because guidance text was shown.
Invent a provider connection URL or OAuth flow.
Show empty fields with no recovery guidance when config is missing.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
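A minimal sketch of deriving provider state only from explicit check results, so a missing check reads as unknown rather than connected. `ProviderCheck`, `providerState`, and the timestamp format are assumptions, not Ricky's actual status model.

```typescript
interface ProviderCheck {
  provider: "google" | "github";
  ok: boolean;
  checkedAt: string; // ISO-8601 UTC timestamp, so string order equals time order
}

type ProviderState = "connected" | "not connected" | "unknown";

// Uses only the most recent explicit check for the provider. No check means
// "unknown"; showing setup guidance never counts as a connection.
function providerState(
  provider: ProviderCheck["provider"],
  checks: readonly ProviderCheck[],
): ProviderState {
  const latest = checks
    .filter((c) => c.provider === provider)
    .sort((a, b) => (a.checkedAt < b.checkedAt ? 1 : a.checkedAt > b.checkedAt ? -1 : 0))[0];
  if (latest === undefined) return "unknown";
  return latest.ok ? "connected" : "not connected";
}

console.log(providerState("google", [])); // unknown
```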

REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)

User message

Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

Ricky output

**Workflow: “GitHub Primitive Change”**  
*Purpose*: Apply a small, self‑contained change to a repository (e.g., typo fix, comment update, config tweak) while guaranteeing reproducibility, traceability, and zero‑regression risk.

---

## 1. Scope & Branching Contract  

| Item | Definition |
|------|------------|
| **Change Type** | Primitive (single‑file, < 50 lines, no API impact) |
| **Target Branch** | `main` (or protected default) |
| **Feature Branch** | `fix/<short‑description>-<YYYYMMDD>` |
| **PR Title Template** | `fix: <short description> (#{issue‑id})` |
| **PR Description Template** | <ul><li>What</li><li>Why (linked issue)</li><li>How (diff summary)</li><li>Verification steps</li></ul> |
| **Merge Strategy** | **Squash‑and‑merge** – keep linear history |
| **Release Gate** | Must pass all CI checks and manual review before merge. |

---

## 2. Step‑by‑Step Workflow Contract  

| # | Phase | Action | Owner | Artifact | Verification Gate |
|---|-------|--------|-------|----------|--------------------|
| 1 | **Preparation** | Clone repo, create feature branch `fix/<desc>-<date>` | Engineer | Local repo copy | `git status` clean |
| 2 | **Implementation** | Edit file(s); 
...[truncated]

Must

Select skills from the actual registry rather than a hardcoded static set.
Record ranked skill matches with confidence and trigger evidence.
Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

Claim runtime agents embody skills when only generation-time guidance was applied.
Fail generation solely because optional skill files are missing.
Hide skill selection evidence from artifacts or JSON output.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
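A sketch of registry-backed ranking with confidence and trigger evidence, assuming a hypothetical `SkillEntry` registry shape with keyword triggers. Ricky's real matcher and scoring may differ; the sketch only shows ranking, recording evidence, and returning an empty result (rather than failing) when the registry is missing or nothing clears the threshold.

```typescript
// Hypothetical registry entry; real skill metadata may differ.
interface SkillEntry {
  name: string;
  triggers: string[]; // keywords that suggest the skill applies
}

interface SkillMatch {
  name: string;
  confidence: number; // fraction of the skill's triggers found in the spec, 0..1
  evidence: string[]; // the triggers that actually matched
}

// Ranks registry skills against a spec. A missing registry or no match above
// the threshold yields an empty list instead of failing generation.
function rankSkills(
  registry: SkillEntry[] | undefined,
  spec: string,
  threshold = 0.5,
): SkillMatch[] {
  if (!registry) return [];
  const text = spec.toLowerCase();
  return registry
    .map((skill) => {
      const evidence = skill.triggers.filter((t) => text.includes(t.toLowerCase()));
      const confidence = skill.triggers.length ? evidence.length / skill.triggers.length : 0;
      return { name: skill.name, confidence, evidence };
    })
    .filter((m) => m.confidence >= threshold)
    .sort((a, b) => b.confidence - a.confidence);
}
```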

REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow: Claude‑Review → Codex‑Implementation**

| **Section** | **Content** |
|-------------|-------------|
| **Name** | `review‑then‑implement` |
| **Owner** | `team‑mlops` |
| **Purpose** | Automate a deterministic pipeline where Claude performs a textual/code review of a design spec, then Codex generates the implementation. |
| **Scope** | • Input: `spec.md` (Markdown) <br>• Output: `src/` (generated source files) <br>• Artifacts: review report, implementation diff, test suite results |
| **Branch / PR Boundary** | All changes are made on a feature branch `feat/review‑implement‑<ticket-id>` and merged via a single PR that must pass all verification gates before merge. |

---

## 1. Contract (Inputs / Outputs)

| **Artifact** | **Type** | **Schema** |
|--------------|----------|------------|
| `spec.md` | Markdown | Must contain a top‑level heading `# Specification` and a `## Acceptance Criteria` list. |
| `review.json` | JSON | `{ "status": "PASS|FAIL", "issues": [{ "line": int, "msg": string }], "summary": string }` |
| `implementation.zip` | ZIP | Contains the generated `src/` directory. |
| `test-report.xml` | JUnit XML | Produced by the test runner. |

*All artifacts are
...[truncated]

Must

Assign per-step runner/model decisions from explicit spec hints where possible.
Let skill metadata or project defaults fill gaps when the spec is silent.
Write tool-selection.json or equivalent audit metadata.

Must Not

Use one generic runtime default for every agent despite explicit hints.
Let skill defaults override explicit user/spec runner hints.
Omit the reason each tool or model was chosen.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
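The precedence these rubric lines describe can be sketched as a small resolver that also records why each runner was chosen. `chooseRunner`, `RunnerChoice`, and the runner names are hypothetical; the JSON printed at the end only approximates what an audit file such as tool-selection.json might contain.

```typescript
type ChoiceSource = "spec-hint" | "skill-default" | "project-default";

interface RunnerChoice {
  step: string;
  runner: string;
  source: ChoiceSource; // kept so audit metadata can explain each choice
}

// Precedence: an explicit spec hint always wins; skill metadata and the
// project default only fill gaps where the spec is silent.
function chooseRunner(
  step: string,
  specHint: string | undefined,
  skillDefault: string | undefined,
  projectDefault: string,
): RunnerChoice {
  if (specHint) return { step, runner: specHint, source: "spec-hint" };
  if (skillDefault) return { step, runner: skillDefault, source: "skill-default" };
  return { step, runner: projectDefault, source: "project-default" };
}

// "use Claude to review and Codex to implement": the hints beat any skill default.
const selection: RunnerChoice[] = [
  chooseRunner("review", "claude", "codex", "project-default-runner"),
  chooseRunner("implement", "codex", undefined, "project-default-runner"),
  chooseRunner("verify", undefined, undefined, "project-default-runner"),
];

console.log(JSON.stringify(selection, null, 2));
```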

REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

## Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is a deterministic post‑processing flag that takes an existing workflow JSON/YAML and rewrites the **`task.description`** and **`task.acceptanceGate`** fields for every step. The goal is to make each step **action‑oriented, measurable, and verifiable** while keeping the original logical flow intact.

Below is the end‑to‑end contract you should follow when invoking the tool, plus a concrete example that you can copy‑paste.

---

### 1. Contract Overview  

| Input | Required format | Key fields |
|-------|----------------|------------|
| `workflow` | JSON or YAML | `steps[].task.id`, `steps[].task.description`, `steps[].task.acceptanceGate` |
| `--refine` flag | CLI flag (no value) | Triggers the refinement pass |

| Output | Format | Guarantees |
|--------|--------|------------|
| `refinedWorkflow` | Same format as input (JSON ↔ YAML) | • Every `description` is ≤ 120 chars, starts with an imperative verb, and references only inputs/outputs that exist in the same step.<br>• Every `acceptanceGate` is a single boolean expression using the step’s output variables, fully parenthesized, and testable by the 
...[truncated]

Must

Keep the deterministic unrefined path as the default.
Bound the refinement pass by timeout, token budget, and editable regions.
Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

Must Not

Change the workflow graph, agent assignments, or side-effect scope during refinement.
Fail the whole generation if optional refinement times out.
Hide warnings when the deterministic artifact is returned unchanged.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
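A hedged TypeScript sketch of a bounded, opt-in refinement pass with a deterministic fallback. It assumes a simplified `Artifact` with step ids and descriptions; the graph check is a stand-in for Ricky's real validation, and a token budget is not modeled.

```typescript
interface Step {
  id: string;
  description: string;
}

interface Artifact {
  steps: Step[];
}

interface RefineResult {
  artifact: Artifact;
  refined: boolean;
  warnings: string[];
}

// A rejection timer that can be cancelled once the race has settled.
function deadline(ms: number): { promise: Promise<never>; cancel: () => void } {
  let handle: ReturnType<typeof setTimeout> | undefined;
  const promise = new Promise<never>((_, reject) => {
    handle = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return { promise, cancel: () => clearTimeout(handle) };
}

// Optional refinement: a timeout, a provider error, or an edit that changes
// the step graph falls back to the deterministic artifact, with a warning.
async function refineWithFallback(
  base: Artifact,
  refine: (a: Artifact) => Promise<Artifact>,
  ms: number,
): Promise<RefineResult> {
  const timer = deadline(ms);
  try {
    const candidate = await Promise.race([refine(base), timer.promise]);
    // Stand-in validation: step ids and their order must be unchanged.
    const sameGraph =
      candidate.steps.length === base.steps.length &&
      candidate.steps.every((s, i) => s.id === base.steps[i].id);
    if (!sameGraph) {
      return {
        artifact: base,
        refined: false,
        warnings: ["refinement changed the step graph; kept the deterministic artifact"],
      };
    }
    return { artifact: candidate, refined: true, warnings: [] };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return {
      artifact: base,
      refined: false,
      warnings: [`refinement skipped (${reason}); kept the deterministic artifact`],
    };
  } finally {
    timer.cancel();
  }
}
```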

REVIEW generation-quality.behavior-grounded-gates (generation-quality/manual)

User message

Generate a workflow for the ricky --version spec.

Ricky output

**Workflow: `ricky --version` Specification & Delivery**  
*Repository:* `github.com/agentworkforce/ricky`  
*Target Branch:* `main` → `feature/ricky-version-spec`  

---

## 1. Scope & Acceptance Criteria  

| ID | Requirement | Success Metric |
|----|-------------|----------------|
| V‑001 | The CLI must output a semantic version string (`MAJOR.MINOR.PATCH`) when invoked as `ricky --version`. | Exact match to `package.json`/`pyproject.toml` version field. |
| V‑002 | The version flag must be case‑insensitive (`--VERSION`, `-v`). | All aliases return identical output. |
| V‑003 | The command must exit with status `0`. | `echo $?` equals `0`. |
| V‑004 | No side‑effects (no file writes, no network calls). | Process tree shows no I/O beyond stdout. |
| V‑005 | Output must be prefixed with the product name: `ricky vX.Y.Z`. | Regex `^ricky v\d+\.\d+\.\d+$`. |
| V‑006 | Documentation must include usage example and version source. | Docs build passes and contains the example. |
| V‑007 | CI must enforce 100 % test coverage for the version module. | `coverage report` shows `100%` for `src/version.*`. |

---

## 2. Contract (YAML)  

```yaml
# .github/workflows/ricky-version-spec.yml
name
```
...[truncated]

Must

Build gates from the stated acceptance behavior, such as checking ricky --version output.
Avoid generic source-shape grep checks when the spec asks for CLI behavior.
Keep generated validation meaningful for the current repo shape.

Must Not

Treat `grep -Eq 'export|function|class|workflow(' dist/ricky.js` as proof of version behavior.
Claim the workflow is proven by source syntax alone.
Ignore the package-json version resolution order in the spec.
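A behavior-grounded gate runs the CLI and compares its output with the expected version, instead of grepping the bundle. This sketch makes two assumptions:

- the built CLI lives at `dist/ricky.js` and runs under `node`;
- the expected version comes from `package.json` alone.

The spec's actual resolution order would replace that single lookup. `checkVersionOutput` and `runVersionGate` are hypothetical names.

```typescript
import { spawnSync } from "node:child_process";
import { readFileSync } from "node:fs";
import { join } from "node:path";

interface GateResult {
  pass: boolean;
  detail: string;
}

// Pure check: compare observed CLI behavior with the expected version.
function checkVersionOutput(status: number | null, stdout: string, expected: string): GateResult {
  if (status !== 0) return { pass: false, detail: `exit status ${status}, expected 0` };
  const line = stdout.trim();
  // Match the version as a whole token so "1.2.30" does not satisfy "1.2.3".
  const escaped = expected.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const found = new RegExp(`(^|[^\\w.])v?${escaped}($|[^\\w.])`).test(line);
  return found
    ? { pass: true, detail: `stdout "${line}" reports ${expected}` }
    : { pass: false, detail: `stdout "${line}" does not report ${expected}` };
}

// Side-effecting wrapper: actually runs the built CLI (the path is an assumption).
function runVersionGate(repoRoot: string, cliPath = join(repoRoot, "dist", "ricky.js")): GateResult {
  const pkg = JSON.parse(readFileSync(join(repoRoot, "package.json"), "utf8")) as { version?: string };
  if (!pkg.version) return { pass: false, detail: "package.json has no version field" };
  const run = spawnSync(process.execPath, [cliPath, "--version"], { encoding: "utf8" });
  return checkVersionOutput(run.status, run.stdout ?? "", pkg.version);
}
```

Keeping the comparison pure means the gate logic can be unit-tested without building the CLI.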

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow: “Batch‑Artifact Validation‑Fix‑Rerun Loop”**  
*Purpose*: Process a large set of independent artifacts (e.g., data files, code modules, model checkpoints) through validation, automatic fixing, and conditional rerun until all artifacts pass or are marked as blocked.  

---  

### 1. Scope & Boundaries
| Item | Description |
|------|-------------|
| **Branch** | `feature/batch‑artifact‑loop` (isolated from `main`) |
| **PR** | `PR #<auto‑generated>` – contains workflow YAML, test fixtures, and fix scripts. |
| **Artifacts** | Any files under `artifacts/` (glob `artifacts/**`). Each artifact is self‑contained; no cross‑artifact dependencies. |
| **Exit Criteria** | All artifacts are **PASS** or **BLOCKED**. No “UNKNOWN” state remains. |
| **Owner** | `@team‑mlops` (reviewers: `@ml‑engineer`, `@data‑ops`) |

---  

### 2. Contract (Inputs / Outputs)

| Contract | Type | Description |
|----------|------|-------------|
| `artifact_list` | **Array[String]** | Relative paths of artifacts to process (generated by `discover_artifacts` step). |
| `validation_report` | **JSON** | `{ "artifact": "<path>", "status": "PASS|FAIL|ERROR", "details": "<msg>" }` per artifact. |
| `fix_repo
...[truncated]

Must

Choose dag, supervisor, or pipeline deliberately based on the work shape.
Explain the pattern choice in artifact metadata or a rationale.
Use dag for validation/fix/rerun loops when dependencies matter.

Must Not

Default blindly to dag for every workflow.
Collapse independent artifact work into one vague agent task.
Omit verification gates because the chosen pattern seems obvious.
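One way to make the choice deliberate is to derive the pattern from an explicit description of the work shape and return the rationale with it. Only the dag rule for dependent validation/fix/rerun loops comes from the criteria above. The following are assumptions for illustration:

- the `WorkShape` fields;
- mapping independent fan-out to `supervisor`;
- mapping strictly linear stages to `pipeline`.

```typescript
type Pattern = "dag" | "supervisor" | "pipeline";

// Hypothetical description of the work, filled in from the spec before choosing.
interface WorkShape {
  independentItems: number;       // units of work with no cross-dependencies
  hasFixRerunLoop: boolean;       // validate -> fix -> re-validate cycle
  stepsHaveDependencies: boolean; // steps consume outputs of non-adjacent steps
  strictlyLinear: boolean;        // each step depends only on the previous one
}

interface PatternChoice {
  pattern: Pattern;
  rationale: string; // recorded in artifact metadata
}

function choosePattern(shape: WorkShape): PatternChoice {
  if (shape.hasFixRerunLoop && shape.stepsHaveDependencies) {
    return { pattern: "dag", rationale: "validation/fix/rerun loop whose steps depend on each other's outputs" };
  }
  if (shape.strictlyLinear && !shape.hasFixRerunLoop) {
    return { pattern: "pipeline", rationale: "each step consumes only the previous step's output" };
  }
  if (shape.independentItems > 1) {
    return {
      pattern: "supervisor",
      rationale: `${shape.independentItems} independent items can be delegated and tracked separately`,
    };
  }
  return { pattern: "dag", rationale: "explicit dependencies make verification gates easy to place" };
}
```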

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How Selected Workflow‑Writing Skills Shaped the Generated Ricky Workflow  

| Skill | What the Skill Enforces | Direct Effect on the Generated Workflow |
|-------|------------------------|------------------------------------------|
| **Concrete Contract Definition** | Every step must declare its **input schema**, **output schema**, and **exit codes** in a machine‑readable `contract.yaml`. | The workflow begins with a `contract` block for the top‑level trigger and for each sub‑task. This makes downstream validation deterministic and enables the **Verification Gate** to auto‑check type‑compatibility before any code runs. |
| **Deterministic Verification Gates** | Insert a **gate** after each logical group of steps that runs a static‑analysis test (e.g., schema match, idempotency check). | After the *Data‑Ingestion* and *Model‑Scoring* groups, a `gate: verify_schema` node is emitted. If the gate fails, the workflow halts and reports a **Blocker** with a link to the offending contract diff. |
| **Review Artifacts Generation** | Auto‑create a **review‑artifact** (markdown diff, test‑run summary, and a “what‑changed” table) for every PR. | The generated workflow includes a `post‑proc
...[truncated]

Must

Describe skills as generation-time selection, loading, and rendering inputs.
Point to metadata such as `loaded-skills.txt` and `skill-application-boundary.json`.
Say the workflow instructions were informed by selected skills.

Must Not

Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
Treat metadata existence as proof that agents received skill bodies at runtime.
Overstate current tests beyond the generation boundary.
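These wording rules can be checked mechanically on generated copy. Below is a minimal sketch with a hypothetical phrase list drawn from the Must Not items. A real check would need a broader list and human review.

```typescript
// Hypothetical phrase list taken from the Must Not items above.
const RUNTIME_SKILL_OVERCLAIMS: RegExp[] = [
  /skill[- ]embedded/i,
  /skill[- ]powered\s+at\s+runtime/i,
  /\bembod(?:y|ies)\s+(?:the\s+)?(?:selected\s+)?skills?\b/i,
];

interface CopyFinding {
  phrase: string;
  index: number;
}

// Returns the first match of each overclaim pattern, in document order.
// Pass runtimeSkillLoadingTested = true only when runtime loading exists and is tested.
function findSkillOverclaims(copy: string, runtimeSkillLoadingTested = false): CopyFinding[] {
  if (runtimeSkillLoadingTested) return [];
  const findings: CopyFinding[] = [];
  for (const pattern of RUNTIME_SKILL_OVERCLAIMS) {
    const match = pattern.exec(copy);
    if (match) findings.push({ phrase: match[0], index: match.index });
  }
  return findings.sort((a, b) => a.index - b.index);
}
```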

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

Below is a short “clarifying‑request” template you can copy‑paste back to the user. It follows our standard workflow‑authoring contract: we first gather the missing details, then we’ll produce a deterministic, test‑driven workflow with explicit verification gates and review artifacts.

---

**Subject:** Clarification needed – Product spec for workflow generation  

**Body:**

Hi [User],

Thanks for the request to generate a workflow. To deliver a reliable, testable workflow that meets our **Ricky** standards, I need a few concrete details from the product spec:

| # | Required Detail | Why it matters |
|---|----------------|----------------|
| 1 | **Primary business goal** (e.g., “process new‑customer onboarding”, “run nightly data sync”) | Determines the overall workflow shape and success criteria. |
| 2 | **Key actors / services** (e.g., API gateway, DB, external SaaS) | Defines the nodes and integration points we must orchestrate. |
| 3 | **Trigger event** (e.g., webhook, schedule, manual UI) | Sets the entry gate for the workflow. |
| 4 | **Core steps** (ordered list, brief description) | Provides the deterministic sequence we’ll encode. |
| 5 | **Success & failure conditions**
...[truncated]

Must

Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
Validate with dry-run or targeted structural checks where possible.
Return artifacts, warnings, and follow-up commands honestly.

Must Not

Act like Ricky is a pure code-generation bot that emits workflows without verification.
Stop at "code compiles" as the proof bar.
Skip skill-aware workflow authoring guidance for serious workflows.
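The proof bar can be made explicit in two ways:

- report the strongest evidence actually collected;
- warn when compilation is the only evidence.

The `ValidationEvidence` fields and proof levels below are hypothetical.

```typescript
// Hypothetical evidence collected after generating a workflow artifact.
interface ValidationEvidence {
  compiled: boolean;
  structuralChecksPassed?: boolean; // e.g. verification, review and signoff steps present
  dryRunPassed?: boolean;
}

interface GenerationReport {
  artifactPath: string;
  evidence: ValidationEvidence;
  warnings: string[];
  followUps: string[];
}

type ProofLevel = "unverified" | "compiles-only" | "structurally-checked" | "dry-run";

// Report the strongest evidence actually collected, never more.
function proofLevel(e: ValidationEvidence): ProofLevel {
  if (e.dryRunPassed) return "dry-run";
  if (e.structuralChecksPassed) return "structurally-checked";
  if (e.compiled) return "compiles-only";
  return "unverified";
}

function summarizeGeneration(report: GenerationReport): string[] {
  const level = proofLevel(report.evidence);
  const lines = [`artifact: ${report.artifactPath}`, `proof: ${level}`];
  if (level === "compiles-only") {
    lines.push("warning: compiling is not the proof bar; run a dry-run or structural checks");
  } else if (level === "unverified") {
    lines.push("warning: artifact was not validated");
  }
  for (const w of report.warnings) lines.push(`warning: ${w}`);
  for (const cmd of report.followUps) lines.push(`next: ${cmd}`);
  return lines;
}
```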

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s Immediate Response Plan**

| Phase | Action | Why | Deliverable / Artifact |
|-------|--------|-----|------------------------|
| 1️⃣ Detect & Log | • Pull the execution trace from the orchestration engine (e.g., Temporal, Airflow). <br>• Identify the timed‑out step (Step A) and the “zombie” worker (Worker B) that stayed alive past its heartbeat window. | Provides a deterministic, reproducible record of the failure point. | `failure_report.md` – includes step IDs, timestamps, timeout values, worker heartbeat logs. |
| 2️⃣ Blocker Flag | • Mark the workflow run as **BLOCKED** in the CI/CD dashboard. <br>• Emit a **Ricky‑Blocker** event with severity = `high` and a short‑code `WF_TIMEOUT_ZOMBIE`. | Prevents downstream pipelines from consuming incomplete artifacts and surfaces the issue to the on‑call team. | Blocker ticket in the issue tracker (e.g., GitHub Issue #<id>) automatically linked to the run ID. |
| 3️⃣ Isolate & Quarantine | • Terminate the stray Worker B process via the worker‑manager API. <br>• Spin up a fresh, clean worker pool for the next retry. | Guarantees that no hidden state leaks into the next attempt. | `worker_quarantine.log` – contains termination com
...[truncated]

Must

Classify the failure before deciding whether to retry.
Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
Preserve raw evidence and uncertainty when the class is not known.

Must Not

Blindly rerun the whole workflow without a blocker classification.
Treat every failure as a broken workflow definition.
Claim Ricky fixed the workflow before rerun evidence proves it.
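A classify-then-decide step might look like the sketch below. The evidence fields are hypothetical stand-ins for what run logs and step status would provide. The retry policy is one reasonable reading of the criteria, not Ricky's implementation:

- no retry for environment blockers or structure bugs;
- no retry when the class is unknown.

```typescript
type FailureClass =
  | "agent-runtime-opacity"
  | "timeout"
  | "environment-blocker"
  | "workflow-structure-bug"
  | "validation-strategy-mismatch"
  | "unknown";

// Hypothetical evidence fields extracted from a failed run.
interface FailureEvidence {
  timedOutSteps: string[];
  aliveWithoutArtifacts: string[]; // workers still running but producing nothing
  missingEnv: string[];
  builderErrors: string[];         // e.g. unknown step dependency
  gateFailedButOutputLooksCorrect: boolean;
}

interface Classification {
  classes: FailureClass[];
  retry: boolean;
  note: string;
}

function classifyFailure(e: FailureEvidence): Classification {
  const classes: FailureClass[] = [];
  if (e.missingEnv.length > 0) classes.push("environment-blocker");
  if (e.builderErrors.length > 0) classes.push("workflow-structure-bug");
  if (e.timedOutSteps.length > 0) classes.push("timeout");
  if (e.aliveWithoutArtifacts.length > 0) classes.push("agent-runtime-opacity");
  if (e.gateFailedButOutputLooksCorrect) classes.push("validation-strategy-mismatch");
  if (classes.length === 0) {
    return { classes: ["unknown"], retry: false, note: "class unknown: preserve raw evidence and report the uncertainty" };
  }
  // A rerun cannot fix environment or structure problems; those need a change first.
  const blocking = classes.some((c) => c === "environment-blocker" || c === "workflow-structure-bug");
  return {
    classes,
    retry: !blocking,
    note: blocking ? "fix the blocker before any rerun" : "a targeted rerun of the affected steps may be justified",
  };
}
```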

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Workflow Contract – Stale‑State Guard for Local AgentWorkforce Runs**  

| **Element** | **Definition** |
|-------------|----------------|
| **Scope** | Local execution of any `.agent/`‑based workflow on a developer workstation. |
| **Inputs** | 1. Working directory root (absolute path).<br>2. Optional `--allow‑stale` flag (bool). |
| **Outputs** | - `guard‑status.json` (machine‑readable pass/fail + diagnostics).<br>- Human‑readable console report. |
| **Success Criteria** | All three state directories (`.agent‑relay/`, `.relay/`, `.trajectories/`) are either empty **or** contain only files whose timestamps are ≤ 5 min older than the current run start time. |
| **Failure Criteria** | Any file in the three directories is newer than the 5‑min window **and** `--allow‑stale` is not set. |
| **Deterministic Verification Gate** | A pure‑function `verifyStaleState(rootPath, now)` that returns a boolean and a diagnostics array. No side‑effects. |
| **Idempotence** | The guard can be re‑run without altering the filesystem (except for the optional `--clean‑stale` mode, which is a separate sub‑command). |
| **Rollback** | On failure the workflow aborts before any agent containers are starte
...[truncated]

Must

Classify stale local runtime state as an environment contamination issue.
Recommend quarantine or isolated-run guidance before launch.
Record the observed paths and the action taken or recommended.

Must Not

Treat stale runtime state as a workflow logic failure.
Delete or overwrite state without an explicit safe path or user intent.
Continue into execution as if the workspace were clean.
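Below is a sketch of a read-only precheck for the three state directories named in this case. It treats any non-empty directory as leftover state. A real check would likely also consult timestamps or run markers. The report shape and function names are hypothetical.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Directory names taken from the case above.
const RELAY_STATE_DIRS = [".agent-relay", ".relay", ".trajectories"];

interface StaleStateReport {
  classification: "environment-contamination" | "clean";
  observed: { path: string; entries: number }[];
  launchAllowed: boolean;
  recommendation: string;
}

// Default lister: entries of a directory, or null if it does not exist.
const listIfDir = (p: string): string[] | null =>
  fs.existsSync(p) && fs.statSync(p).isDirectory() ? fs.readdirSync(p) : null;

// Read-only: records what it saw and never deletes or overwrites state.
function detectStaleRelayState(
  root: string,
  listDir: (p: string) => string[] | null = listIfDir,
): StaleStateReport {
  const observed: StaleStateReport["observed"] = [];
  for (const dir of RELAY_STATE_DIRS) {
    const p = path.join(root, dir);
    const entries = listDir(p);
    if (entries !== null && entries.length > 0) observed.push({ path: p, entries: entries.length });
  }
  if (observed.length === 0) {
    return { classification: "clean", observed, launchAllowed: true, recommendation: "no leftover relay state found" };
  }
  return {
    classification: "environment-contamination",
    observed,
    launchAllowed: false,
    recommendation: "quarantine these paths or launch in an isolated workspace; nothing was deleted",
  };
}
```

The lister is injectable so the classification can be tested without touching the filesystem.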

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**Fixing “Another Ricky/Relay run is already active in this workspace”**

The marker file is a safety lock that prevents two concurrent executions from stepping on each other. When the lock is left behind (e.g., a crash, a killed terminal, or a stray background process) the next `ricky` or `relay` invocation aborts with the message you’re seeing.

Below is the deterministic, repeatable repair workflow that follows the **Ricky repository conventions** (AGENTS.md, workflow standards, and shared authoring rules). Follow the steps in order; each step has a verification gate so you can be sure the problem is truly cleared before moving on.

---

## 1️⃣ Identify the workspace lock location

| Artifact | Path (relative to repo root) |
|----------|------------------------------|
| Run‑marker file | `.ricky/run_marker` |
| Optional PID file (if created) | `.ricky/run_pid` |

> **Verification Gate 1** – Confirm the files exist:  
> ```bash
> ls -l .ricky/run_marker .ricky/run_pid 2>/dev/null
> ```

If the files are missing, the error is coming from a stale lock in a parent directory—proceed to **Step 2‑B**.

---

## 2️⃣ Determine the owning process (if any)

1. **Read the marker** – it usual
...[truncated]

Must

Report the active marker, run id, or status path when available.
Ask the user to inspect, wait for, or explicitly clear the active run.
Avoid launching a competing run that could corrupt evidence.

Must Not

Silently start another run.
Hide the existing run marker from the user.
Treat the conflict as a generic failure with no recovery path.
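Below is a launch guard that reports an existing marker instead of starting a competing run. It assumes the marker is a JSON file with an optional `runId`. The marker's location and format are not specified here, so `checkActiveRun` (a hypothetical name) takes the path as a parameter.

```typescript
import * as fs from "node:fs";

// Assumed marker format: JSON with optional run metadata.
interface RunMarker {
  runId?: string;
  startedAt?: string;
}

type LaunchDecision =
  | { launch: true }
  | { launch: false; markerPath: string; marker: RunMarker; options: string[] };

function checkActiveRun(
  markerPath: string,
  readFile: (p: string) => string = (p) => fs.readFileSync(p, "utf8"),
): LaunchDecision {
  let raw: string;
  try {
    raw = readFile(markerPath);
  } catch (err) {
    // Only a missing marker means "no active run"; other read errors are surfaced.
    if ((err as { code?: string }).code === "ENOENT") return { launch: true };
    throw err;
  }
  let marker: RunMarker = {};
  try {
    const parsed: unknown = JSON.parse(raw);
    if (parsed !== null && typeof parsed === "object") marker = parsed as RunMarker;
  } catch {
    // An unparseable marker is still a conflict; report it with empty metadata.
  }
  return {
    launch: false,
    markerPath,
    marker,
    options: [
      "inspect the active run",
      "wait for it to finish",
      "clear the marker explicitly once the run is confirmed dead",
    ],
  };
}
```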

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.env-loader-injection-runtime-loadable (runtime-recovery/manual)

User message

A workflow artifact references a MISSING_ENV_VAR value. Ricky's deterministic auto-fix injects the .env.local / .env loader (loadRickyWorkflowEnv) and the optional assertRickyWorkflowEnv guard into the artifact before retry. The artifact may be a master-rendered workflow whose .step({ command: ... }) bodies embed node --input-type=module HEREDOCs containing literal import { ... } from 'node:fs' / from 'node:path' strings.

Ricky output

**Deterministic auto‑fix for missing environment variables**

Below is the concrete, reproducible patch you can apply to any workflow artifact that currently fails because it references a `MISSING_ENV_VAR`. The fix injects the **Ricky environment loader** (`loadRickyWorkflowEnv`) and the **optional guard** (`assertRickyWorkflowEnv`) **once**, at the top of the generated workflow file, and then re‑runs the workflow. The patch works for both plain‑JS steps and for steps that embed a `node --input-type=module` HEREDOC with inline `import` statements (e.g. `import { readFile } from 'node:fs'`).

---

## 1. Contract – What the patch guarantees

| ✅ | Guarantee |
|---|---|
| **Deterministic** | The same source → same patched artifact. |
| **Idempotent** | Re‑applying the patch does not duplicate loaders or guards. |
| **Scope‑limited** | Only the top‑level of the workflow file is modified; step bodies remain untouched. |
| **Verification gate** | After patch, a **pre‑run validation** (`assertRickyWorkflowEnv`) fails fast if any required env var is still missing. |
| **Rollback‑safe** | The original file is saved as `<filename>.bak.<timestamp>` before modification. |
| **Review artifact**
...[truncated]

Must

Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected `loadRickyWorkflowEnv` body references `rickyWorkflowFs.*` and `rickyWorkflowPath.*`, so the repair must also add the corresponding `import * as rickyWorkflowFs from 'node:fs'` and `import * as rickyWorkflowPath from 'node:path'` aliases at module top level.
Detect existing alias imports by matching real top-level `import * as <alias> from '<module>'` statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in `.step({ command: ... })` bodies do not count as imports).
Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.

Must Not

Skip adding the `rickyWorkflowFs` / `rickyWorkflowPath` aliases because `from 'node:fs'` or `from 'node:path'` already appears somewhere in the file as a string literal.
Inject `loadRickyWorkflowEnv` (or `assertRickyWorkflowEnv`) without the supporting alias imports, which produces `ReferenceError: rickyWorkflowPath is not defined` at module load and burns the auto-fix budget on `UNSUPPORTED_RUNTIME` at runtime-launch.
Rewrite or escape the embedded HEREDOC text in step commands.
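Detecting real top-level namespace imports means scanning only the module's leading import block. That way, `import` text inside a HEREDOC in a `.step({ command: ... })` template literal is never mistaken for an alias.

Below is a line-based sketch. The helper names are hypothetical, and it has two limitations:

- it assumes all imports precede the first statement, which is conventional but not required by ESM;
- a full TypeScript parser would be more robust.

```typescript
// Matches one namespace import, e.g. import * as rickyWorkflowFs from 'node:fs';
const NAMESPACE_IMPORT = /^import\s+\*\s+as\s+([A-Za-z_$][\w$]*)\s+from\s+['"]([^'"]+)['"]\s*;?\s*$/;
const FROM_CLAUSE = /\bfrom\s+['"][^'"]+['"]/;

// Collects alias -> module for namespace imports in the leading import block.
// Scanning stops at the first non-import statement, so text inside later template
// literals (such as HEREDOCs in step commands) is never treated as an import.
function findTopLevelNamespaceImports(source: string): Map<string, string> {
  const aliases = new Map<string, string>();
  let inMultiLineImport = false;
  let inBlockComment = false;
  for (const raw of source.split(/\r?\n/)) {
    const line = raw.trim();
    if (inBlockComment) {
      if (line.includes("*/")) inBlockComment = false;
      continue;
    }
    if (inMultiLineImport) {
      if (FROM_CLAUSE.test(line)) inMultiLineImport = false;
      continue;
    }
    if (line === "" || line.startsWith("//") || line.startsWith("#!")) continue;
    if (line.startsWith("/*")) {
      if (!line.includes("*/")) inBlockComment = true;
      continue;
    }
    if (/^import[\s{*'"]/.test(line)) {
      const match = NAMESPACE_IMPORT.exec(line);
      if (match) aliases.set(match[1], match[2]);
      else if (!FROM_CLAUSE.test(line) && !/^import\s*['"]/.test(line)) inMultiLineImport = true;
      continue;
    }
    break; // first real statement: the import block has ended
  }
  return aliases;
}

const REQUIRED_ALIASES = [
  { alias: "rickyWorkflowFs", specifier: "node:fs" },
  { alias: "rickyWorkflowPath", specifier: "node:path" },
];

// Idempotently prepends missing alias imports; the rest of the file is left byte-for-byte intact.
function ensureRickyAliases(source: string): string {
  const existing = findTopLevelNamespaceImports(source);
  const missing = REQUIRED_ALIASES.filter(({ alias }) => !existing.has(alias));
  if (missing.length === 0) return source;
  const block = missing.map(({ alias, specifier }) => `import * as ${alias} from '${specifier}';\n`).join("");
  if (!source.startsWith("#!")) return block + source;
  // Keep a shebang as the first line.
  const newline = source.indexOf("\n");
  return newline === -1 ? `${source}\n${block}` : source.slice(0, newline + 1) + block + source.slice(newline + 1);
}
```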

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Run**

Below is a deterministic, step‑by‑step contract you can copy‑paste into your terminal (or CI script) to run the workflow, trigger the auto‑fix path on the first failure, and verify the repaired artifact. The contract assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failed step name and the previous run ID are known.

---

## 1️⃣ Prerequisites (verify once)

| Check | Command | Expected outcome |
|-------|---------|-------------------|
| `aw` CLI available | `aw --version` | `aw version x.y.z` |
| Working directory is the workflow repo root | `git rev-parse --show-toplevel` | Path to repo |
| Current branch is a short‑lived feature branch (e.g., `auto-fix/run‑<timestamp>`) | `git branch --show-current` | `auto-fix/run-...` |
| Docker (or the runtime required by the workflow) is running | `docker ps` | List of containers (or empty) |
| Environment variables for secrets are exported (e.g., `AWS_ACCESS_KEY_ID`) | `env | grep AWS` | Keys present |

If any check fails, resolve it before proceeding.

---

## 2️⃣ Create a scoped branch for the auto‑fix run

```bash
# 2.1 Create a deterministic branch name
BRANCH="auto-fi
...[truncated]

Must

Use a bounded retry budget and summarize every attempt.
Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
Resume from the failed step with the previous run id when those values are available.

Must Not

Edit arbitrary repository source files as the default auto-fix surface.
Keep retrying after the configured max attempts.
Lose the single Ricky tracking run id across repair/resume attempts.
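The bounded loop can be written with the run and repair actions injected. This keeps three things explicit:

- the attempt budget;
- the single tracking run id;
- the resume parameters.

`runOnce` and `repairArtifact` are hypothetical callbacks, not a Ricky or Relay API.

```typescript
interface ResumePoint {
  previousRunId: string;
  fromStep: string;
}

interface AttemptOutcome {
  ok: boolean;
  runId: string;
  failedStep?: string;
  error?: string;
}

interface AttemptRecord {
  attempt: number;
  runId: string;
  resumedFrom?: string;
  result: string;
}

interface AutoFixSummary {
  trackingRunId: string; // one id for the whole repair/resume sequence
  ok: boolean;
  attempts: AttemptRecord[];
}

// repairArtifact repairs the workflow artifact only and reports whether a repair was made.
function runWithAutoFix(
  trackingRunId: string,
  maxAttempts: number,
  runOnce: (resume?: ResumePoint) => AttemptOutcome,
  repairArtifact: (failure: AttemptOutcome) => boolean,
): AutoFixSummary {
  const attempts: AttemptRecord[] = [];
  let resume: ResumePoint | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const outcome = runOnce(resume);
    attempts.push({
      attempt,
      runId: outcome.runId,
      resumedFrom: resume?.fromStep,
      result: outcome.ok
        ? "passed"
        : `failed at ${outcome.failedStep ?? "unknown step"}: ${outcome.error ?? "no error text"}`,
    });
    if (outcome.ok) return { trackingRunId, ok: true, attempts };
    // Stop when the budget is spent or the artifact cannot be repaired.
    if (attempt === maxAttempts || !repairArtifact(outcome)) break;
    resume = outcome.failedStep ? { previousRunId: outcome.runId, fromStep: outcome.failedStep } : undefined;
  }
  return { trackingRunId, ok: false, attempts };
}
```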

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs ricky run workflows/foo.ts --no-auto-fix and the workflow fails.

Ricky output

**Quick‑start troubleshooting for a failed `ricky run workflows/foo.ts --no-auto-fix`**

---

## 1. Capture the failure context
1. **Run with full diagnostics**  
   ```bash
   ricky run workflows/foo.ts --no-auto-fix --log-level=debug --output=./ricky-debug
   ```
   - `--log-level=debug` writes the complete execution trace.
   - `--output` creates a folder (`ricky-debug`) that contains:
     - `run.log` – raw console output.
     - `state.json` – snapshot of the workflow state at each gate.
     - `artifacts/` – any intermediate files produced before the failure.
2. **Locate the deterministic verification gate that stopped the run**  
   Open `run.log` and search for lines that start with `🛑 VERIFICATION GATE`. The line will show:
   - Gate name (e.g., `ValidateSchema`, `CheckBranchPolicy`).
   - Reason for failure (e.g., “schema mismatch”, “branch protection rule violated”).

## 2. Common failure categories & fixes

| Category | Typical gate | What to look for | Fix approach |
|----------|--------------|------------------|--------------|
| Schema / contract mismatch | `ValidateSchema` | `expected: X, got: Y` in `state.json` | Update `foo.ts` to conform to the curren
...[truncated]


**Must**

- Preserve one-attempt behavior when auto-fix is disabled.
- Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
- Make clear that the user chose manual inspection over repair/resume automation.

**Must Not**

- Start a repair loop despite `--no-auto-fix`.
- Suppress the diagnosis because no repair was attempted.
- Present the failure as a completed repair attempt.
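Under these rules, `--no-auto-fix` pins the attempt budget to one and routes the failure to a diagnostic report with a non-zero exit code. Below is a sketch with hypothetical names. The default budget of 3 when auto-fix is enabled is an assumption.

```typescript
interface RunFlags {
  autoFix: boolean;     // false when --no-auto-fix is passed
  maxAttempts?: number; // configured budget when auto-fix is on
}

// --no-auto-fix means exactly one attempt: no repair, no resume.
function attemptBudget(flags: RunFlags): number {
  if (!flags.autoFix) return 1;
  return Math.max(1, flags.maxAttempts ?? 3); // 3 is an assumed default
}

interface ManualFailure {
  blocker: string;
  diagnosis: string;
  recoverySteps: string[];
}

function reportManualFailure(f: ManualFailure): { exitCode: number; text: string } {
  const text = [
    `Blocker: ${f.blocker}`,
    `Diagnosis: ${f.diagnosis}`,
    "Auto-fix was disabled with --no-auto-fix, so no repair was attempted.",
    ...f.recoverySteps.map((step, i) => `${i + 1}. ${step}`),
  ].join("\n");
  return { exitCode: 1, text }; // always non-zero: the run failed
}
```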

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.in-process-local-runner</code> (runtime-recovery/manual)</summary>

**User message**

> Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

**Ricky output**

```text
**Ricky’s “Run a Local TypeScript Workflow” Playbook**  
*(primary local path = the repository root where the workflow source lives, e.g. `./workflows/`)*  

| Phase | Action | Command / Artifact | Success Gate | Failure Handling |
|-------|--------|--------------------|--------------|------------------|
| **1️⃣ Prep & Contract** | Load the workflow contract file (`workflow.json`) that declares the entry point, required npm packages, and expected I/O schema. | `cat workflow.json` | ‑ JSON parses, `entry` field exists, `inputs`/`outputs` schemas are valid JSON‑Schema. | If any check fails, abort and emit a **Blocker** issue: `Ricky‑Blocker: Invalid contract – <reason>`. |
| **2️⃣ Environment Bootstrap** | Ensure the local Node ≥ 18 runtime and `ts-node` are available. | `node -v`  <br> `npm list -g ts-node` | Node ≥ 18 **and** `ts-node` present. | If missing, run `npm install -g ts-node` (or report blocker if global install is disallowed). |
| **3️⃣ Dependency Install** | Install workflow‑specific deps declared in `package.json` located next to the entry file. | `npm ci` (run in the workflow directory) | Exit code 0 **and** `node_modules/` populated. | On non‑zero exit, create a **B
...[truncated]

Must

Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the `agent-relay` binary on `PATH`.
Precheck that Node and `@agent-relay/sdk` are resolvable for the workflow.
Record the actual spawn command in execution evidence.

Must Not

Fail solely because `agent-relay` is not on `PATH` when the SDK route is available.
Hide the actual runtime command from evidence.
Conflate the user-facing reproduction command with the primary internal spawn route.
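Below is a sketch of the preferred route:

1. Precheck that the SDK is resolvable.
2. Spawn Node directly with type stripping.
3. Record the exact command.

It assumes Node 22.6 or later, where `--experimental-strip-types` is available. Type stripping only supports erasable TypeScript syntax, so constructs such as enums need a different route. `planLocalRun` and `isResolvable` are hypothetical helpers.

```typescript
import { createRequire } from "node:module";
import * as path from "node:path";

// CommonJS-style resolution from the workflow directory. Caveat: an ESM-only package
// whose "exports" lack a "require"/"default" condition can resolve under import but not here.
function isResolvable(specifier: string, fromDir: string): boolean {
  try {
    createRequire(path.join(fromDir, "noop.js")).resolve(specifier);
    return true;
  } catch {
    return false;
  }
}

interface SpawnPlan {
  command: string;
  args: string[];
  evidence: string; // exact command line to record in execution evidence
}

function planLocalRun(
  workflowPath: string,
  opts: { nodeVersion: string; sdkResolvable: boolean },
): SpawnPlan | { blocker: string } {
  const [major = 0, minor = 0] = opts.nodeVersion.replace(/^v/, "").split(".").map(Number);
  if (major < 22 || (major === 22 && minor < 6)) {
    return { blocker: `Node ${opts.nodeVersion} lacks --experimental-strip-types (needs 22.6 or later)` };
  }
  if (!opts.sdkResolvable) {
    return { blocker: "@agent-relay/sdk is not resolvable from the workflow directory" };
  }
  // This route never requires an agent-relay binary on PATH.
  const args = ["--experimental-strip-types", workflowPath];
  return { command: process.execPath, args, evidence: [process.execPath, ...args].join(" ") };
}
```

In use, the plan would be built from `process.version` and `isResolvable("@agent-relay/sdk", path.dirname(file))`. Its `command` and `args` would then go to `spawn` from `node:child_process`, and `evidence` would be stored alongside the run.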

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Ricky – Boundary‑Escalation Workflow for Persistent Structural Failures**

When a structural failure remains after an initial fix attempt, Ricky must transition from “repair” to “boundary‑escalation” mode. The contract below defines the deterministic steps, verification gates, and artefacts required to keep the workflow reliable, auditable, and reproducible.

---

## 1. Trigger Condition
| Condition | Source |
|----------|--------|
| `failure_status == “persisted”` **AND** `fix_attempts >= 1` | `ricky/monitoring/structural_failure.yaml` |

*The monitor emits a `FAILURE_PERSISTED` event that starts the **Escalation** state machine.*

---

## 2. State‑Machine Overview

[Detect] → [Validate Fix] → (Success) → [Close]
|
└─(Failure) → [Escalate] → [Blocker Report] → [Branch/PR] → [Validation Loop] → (Resolved) → [Close]


*Only the **Escalate** branch is entered when the failure persists.*

---

## 3. Escalation Contract

| Step | Owner | Input | Output | Gate |
|------|-------|-------|--------|------|
| **E1 – Capture Context** | Ricky (automation) | `FAILURE_PERSISTED` event, logs, diff of attempted fix | `escalation_context.json` (i
...[truncated]

Must

Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
Distinguish escalation from a generic product failure.
Preserve enough context for a human operator to continue.

Must Not

Retry speculative fixes indefinitely.
Collapse to "something went wrong" without the attempted actions.
Discard evidence from failed repair attempts.
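An escalation can be modeled as a distinct record that refuses to exist without the context a human operator needs. Below is a minimal sketch; the field names are hypothetical.

```typescript
interface EscalationReport {
  kind: "escalation"; // distinct from a generic product failure
  blockerClass: string;
  attemptedFixes: { description: string; evidencePath: string }[];
  failedValidation: string;
  recommendation: string;
}

function buildEscalation(input: {
  blockerClass?: string;
  attemptedFixes: EscalationReport["attemptedFixes"];
  failedValidation?: string;
  recommendation?: string;
}): EscalationReport {
  const missing: string[] = [];
  if (!input.blockerClass) missing.push("blockerClass");
  if (input.attemptedFixes.length === 0) missing.push("attemptedFixes");
  if (!input.failedValidation) missing.push("failedValidation");
  if (!input.recommendation) missing.push("recommendation");
  if (missing.length > 0) {
    // Refuse to emit a "something went wrong" escalation without operator context.
    throw new Error(`incomplete escalation, missing: ${missing.join(", ")}`);
  }
  return {
    kind: "escalation",
    blockerClass: input.blockerClass!,
    attemptedFixes: [...input.attemptedFixes], // keep every attempt's evidence
    failedValidation: input.failedValidation!,
    recommendation: input.recommendation!,
  };
}
```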

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – Summary of Recent Runs**  
*(Generated 2026‑05‑27)*  

| Workflow | Run ID | Start | End | Duration | Status | Success % | Avg Latency (ms) | Error Types (top 3) | Blockers |
|----------|--------|-------|------|----------|--------|-----------|-------------------|----------------------|----------|
| **Data‑Ingest‑v2** | `r20260525-01` | 02:13 UTC | 02:14 UTC | 1 m 12 s | ✅ Success | 99.8 % | 84 | – | – |
|  | `r20260524-07` | 23:58 UTC | 00:01 UTC | 2 m 05 s | ⚠️ Partial | 96.3 % | 112 | `Timeout(3)`, `SchemaMismatch(2)` | Retry‑limit hit on step 3 |
| **User‑Profile‑Sync** | `r20260525-03` | 04:00 UTC | 04:02 UTC | 2 m 01 s | ✅ Success | 100 % | 57 | – | – |
|  | `r20260523-12` | 18:45 UTC | 18:48 UTC | 3 m 12 s | ❌ Failure | 71.4 % | 139 | `AuthFailure(5)`, `RateLimit(3)` | Missing API token |
| **Report‑Gen‑Daily** | `r20260525-08` | 06:00 UTC | 06:04 UTC | 4 m 02 s | ✅ Success | 98.7 % | 203 | – | – |
|  | `r20260524-09` | 06:00 UTC | 06:07 UTC | 7 m 15 s | ⚠️ Partial | 93.2 % | 221 | `DBLock(4)`, `Timeout(2)` | DB connection pool exhausted |
| **Alert‑Dispatch‑v1** | `r20260525-11` | 09:30 UTC | 09:31 UTC | 1 m 08 s | ✅ Success | 100 % | 45 | – 
...[truncated]

Must

Consume normalized `WorkflowRunEvidence` rather than raw logs as the primary input.
Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

Mutate evidence while analyzing it.
Return generic advice like "improve your workflow" without references.
Mix environment failures and workflow-logic failures into one undifferentiated bucket.
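A digest built from structured evidence can keep environment failures, workflow-logic failures and unclassified failures in separate buckets, while leaving the input untouched. The `WorkflowRunEvidence` shape below is a hypothetical minimal stand-in for the real type. The set of environment classes is an assumption.

```typescript
// Hypothetical minimal shape; the real WorkflowRunEvidence type may differ.
interface WorkflowRunEvidence {
  readonly workflow: string;
  readonly runId: string;
  readonly status: "passed" | "failed";
  readonly failureClass?: string;
  readonly steps: ReadonlyArray<{ readonly name: string; readonly durationMs: number }>;
}

// Assumption: which failure classes count as environment problems.
const ENVIRONMENT_CLASSES = new Set(["environment-blocker", "missing-env", "stale-state"]);

interface HealthDigest {
  environmentFailures: Map<string, number>;
  workflowFailures: Map<string, number>;
  unclassifiedFailures: number;
  recurring: string[]; // "<workflow>: <class> x<count>" for classes seen at least twice
  slowestSteps: { workflow: string; step: string; durationMs: number }[];
}

function buildHealthDigest(runs: ReadonlyArray<WorkflowRunEvidence>, topN = 3): HealthDigest {
  const environmentFailures = new Map<string, number>();
  const workflowFailures = new Map<string, number>();
  const perWorkflowClass = new Map<string, number>();
  const slowest: HealthDigest["slowestSteps"] = [];
  let unclassifiedFailures = 0;
  for (const run of runs) {
    for (const s of run.steps) slowest.push({ workflow: run.workflow, step: s.name, durationMs: s.durationMs });
    if (run.status !== "failed") continue;
    if (!run.failureClass) {
      unclassifiedFailures++;
      continue;
    }
    const bucket = ENVIRONMENT_CLASSES.has(run.failureClass) ? environmentFailures : workflowFailures;
    bucket.set(run.failureClass, (bucket.get(run.failureClass) ?? 0) + 1);
    const key = `${run.workflow}: ${run.failureClass}`;
    perWorkflowClass.set(key, (perWorkflowClass.get(key) ?? 0) + 1);
  }
  const recurring = Array.from(perWorkflowClass)
    .filter(([, n]) => n >= 2)
    .map(([key, n]) => `${key} x${n}`);
  // Sorting the local copy leaves the evidence objects untouched.
  slowest.sort((a, b) => b.durationMs - a.durationMs);
  return { environmentFailures, workflowFailures, unclassifiedFailures, recurring, slowestSteps: slowest.slice(0, topN) };
}
```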

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.preserve-pr-shipping-during-repair (runtime-recovery/manual)

User message

A persona-authored workflow that imports @agent-relay/github-primitive and uses createGitHubStep to open a PR fails its runtime-precheck. The auto-fix repair persona is invoked. Describe the contract the repaired artifact must satisfy.

Ricky output

**Repaired Workflow Contract**

| **Section** | **Requirement** | **Verification Gate** | **Pass‑Criteria** |
|-------------|----------------|-----------------------|-------------------|
| **1. Imports** | The workflow must import the **exact** primitive package name and version that the runtime‑precheck expects. | *Import‑Resolution Gate* – the build system resolves `@agent‑relay/github-primitive` to a published semver that satisfies the workflow’s `package.json` lock. | `import { createGitHubStep } from "@agent-relay/github-primitive"` resolves without warnings and the resolved version is ≥ the minimum version declared in `workflow.yaml` (e.g., `^2.3.0`). |
| **2. Step Construction** | The workflow must call `createGitHubStep` with a **deterministic** configuration object that includes: <br>• `action: "pull_request"` <br>• `title: string` (non‑empty) <br>• `base: string` (valid branch name) <br>• `head: string` (valid branch name) <br>• `body?: string` (optional) | *Step‑Schema Gate* – a JSON‑schema validator runs against the argument object. | The schema validates; all required fields are present, string‑typed, and match the regex `/^[A-Za-z0-9/_-]+$/` for branch names. |
| **3.
...[truncated]

Must

Preserve the `@agent-relay/github-primitive` import, the `GitHubStepExecutor` reference, and every `createGitHubStep(...)` invocation that the original artifact declared.
Keep the failing workflow runnable from the same path with the same `workflow(...)` builder and `.run({ cwd: process.cwd() })` invocation.
Retain at least `ceil(N / 2)` of the original workflow's `.step(...)` calls when the original declared four or more steps; a repair that collapses to a 2-3 step placeholder is a regression, not a fix.
Reject (do not apply) any repair output whose step list reduces to `prepare-context` / `runtime-precheck: true` / `final-signoff: echo placeholder`; surface the regression diagnostic instead.

Must Not

Strip `createGitHubStep`, `GitHubStepExecutor`, or `@agent-relay/github-primitive` because the runtime-precheck failure mentioned PR-shipping or git side effects. The repair contract's "no commit / no push" constraint applies to the REPAIR AGENT's runtime behavior, not to the workflow's step declarations.
Emit a "minimal repair-safe master" or "simplified Ricky master" scaffold that passes the builder validator while doing none of the original work.
Treat "the workflow now builds and runs" as success when the work it was supposed to ship is gone.
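These rules can be enforced as a text-level acceptance check that runs before a repair is applied. This is a heuristic sketch: it counts occurrences in source text (comments included) rather than parsing the workflow. `checkRepairPreservesShipping` is a hypothetical name.

```typescript
interface RepairVerdict {
  accept: boolean;
  reasons: string[];
}

const occurrences = (src: string, needle: string): number => src.split(needle).length - 1;

function checkRepairPreservesShipping(original: string, repaired: string): RepairVerdict {
  const reasons: string[] = [];
  const primitiveImport = /from\s+['"]@agent-relay\/github-primitive['"]/;
  if (primitiveImport.test(original) && !primitiveImport.test(repaired)) {
    reasons.push("dropped the @agent-relay/github-primitive import");
  }
  for (const needle of ["createGitHubStep(", "GitHubStepExecutor"]) {
    const before = occurrences(original, needle);
    const after = occurrences(repaired, needle);
    if (after < before) reasons.push(`${needle} occurrences fell from ${before} to ${after}`);
  }
  const stepsBefore = occurrences(original, ".step(");
  const stepsAfter = occurrences(repaired, ".step(");
  const minSteps = Math.ceil(stepsBefore / 2);
  if (stepsBefore >= 4 && stepsAfter < minSteps) {
    reasons.push(`only ${stepsAfter} of ${stepsBefore} steps retained; need at least ${minSteps}`);
  }
  if (stepsAfter <= 3 && repaired.includes("prepare-context") && repaired.includes("final-signoff")) {
    reasons.push("repair collapsed to the prepare-context / runtime-precheck / final-signoff placeholder");
  }
  if (!/\.run\(\s*\{\s*cwd:\s*process\.cwd\(\)\s*\}\s*\)/.test(repaired)) {
    reasons.push("missing .run({ cwd: process.cwd() })");
  }
  return { accept: reasons.length === 0, reasons };
}
```

A rejected verdict's `reasons` double as the regression diagnostic to surface instead of applying the repair.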

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

## Ricky Slack Surface – “Workflow Debugger”

A dedicated Slack app (Ricky Debug) that lets engineers **inspect, step‑through, and repair** any Ricky‑managed workflow directly from a channel or DM. The surface is built as a **modal‑driven UI** with deterministic contracts and verification gates so that every action is auditable, testable, and safe to run in production.

---

### 1. High‑Level Architecture

| Component | Responsibility | Tech Stack |
|-----------|----------------|------------|
| **Slack App (Ricky Debug)** | Slash command, interactive messages, modals; forwards user intents to the Ricky Engine via a thin API gateway. | Node 12+, Bolt for JavaScript, OAuth scopes `commands, chat:write, chat:write.public, triggers:read, triggers:write` |
| **Ricky Engine API** | Core workflow runtime (execution graph, state store, repair actions). Exposes deterministic endpoints (`GET /workflow/:id`, `POST /workflow/:id/step`, `PATCH /workflow/:id/repair`). | Go 1.22, gRPC + HTTP/JSON, OpenAPI v3 contract |
| **Verification Service** | Runs **validation loops** (80‑100 iterations) for any state‑changing request before it is committed. Returns a **gate token** that the engine must pre

---

_Comment truncated to stay within GitHub limits. Download the `ricky-eval-run` artifact for the full `human-review.md`._

cubic-dev-ai Bot reviewed May 27, 2026


khaliqgant merged commit 6a59a77 into main May 27, 2026
4 checks passed

khaliqgant deleted the fix/pr-side-effects-not-risky branch May 27, 2026 21:54


fix(spec-intake): treat commit/push/open-PR as the normal outcome, not a risky side effect #135

khaliqgant merged 1 commit into main from fix/pr-side-effects-not-risky

kjgbot commented May 27, 2026

gemini-code-assist Bot commented May 27, 2026

coderabbitai Bot commented May 27, 2026 (edited)

❌ Failed checks (1 warning)

cubic-dev-ai Bot left a comment

github-actions Bot commented May 27, 2026

