feat(cli): heartbeat the local progress spinner with elapsed time by kjgbot · Pull Request #108 · AgentWorkforce/ricky

kjgbot · 2026-05-15T13:40:02Z

Summary

Make the local writer phase visibly alive. Today the spinner sits on Authoring workflow with Workforce persona... for 5–25 minutes on large specs with no signal whether the writer is making progress, hung in skill install, blocked on the network, or close to the 20-min SIGTERM. The only diagnostic available is ps aux | grep claude --model and reading the writer subprocess's args by hand.

This PR adds a periodic refresh to createLocalProgressReporter. Every 5 seconds (configurable), the spinner text is re-rendered with an elapsed-time suffix derived from when the last onProgress message was set:

Before

⠋ Authoring workflow with Workforce persona...
  (sits silent for 20 minutes; you can't tell the difference between
   "writer is slowly streaming output" and "process is wedged")

After

⠋ Authoring workflow with Workforce persona... (5s)
⠙ Authoring workflow with Workforce persona... (10s)
…
⠼ Authoring workflow with Workforce persona... (4m 55s)
…
⠴ Authoring workflow with Workforce persona... (19m 50s)
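
The refresh behind this output can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `SpinnerLike`, `createHeartbeatReporter`, and `refresh` are hypothetical names, Node's timer API is assumed, and the suffix is simplified to whole seconds.

```typescript
// Hypothetical minimal spinner surface; the real spinner type is not shown in this PR text.
interface SpinnerLike {
  setText(text: string): void;
}

interface HeartbeatOptions {
  heartbeatMs: number; // 0 disables the periodic refresh
  now?: () => number; // injectable clock, defaults to Date.now
}

function createHeartbeatReporter(spinner: SpinnerLike, options: HeartbeatOptions) {
  const now = options.now ?? Date.now;
  let message: string | undefined;
  let messageSetAt = 0;
  let timer: ReturnType<typeof setInterval> | undefined;

  // Re-render the latest message, adding a suffix once a full second has passed.
  const refresh = (): void => {
    if (message === undefined) return;
    const elapsedSeconds = Math.floor((now() - messageSetAt) / 1000);
    spinner.setText(elapsedSeconds < 1 ? message : `${message} (${elapsedSeconds}s)`);
  };

  return {
    onProgress(next: string): void {
      message = next;
      messageSetAt = now(); // elapsed time restarts with every new message
      refresh();
      if (timer === undefined && options.heartbeatMs > 0) {
        timer = setInterval(refresh, options.heartbeatMs);
        timer.unref(); // assumes Node timers: a pending tick must not keep the process alive
      }
    },
    refresh, // exposed in this sketch so a caller can force a re-render
    stop(): void {
      if (timer !== undefined) clearInterval(timer);
      timer = undefined;
      message = undefined;
    },
  };
}
```

Resetting `messageSetAt` on every `onProgress` call is what makes the suffix measure silence since the last message rather than total run time.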

Why now

Caught this debugging a multi-spec parallel run today: 4 ricky processes sat at 0% CPU for 24 minutes with empty per-spec logs. Looked completely hung. Turned out each had a healthy claude --model claude-sonnet-4-6 child at 3% CPU writing real output — but the only way to know that was a manual ps aux. A simple elapsed-time tick on the spinner would have eliminated the guesswork.

What changed

createLocalProgressReporter now tracks the last onProgress message + the time it was set, and starts a setInterval (default 5s) that re-renders the spinner text with formatElapsed(now - messageSetAt). The interval is .unref()'d so it never blocks process exit.
progressSpinnerText gains an optional elapsedMs arg and suppresses the suffix for elapsed < 1s (avoids "(0s)" flicker right after a new message lands).
New formatElapsed helper renders compact Ns for under a minute, Nm SSs past a minute.
New CliMainDeps.localProgressHeartbeatMs and CliMainDeps.now overrides for deterministic tests. Also honors RICKY_PROGRESS_HEARTBEAT_MS env var so users can tighten or disable the cadence without touching code.
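
For reviewers who want a feel for the helpers listed above, here is a hedged sketch. The function names come from this description, but the bodies and signatures are assumptions: zero-padded seconds in the minutes form, parentheses added by `progressSpinnerText`, the dep value taking precedence over the env var, and a fallback to the 5s default on unparseable input are all guesses rather than confirmed behavior.

```typescript
// Sketch only: names mirror the PR description, bodies are assumptions.
const DEFAULT_HEARTBEAT_MS = 5_000;

// Renders "Ns" under a minute and "Nm SSs" from one minute on.
function formatElapsed(elapsedMs: number): string {
  const totalSeconds = Math.max(0, Math.floor(elapsedMs / 1000));
  if (totalSeconds < 60) return `${totalSeconds}s`;
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = String(totalSeconds % 60).padStart(2, "0"); // assumed zero-padding
  return `${minutes}m ${seconds}s`;
}

// Appends the elapsed suffix, but stays quiet below one second to avoid "(0s)".
function progressSpinnerText(message: string, elapsedMs?: number): string {
  if (elapsedMs === undefined || elapsedMs < 1000) return message;
  return `${message} (${formatElapsed(elapsedMs)})`;
}

// Picks the heartbeat cadence; 0 means "disabled". The precedence is an assumption.
function resolveHeartbeatIntervalMs(
  depValue: number | undefined,
  envValue: string | undefined,
): number {
  if (depValue !== undefined) return Math.max(0, depValue);
  if (envValue !== undefined && envValue.trim() !== "") {
    const parsed = Number(envValue);
    if (Number.isFinite(parsed) && parsed >= 0) return parsed;
  }
  return DEFAULT_HEARTBEAT_MS;
}
```

In a real CLI the second argument would presumably be `process.env.RICKY_PROGRESS_HEARTBEAT_MS`; it is passed in explicitly here so the sketch stays free of global state.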

Test plan

Two new tests in cli-main.test.ts (123 pass, was 121):

refreshes the spinner text with an elapsed-time suffix while the local writer phase is silent — injects a deterministic clock + 4ms heartbeat cadence, advances the clock 65s, and verifies the spinner text picks up both (Ns) and (Nm XXs) forms.
honors localProgressHeartbeatMs: 0 to disable the elapsed-time heartbeat entirely — confirms the existing behavior is preserved for callers that explicitly opt out (e.g. integrations that want strict, byte-for-byte spinner output).

All 121 pre-existing cli-main.test.ts tests still pass — the heartbeat is invisible to existing tests because their runners return synchronously before the default 5s tick fires.
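
To illustrate why an injected clock makes these assertions deterministic, here is a small standalone sketch. It is not the test code from this PR: `createFakeClock` and `renderWithElapsed` are hypothetical stand-ins, and the suffix format (including zero-padded seconds) is an assumption.

```typescript
// Hypothetical test helper: a clock the test advances by hand.
function createFakeClock(startMs = 0) {
  let current = startMs;
  return {
    now: (): number => current,
    advance: (ms: number): void => {
      current += ms;
    },
  };
}

// Stand-in for the code under test: text for a message first shown at `setAt`.
function renderWithElapsed(message: string, setAt: number, now: () => number): string {
  const seconds = Math.floor((now() - setAt) / 1000);
  if (seconds < 1) return message;
  if (seconds < 60) return `${message} (${seconds}s)`;
  const padded = String(seconds % 60).padStart(2, "0");
  return `${message} (${Math.floor(seconds / 60)}m ${padded}s)`;
}

const clock = createFakeClock();
const setAt = clock.now();
const seen: string[] = [];

clock.advance(5_000); // 5s of simulated silence
seen.push(renderWithElapsed("Authoring...", setAt, clock.now));
clock.advance(60_000); // 65s in total, crossing the one-minute boundary
seen.push(renderWithElapsed("Authoring...", setAt, clock.now));

// Both suffix forms show up without the test ever sleeping.
const sawSeconds = seen.some((text) => /\(\d+s\)$/.test(text));
const sawMinutes = seen.some((text) => /\(\d+m \d{2}s\)$/.test(text));
```

Because elapsed time is read only through the injected `now`, simulating 65 seconds costs nothing in wall-clock time. The real test still relies on an actual interval tick firing, which is a separate source of timing sensitivity.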

Out of scope

Two natural follow-ups that I deliberately kept out of this PR to keep the diff small and the review surface focused:

Forwarding writer subprocess output to a .live log file. The writer already has an onProgress chunk callback ({ stream, text }) that is currently unused at the CLI layer; piping it to .workflow-artifacts/ricky-persona-debug/writer/<digest>-live/output.live.txt would let power users tail -f for real-time visibility into what the writer model is actually generating. Worth its own PR with care for log-size / rotation.
Phase-aware messages from inside the writer. "Authoring..." is currently a single static message for the entire writer phase. Emitting transitions like Resolving persona..., Installing skills..., Spawning writer (claude/claude-sonnet-4-6)..., Streaming output... would make the heartbeat even more useful. Needs hooks into the persona resolver + harness-kit which I didn't want to touch here.
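
As a rough idea of what the first follow-up could look like, the sketch below appends writer chunks to a file that can be followed with `tail -f`. Everything here is an assumption for illustration: `createLiveLogSink` is a hypothetical helper, the `stream` values are guessed, the demo writes to a temp directory rather than the real artifact path, and log-size limits and rotation are not handled.

```typescript
import { appendFileSync, mkdirSync, mkdtempSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Chunk shape taken from the PR text; the allowed `stream` values are an assumption.
interface WriterChunk {
  stream: "stdout" | "stderr";
  text: string;
}

// Hypothetical helper: returns a callback that appends each chunk to `logPath`.
function createLiveLogSink(logPath: string): (chunk: WriterChunk) => void {
  mkdirSync(dirname(logPath), { recursive: true });
  return (chunk) => {
    // Synchronous append keeps chunk order intact; no size cap or rotation here.
    appendFileSync(logPath, chunk.text, "utf8");
  };
}

// Demo against a throwaway temp directory.
const demoPath = join(mkdtempSync(join(tmpdir(), "ricky-live-")), "writer", "output.live.txt");
const sink = createLiveLogSink(demoPath);
sink({ stream: "stdout", text: "first chunk\n" });
sink({ stream: "stderr", text: "second chunk\n" });
const liveLog = readFileSync(demoPath, "utf8");
```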

🤖 Generated with Claude Code

The local writer phase (`Authoring workflow with Workforce persona...`) can sit for 5-25 minutes on large specs. Today the spinner shows a static message the entire time, so users cannot tell whether the writer is making progress, hung in skill install, blocked on the network, or close to the 20-min SIGTERM. The only way to investigate is to run `ps aux | grep claude` and inspect the writer subprocess by hand.

This change adds a periodic refresh to `createLocalProgressReporter`: every 5 seconds (configurable via the `RICKY_PROGRESS_HEARTBEAT_MS` env var or the `localProgressHeartbeatMs` dep), the spinner text is re-rendered with an elapsed-time suffix derived from when the last `onProgress` message was set.

Output goes from:

⠋ Authoring workflow with Workforce persona...
  (sits silent for 20 minutes)

to:

⠋ Authoring workflow with Workforce persona... (5s)
⠙ Authoring workflow with Workforce persona... (10s)
…
⠼ Authoring workflow with Workforce persona... (4m 55s)
…

The heartbeat can be disabled (`localProgressHeartbeatMs: 0`) for tests and strict-output integrations; the default 5s is unintrusive but visible enough to confirm the process is alive between meaningful phase transitions. The setInterval handle is `.unref()`'d so it never blocks the process from exiting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-15T13:40:13Z

📝 Walkthrough

The PR adds a configurable heartbeat mechanism to the CLI spinner that periodically refreshes with elapsed-time suffixes. It extends CliMainDeps with heartbeat interval and clock injection, implements periodic spinner text updates in createLocalProgressReporter, provides helpers for interval resolution and time formatting, and includes tests verifying both the enabled and disabled heartbeat behavior.

Changes

CLI Spinner Heartbeat Feature

Layer / File(s)	Summary
CliMainDeps Interface `src/surfaces/cli/commands/cli-main.ts`	`CliMainDeps` interface adds optional `localProgressHeartbeatMs` (heartbeat cadence control) and `now()` (deterministic clock for testing) properties.
Local Progress Spinner Heartbeat Implementation `src/surfaces/cli/commands/cli-main.ts`	Spinner state management extended to track latest message and timestamp, start/stop a `setInterval` heartbeat that refreshes spinner text with elapsed-time suffix, and clear heartbeat and state on `stop()`.
Heartbeat and Elapsed-Time Helpers `src/surfaces/cli/commands/cli-main.ts`	`resolveHeartbeatIntervalMs` computes interval from deps or environment with 0-disable semantics; `progressSpinnerText` accepts optional `elapsedMs`; `formatElapsed` renders seconds/minutes suffix as `(Ns)` or `(Nm XXs)`.
Heartbeat Tests `src/surfaces/cli/commands/cli-main.test.ts`	Two test cases: one verifies spinner text refreshes with elapsed-time suffixes in correct format using deterministic clock and short delay (including minutes form); another confirms disabled heartbeat (`localProgressHeartbeatMs: 0`) prevents elapsed-time suffix.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🐰 A heartbeat for the spinner spins,
Elapsed time now shows how long it begins,
With (Ns) and (Nm XXs) to display,
The CLI's progress brightens the day!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title accurately summarizes the main change: adding a heartbeat feature to the local progress spinner that displays elapsed time, which is the core functionality introduced in this changeset.
Description check	✅ Passed	The PR description is detailed and directly related to the changeset, explaining the motivation, implementation details, test coverage, and deliberate out-of-scope items for the spinner heartbeat feature.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

coderabbitai

🧹 Nitpick comments (1)

src/surfaces/cli/commands/cli-main.test.ts (1)

2053-2054: ⚡ Quick win

Stabilize these heartbeat tests by removing real-time sleeps.

These checks depend on short real-time delays and may intermittently miss interval ticks under CI load. Prefer fake timers (vi.useFakeTimers + vi.advanceTimersByTimeAsync) so elapsed-heartbeat assertions are deterministic.

Proposed deterministic timer pattern

+    vi.useFakeTimers();
+    try {
-      await cliMain({
+      const run = cliMain({
         argv: ['run', 'workflows/generated/issue-3.ts', '--foreground'],
         output: ttyOutputSink(),
         isTTY: true,
         createProgressSpinner,
         runInteractive: runner,
         localProgressHeartbeatMs: 4,
         now,
       });
+      await vi.advanceTimersByTimeAsync(30);
+      await run;
+    } finally {
+      vi.useRealTimers();
+    }

Also applies to: 2108-2109

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/surfaces/cli/commands/cli-main.test.ts` around lines 2053 - 2054, Replace
the real-time sleep-based waits in the heartbeat tests (the setTimeout(...)
before returning fakeInteractiveResult({ ok: true, localResult:
stagedLocalResult() }) and the similar sleep at lines 2108-2109) with
deterministic fake timers: call vi.useFakeTimers() at the start of the test,
replace the await new Promise(setTimeout...) with await
vi.advanceTimersByTimeAsync(...) for the same elapsed interval, and restore
timers after the test; ensure you advance the timers enough to trigger the
heartbeat interval being asserted in the test so assertions on elapsed-heartbeat
become deterministic.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 115ef99c-d89b-4dd3-bead-9dc627fafada

📥 Commits

Reviewing files that changed from the base of the PR and between 0816b6e and b3943c3.

📒 Files selected for processing (2)

src/surfaces/cli/commands/cli-main.test.ts
src/surfaces/cli/commands/cli-main.ts

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

cubic-dev-ai

No issues found across 2 files

github-actions · 2026-05-15T14:29:30Z

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-15T13-41-13-148Z-HEAD-provider
Mode: provider
Git SHA: bcead0e

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Scope — Update Ricky documentation and code to reflect a deeper integration with the *Agent Assistant* component.**  

Below is a concrete, review‑ready workflow contract that satisfies the Ricky repository conventions (AGENTS.md, workflow standards, shared authoring rules, product specs).  

---  

## 1. Branch & PR plan
| Step | Branch name | PR title | Owner | Review gate |
|------|-------------|----------|-------|-------------|
| 1 | `feat/agent‑assistant‑deep‑integration` | **Ricky: expose deeper Agent Assistant usage in docs & code** | Ricky (you) | CI + Design‑review |
| 2 | `fix/docs‑aa‑integration` (child of above) | **Docs: update README, ARCHITECTURE, and usage guide** | Ricky | Docs‑lint, Spell‑check |
| 3 | `refactor/code‑aa‑integration` (child of above) | **Code: surface Agent Assistant calls in `ricky/core.py` & `ricky/cli.py`** | Ricky | Unit‑test, Type‑check, Integration‑test |

All three PRs are opened as a **stacked PR** (GitHub “dependent pull requests” or `gh pr stack`). The top‑most PR (`feat/agent‑assistant‑deep‑integration`) aggregates the two child PRs and is the final merge target to `main`.

---

## 2. Documentation changes (PR #2)

| File | Section | C
...[truncated]

Must

Ground claims in real package imports and runtime paths.
Distinguish current implementation from target architecture.
Identify which Agent Assistant primitive is actually exercised.

Must Not

Rename local code to sound Agent Assistant aligned and count that as adoption.
Claim broad Agent Assistant native behavior from documentation-only alignment.
Blur target architecture with landed behavior.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review**  
*(Compiled 15 May 2026 – based on the last 90 days of telemetry, PR history, and stakeholder feedback)*  

| Metric | Current Value | Target / Trend | Comments |
|--------|---------------|----------------|----------|
| **Enabled in Production** | 68 % of active services | ≥ 90 % (Q4 2026) | 32 % of services still run with the legacy “single‑turn” model. |
| **Turn‑context latency overhead** | +3.2 ms per request (average) | ≤ 5 ms | Within acceptable range; no noticeable impact on SLA. |
| **Error‑rate reduction** | 27 % fewer “stale‑state” failures (from 4.8 % → 3.5 %) | ≥ 30 % reduction | Near target; most gains realized in the order‑processing pipeline. |
| **Developer adoption** | 42 % of repos have `@agent-assistant/turn-context` listed in `package.json` | ≥ 75 % | Adoption lag due to missing migration guide in some teams. |
| **Automated test coverage** | 71 % of turn‑context‑enabled services have ≥ 80 % unit‑test coverage of the context API | ≥ 85 % | Gaps in legacy micro‑services that lack CI pipelines. |
| **Documentation compliance** | 58 % of services have up‑to‑date `README` sections describing the context contra
...[truncated]

Must

Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
Record compact provenance through generation decisions or coordinator metadata.
Keep the shared turn context internal to the adapter boundary.

Must Not

Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
Drop Ricky-specific workflow metadata during envelope assembly.
Treat turn context as a product decision engine.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation: Move all four capabilities into the Agent Assistant**

| Capability | Why it belongs in the Agent Assistant | Impact on reliability & coordination |
|------------|----------------------------------------|--------------------------------------|
| **Workflow generation** | The Assistant already owns the *authoring contract* (input spec → workflow DSL). Centralising generation avoids duplicated template logic across agents. | Guarantees a single source‑of‑truth for DSL syntax, versioning, and default scaffolding. |
| **Workflow validation** | Validation is the deterministic gate that follows generation. Embedding it in the same service lets the Assistant run *pre‑merge* checks (schema, type‑checking, contract compliance) before any PR is opened. | Reduces false‑positive failures downstream, enforces the 80‑100 validation‑loop policy, and produces machine‑readable evidence artifacts. |
| **Debugging (staged CLI UX)** | Debugging is the *post‑generation* feedback loop. By exposing a staged CLI (e.g., `assistant workflow debug --stage <n>`), the Assistant can replay each gate, surface exact failure points, and suggest corrective actions. | Improves mean‑time‑to‑repair (
...[truncated]

Must

Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
Make extraction follow typed, tested, live product proof.

Must Not

Generalize workflow-specific behavior prematurely.
Adopt moving shared seams merely for architectural purity.
Lose the precise local-first staged workflow UX.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target window: 12 weeks (May 15 – Aug 7 2024)*  

---

## 1. Slice Goal
Deploy the next‑generation reliability & coordination features of Ricky to **30 % of active AgentWorkforce teams** (≈ 45 teams) while establishing a repeatable, gated workflow for future slices.

---

## 2. Scope & Boundaries  

| Area | In‑Scope | Out‑Of‑Scope |
|------|----------|--------------|
| **Core runtime** | New deterministic contract engine, “gate‑check” API, and enhanced logging hooks. | Full UI redesign (deferred to v3). |
| **CI/CD** | Branch `release/ricky‑v2.0`, protected PR, required status checks, automated contract‑test suite. | Migration of legacy pipelines (handled in v3). |
| **Documentation** | Contract spec (Markdown), onboarding checklist, “adoption playbook” for team leads. | Full API reference (already covered in v1). |
| **Observability** | Export of contract‑validation metrics to Grafana dashboards. | Real‑time alerting (planned for v3). |
| **Support** | Dedicated “Adoption Slack channel” + weekly office‑hours. | 24/7 on‑call rotation (still v1). |

All changes live under `src/ricky/` and `docs/ricky/`. No cross‑repo mod
...[truncated]

Must

Pick exactly one real shared seam to evaluate or adopt.
Define a live Ricky product path that will prove the adoption.
Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
Skip the comparison/evaluation step for mature Ricky-local seams.
Treat adoption as successful without a live product-path proof.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky‑Assist : Future Slack / Web Support Package  
*Agent Assistant (AA) v2.3 – “Ricky‑Assist”*  

| Component | Purpose | Contract (AA‑Spec) | Verification Gate |
|-----------|---------|--------------------|-------------------|
| **AA‑Slack‑Bot** | Real‑time Slack interaction for workflow creation, status checks, and blocker reporting. | `SlackBot(v:2.3) → {msg: string, action: enum[CREATE, STATUS, BLOCK, REVIEW]}`<br>All inbound messages must match `SlackMsgSchema` (JSON: `{user:string, channel:string, text:string, ts:string}`) and be signed with the workspace app token. | Unit test `SlackMsgValidator.test.ts` (100 % pass) → Integration test against a sandbox workspace (pass ≥ 95 % of 10k simulated messages). |
| **AA‑Web‑UI** | Browser‑based console for the same capabilities, plus visual workflow graph editor. | `WebUI(v:2.3) → {view: enum[GRAPH, LIST, SETTINGS], payload: JSON}`<br>All UI actions emit `WebAction` events that conform to `WebActionSchema`. | End‑to‑end Cypress suite (≥ 80 % coverage) → Accessibility audit (WCAG 2.1 AA). |
| **AA‑Core‑Engine** (shared) | Stateless engine that parses user intents, validates contracts, and routes to the appropriate service (Slack
...[truncated]

Must

Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
Keep local CLI behavior product-local unless shared adoption is proven harmless.
Explain which behavior is future/target architecture versus implemented today.

Must Not

Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
Let future surface abstractions distort the current CLI contract.
Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What the command does**

Running  

```bash
ricky --mode local --spec "generate a workflow for package checks"

--mode local – tells Ricky to operate against the local repository (no remote CI/CD integration).
--spec "generate a workflow for package checks" – is the natural‑language request that Ricky will translate into a concrete workflow definition (YAML/JSON) following the AgentWorkforce standards.
No --run flag – means Ricky will only generate the workflow artifact; it will not submit, schedule, or execute any checks.

So the command will:

Parse the spec string.
Resolve the appropriate workflow contract (inputs, outputs, validation gates, review steps).
Emit a workflow file (e.g., .ricky/workflows/package-checks.yaml) in the current repo.
Print a short summary to stdout and exit with status 0 if generation succeeded, or a non‑zero code with an error message if it could not resolve a valid contract.

Typical output

✅ Workflow generated: .ricky/workflows/package-checks.yaml
   • Contract: pkg-checks-v1.2
   • Deterministic verification gates: lint, unit-test, security-scan
   • Review artifact: p
...[truncated]

Must

Say generation is the default and execution was not requested.
Print the generated artifact path, workflow id, spec digest, and next run command.
Avoid showing execution evidence for a generation-only request.

Must Not

Imply the workflow ran automatically.
Present a generation-only result as execution success.
Hide the opt-in commands for running the artifact.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.first-run-copy-is-compact-and-truthful (cli-behavior/manual)

User message

Render Ricky's first-run CLI onboarding for a new user.

Ricky output

**Ricky – First‑Run CLI Onboarding**

Below is the complete, ready‑to‑run onboarding workflow for a brand‑new user of the **Ricky** AgentWorkforce assistant. It is expressed as a deterministic Bash‑compatible script together with the required verification gates, review artifacts, and a minimal “validation‑loop” checklist (80‑100 % coverage) that you can copy‑paste into a fresh repository or run locally.

---

## 1️⃣  Workflow Contract (README‑style)

| Item | Description |
|------|-------------|
| **Name** | `ricky-onboard` |
| **Purpose** | Bootstrap a user’s environment, install dependencies, configure the Ricky CLI, and verify a healthy first‑run. |
| **Inputs** | None (runs in a fresh shell). |
| **Outputs** | `~/.ricky/config.yaml` (populated), `ricky version` prints the installed version, and a `onboard_success.log` file. |
| **Success Criteria** | All verification gates return exit‑code 0 and the log contains “ONBOARDING COMPLETED”. |
| **Failure Mode** | Any gate failure aborts the script, prints a clear blocker message, and writes `onboard_error.log`. |
| **Scope** | All changes are confined to the user’s home directory (`~/.ricky/*`) and the local `ricky` binary in `~/.lo
...[truncated]

Must

Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
End every branch with a concrete next step.
Advertise only commands that are currently implemented.

Must Not

Sound like a launch page or documentation dump.
Claim Ricky runs workflows by default when generation is the default path.
Require web or Slack onboarding before CLI use.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.recovery-guidance-no-stack-traces (cli-behavior/manual)

User message

A user gives Ricky an empty spec or a missing spec file.

Ricky output

**Blocker:** The specification file is empty or missing, so I cannot generate or repair a workflow without concrete requirements.

**Next steps to unblock:**

1. **Provide the spec** – Upload the intended YAML/JSON/Markdown spec file or paste its contents here.
2. **If the spec is still being drafted**, give a brief description of:
   - The workflow’s purpose (e.g., data ingestion, model training, CI/CD pipeline).
   - Key stages or tasks you expect (e.g., “fetch data → transform → store → notify”).
   - Any mandatory constraints (e.g., runtime limits, required approvals, external service credentials).

Once I have that information, I can:

- Draft a concrete workflow contract (inputs, outputs, triggers).
- Insert deterministic verification gates (unit test, schema validation, smoke‑run).
- Generate the required review artifacts (design doc, test plan, PR checklist).
- Scope the branch/PR boundaries for incremental development.

Please share the missing spec or the minimal description above so we can proceed.

Must

Return a user-facing failure or guidance message with a real recovery command.
Distinguish generation failure from execution failure.
Show stack traces only when verbose diagnostic mode is requested.

Must Not

Crash with an uncaught exception in normal mode.
Suggest commands that do not exist.
Pretend a missing spec was accepted.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render ricky status when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks have proven Google or GitHub are connected)**

=== AgentWorkforce Status Report ===

🟡 Provider Connectivity
• Google: NOT VERIFIED (no successful health‑check)
• GitHub: NOT VERIFIED (no successful health‑check)

🔧 Configuration
• Google OAuth client ID:
• GitHub App ID:
• Required scopes: configured

⚙️ Runtime
• Last heartbeat: 2026‑05‑15 09:12:03 UTC
• Active workers: 4
• Queue depth: 0

🚨 Blockers
• Google provider not confirmed – actions that depend on Google APIs (e.g., Drive, Calendar) are disabled.
• GitHub provider not confirmed – actions that depend on GitHub APIs (e.g., repo cloning, PR automation) are disabled.

📝 Next Steps

Run provider health checks:
ricky provider check --name google
ricky provider check --name github
Review provider logs for authentication errors.
Re‑authenticate if tokens are expired or scopes are missing.
Re‑run ricky status after successful checks to see the green ✅ state.

✅ Overall health: Degraded – core workflow engine is running, bu
...[truncated]


Must

Report unknown or not-connected provider state honestly.
Update provider status only from explicit provider checks or Cloud status results.
Give concrete setup guidance for Cloud when relevant.

Must Not

Mark Google or GitHub connected because guidance text was shown.
Invent a provider connection URL or OAuth flow.
Show empty fields with no recovery guidance when config is missing.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)

User message

Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

Ricky output

```text
## GitHub Primitive‑Change Workflow  
**Purpose:** Apply a single‑file edit (e.g., README typo fix) to a public repository while guaranteeing reproducible review, CI, and merge safety.

### 1. Scope & Branching Contract
| Item | Definition |
|------|------------|
| **Change Type** | Primitive (single‑file, non‑breaking) |
| **Target Repo** | `owner/repo` (public) |
| **Base Branch** | `main` |
| **Feature Branch** | `fix/<short‑desc>-<YYYYMMDD>` |
| **PR Title** | `fix: <short‑desc>` |
| **PR Description** | • Why the change is needed  <br>• Exact line(s) changed  <br>• Verification steps (see §4) |
| **Merge Strategy** | **Squash‑merge** with “Require linear history” enabled |
| **Protected Rules** | • Status checks required: `ci/lint`, `ci/test`  <br>• Code‑owner review required (if applicable)  <br>• No force‑pushes after PR opened |

### 2. Deterministic Steps (Ricky‑selected Skills)

| Step | Action | Skill(s) Employed |
|------|--------|-------------------|
| **2.1** | **Create feature branch** from `main` | *GitOps* – deterministic branch naming, base‑branch lock |
| **2.2** | **Edit file** locally (or via GitHub UI) | *FileEdit* – atomic edit, line‑preserving diff |
| **2.3
...[truncated]

Must

Select skills from the actual registry rather than a hardcoded static set.
Record ranked skill matches with confidence and trigger evidence.
Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

Claim runtime agents embody skills when only generation-time guidance was applied.
Fail generation solely because optional skill files are missing.
Hide skill selection evidence from artifacts or JSON output.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow: `review‑and‑implement‑claude‑codex`**

| **Section** | **Content** |
|-------------|-------------|
| **Purpose** | Automate a two‑stage development cycle where Claude performs a code‑review of a design spec and Codex generates the implementation. |
| **Branching Model** | <ul><li>`main` – stable release branch.</li><li>`feature/<ticket‑id>` – short‑lived branch for each spec.</li><li>`review/<ticket‑id>` – branch created after Claude review, contains only the reviewed spec.</li></ul> |
| **PR Strategy** | <ul><li>**PR‑1**: `feature/<ticket‑id>` → `review/<ticket‑id>` (Claude review).</li><li>**PR‑2**: `review/<ticket‑id>` → `main` (Codex implementation + final integration).</li></ul> |
| **Artifacts** | <ul><li>`spec.md` – original design spec.</li><li>`spec.reviewed.md` – Claude‑annotated spec.</li><li>`implementation/` – generated source files.</li><li>`review-report.json` – deterministic verification results.</li></ul> |
| **Contracts** | <ul><li>**SpecContract** (input):<br>`{ "title": string, "description": string, "acceptance_criteria": array<string> }`</li><li>**ReviewContract** (output of Claude):<br>`{ "spec_hash": string, "issues": array<{ "line": int, "type":
...[truncated]

Must

Assign per-step runner/model decisions from explicit spec hints where possible.
Let skill metadata or project defaults fill gaps when the spec is silent.
Write tool-selection.json or equivalent audit metadata.

Must Not

Use one generic runtime default for every agent despite explicit hints.
Let skill defaults override explicit user/spec runner hints.
Omit the reason each tool or model was chosen.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
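
The precedence this case asks for (explicit spec hint first, then skill metadata, then project default, each with a recorded reason) might look like the sketch below. The `Decision` shape and `selectRunner` are hypothetical; the report does not show the real `tool-selection.json` schema.

```typescript
// Hypothetical types for illustration only.
type Runner = "claude" | "codex";
type Source = "spec-hint" | "skill-metadata" | "project-default";
type Decision = { stepId: string; runner: Runner; source: Source; reason: string };

function selectRunner(
  stepId: string,
  specHints: Record<string, Runner>,
  skillDefaults: Record<string, Runner>,
  projectDefault: Runner,
): Decision {
  // Explicit spec hints always win over skill metadata and project defaults.
  const hinted = specHints[stepId];
  if (hinted) {
    return { stepId, runner: hinted, source: "spec-hint", reason: `spec names ${hinted} for ${stepId}` };
  }
  // Skill metadata only fills gaps the spec left open.
  const fromSkill = skillDefaults[stepId];
  if (fromSkill) {
    return { stepId, runner: fromSkill, source: "skill-metadata", reason: `skill default for ${stepId}` };
  }
  return { stepId, runner: projectDefault, source: "project-default", reason: "no hint or skill default" };
}
```

For the prompt in this case, `selectRunner("review", { review: "claude", implement: "codex" }, {}, "codex")` would resolve to Claude with source `spec-hint`, and the array of decisions is what would be serialized as audit metadata.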

REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

## Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is a deterministic post‑processing flag that takes a **generated workflow JSON** (or YAML) and returns a **new version** where every step’s `taskDescription` and `acceptanceGate` are tightened to meet the **Ricky contract**:

| Element | Required fields | Typical refinements |
|---------|----------------|---------------------|
| `taskDescription` | `summary` (≤ 80 chars), `details` (≤ 300 chars) | • Remove vague verbs (“do”, “handle”) <br>• Add explicit input‑output mapping <br>• Reference concrete artifacts (e.g., `artifactId`, `schemaVersion`) |
| `acceptanceGate` | `type` (`manual\|auto\|hybrid`), `criteria` (array) | • Convert “looks correct” → concrete predicate (`output.recordCount == expectedCount`) <br>• Add deterministic timeout & retry policy <br>• Include verification script reference (`scriptPath`) |

### 1. Prepare the Base Workflow  

```json
{
  "name": "CustomerOnboard",
  "steps": [
    {
      "id": "fetch‑profile",
      "taskDescription": {
        "summary": "Get profile",
        "details": "Call the profile service."
      },
      "acceptanceGate": {
        "type": "auto",
     
...[truncated]
```

Must

Keep the deterministic unrefined path as the default.
Bound the refinement pass by timeout, token budget, and editable regions.
Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

Must Not

Change the workflow graph, agent assignments, or side-effect scope during refinement.
Fail the whole generation if optional refinement times out.
Hide warnings when the deterministic artifact is returned unchanged.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
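
One way to enforce the "re-run validation and fall back" rule for this case is a structural guard that accepts a refined artifact only when step ids, agent assignments, and dependency edges are unchanged. The sketch below assumes a simplified `Workflow` shape; `acceptRefinement` and its warning strings are hypothetical.

```typescript
// Hypothetical workflow shape; only the fields the guard needs are modeled.
type Step = { id: string; agent: string; dependsOn: string[]; task: string; gate: string };
type Workflow = { steps: Step[] };
type RefineResult = { workflow: Workflow; refined: boolean; warnings: string[] };

// Serializes only what refinement must not touch: ids, agents, and edges.
function structuralFingerprint(wf: Workflow): string {
  return JSON.stringify(wf.steps.map((s) => [s.id, s.agent, [...s.dependsOn].sort()]));
}

// `candidate` is null when the refinement pass timed out or the provider failed.
function acceptRefinement(original: Workflow, candidate: Workflow | null): RefineResult {
  if (candidate === null) {
    return {
      workflow: original,
      refined: false,
      warnings: ["refinement unavailable; returned the deterministic artifact unchanged"],
    };
  }
  if (structuralFingerprint(candidate) !== structuralFingerprint(original)) {
    return {
      workflow: original,
      refined: false,
      warnings: ["refinement changed the graph or agent assignments; returned the deterministic artifact"],
    };
  }
  // Only task and gate text may differ at this point.
  return { workflow: candidate, refined: true, warnings: [] };
}
```

The guard never throws, so an optional refinement failure cannot fail the whole generation, and the warning makes the unchanged fallback visible.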

REVIEW generation-quality.behavior-grounded-gates (generation-quality/manual)

User message

Generate a workflow for the ricky --version spec.

Ricky output

**Workflow: `ricky --version` Specification & Release**  
*Repository: `github.com/agentworkforce/ricky`*  
*Target branch: `main` → `release/vX.Y.Z`*  

---  

## 1. Scope & Objective
Produce a deterministic, test‑driven workflow that defines, validates, and releases the `ricky --version` command. The command must:

| Requirement | Acceptance Criteria |
|-------------|----------------------|
| **Semantic version output** | Prints `ricky version X.Y.Z` where `X.Y.Z` matches the `package.json` (or `go.mod`) version field. |
| **Git tag correlation** | The printed version must equal the most recent annotated tag `vX.Y.Z` reachable from `HEAD`. |
| **CI‑visible** | The version string is exposed as a CI artifact (`VERSION.txt`). |
| **Cross‑platform** | Works on Linux, macOS, and Windows runners. |
| **No side‑effects** | Running `ricky --version` does not modify the filesystem or environment. |

---  

## 2. Contract (YAML)

```yaml
name: ricky-version-spec
on:
  push:
    branches: [main]
    tags: ['v*']
  pull_request:
    branches: [main]

env:
  GO111MODULE: "on"
  VERSION_FILE: "VERSION.txt"

jobs:
  # -----------------------------------------------------------------
  # 1️⃣ Bui
...[truncated]
```

Must

Build gates from the stated acceptance behavior, such as checking ricky --version output.
Avoid generic source-shape grep checks when the spec asks for CLI behavior.
Keep generated validation meaningful for the current repo shape.

Must Not

Treat `grep -Eq 'export|function|class|workflow(' dist/ricky.js` as proof of version behavior.
Claim the workflow is proven by source syntax alone.
Ignore the package-json version resolution order in the spec.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
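
A behavior-grounded gate for this case compares what the CLI prints with the version the package declares, rather than grepping the bundle for syntax. The sketch below is a pure comparison function under stated assumptions: the exact output format of `ricky --version` is not given here, so it only requires the version to appear as a whitespace-delimited token, and it does not model the spec's package-json resolution order. `checkVersionOutput` is a hypothetical name.

```typescript
type GateResult = { pass: boolean; detail: string };

// Compares captured CLI stdout against the version declared in package.json text.
function checkVersionOutput(stdout: string, packageJsonText: string): GateResult {
  const expected = (JSON.parse(packageJsonText) as { version?: string }).version;
  if (!expected) return { pass: false, detail: "package.json has no version field" };

  // Assumption: the version is printed as its own token, optionally prefixed with "v".
  const tokens = stdout.trim().split(/\s+/).map((t) => t.replace(/^v/, ""));
  const pass = tokens.includes(expected);
  return {
    pass,
    detail: pass ? `found ${expected}` : `expected ${expected}, got "${stdout.trim()}"`,
  };
}
```

In a real gate the `stdout` argument would come from actually running the built CLI (for example with `execFileSync` from `node:child_process`), which is what makes the check behavioral.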

REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow: “Batch‑Artifact Validation‑Fix‑Rerun Loop”**  
*Purpose*: Process a large collection of independent artifacts (e.g., data files, code modules, model checkpoints) through a deterministic validation pipeline, automatically apply fix‑scripts where possible, and rerun until all artifacts are either **PASS** or **BLOCKED**. The workflow is scoped to a single branch (`batch‑validation`) and produces a single PR that aggregates all results.

---  

## 1. Contract (Inputs / Outputs)

| Element | Type | Description |
|---------|------|-------------|
| `artifacts/` | directory | Root folder containing *N* independent artifact sub‑folders (`artifact‑<id>/`). |
| `artifact‑<id>/manifest.yaml` | YAML | Required per‑artifact metadata: `id`, `type`, `checksum`, `validation_rules` (list of rule IDs). |
| `validation_rules/` | directory | Library of rule definitions (see §2). |
| `fixers/` | directory | Scripts (`fix‑<rule>.sh`) that can remediate a specific rule failure. |
| **Outputs** | | |
| `reports/` | directory | Per‑artifact JSON report (`<id>.report.json`). |
| `summary.json` | JSON | Aggregated status: counts of PASS / FAIL / BLOCKED, timestamps, and a list of blocked artifact
...[truncated]

Must

Choose dag, supervisor, or pipeline deliberately based on the work shape.
Explain the pattern choice in artifact metadata or a rationale.
Use dag for validation/fix/rerun loops when dependencies matter.

Must Not

Default blindly to dag for every workflow.
Collapse independent artifact work into one vague agent task.
Omit verification gates because the chosen pattern seems obvious.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow  

Below is a concise “skill‑impact map” that shows the direct effect each of the core Ricky authoring skills has on the structure, reliability, and reviewability of a generated workflow. The example workflow is a **“Customer‑Onboarding Email Sequence”** that moves a new sign‑up from “Welcome” → “Product Tour” → “First‑Use Tips” → “Feedback Request”.

| **Skill** | **What the Skill Enforces** | **Resulting Artifact in the Workflow** | **Why It Improves Reliability** |
|----------|----------------------------|----------------------------------------|---------------------------------|
| **Deterministic Contract Definition** | Every step declares an explicit **input‑output contract** (type, required fields, optional defaults). | `Step: SendWelcomeEmail` <br> `Input: {userId: string, email: email}` <br> `Output: {messageId: string}` | Guarantees downstream steps receive exactly what they expect; type‑checking catches mismatches at compile‑time. |
| **Verification Gates** | After each step a **gate** runs a deterministic test (unit‑style assertion) on the step’s output. | `Gate: VerifyWelcomeSent` <br> `assert out
...[truncated]

Must

Describe skills as generation-time selection, loading, and rendering inputs.
Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
Say the workflow instructions were informed by selected skills.

Must Not

Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
Treat metadata existence as proof that agents received skill bodies at runtime.
Overstate current tests beyond the generation boundary.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

Below is a **minimal, deterministic workflow skeleton** that you can flesh out once the missing product details are clarified.  
It follows the AgentWorkforce conventions (AGENTS.md, workflow standards, shared authoring rules) and includes:

* **Typed contract definitions** for inputs, outputs, and intermediate artifacts.  
* **Verification gates** that enforce deterministic checks at each stage.  
* **Review & validation loops** (80‑100 % coverage) with explicit blocker reporting.  
* **Branch/PR boundaries** to keep changes scoped and reviewable.

---

## 1️⃣ High‑Level Overview

| Stage | Owner | Description |
|-------|-------|-------------|
| **Spec Ingestion** | `spec‑ingest` | Pull the product spec (currently vague) from the source repo. |
| **Requirement Extraction** | `req‑extract` | Parse the spec into concrete functional & non‑functional requirements. |
| **Design Draft** | `design‑draft` | Generate a design document (architecture diagram, data model, API contracts). |
| **Contract Generation** | `contract‑gen` | Emit typed JSON‑Schema contracts for each service/component. |
| **Implementation Stub** | `impl‑stub` | Scaffold code repositories with CI pipelines based on co
...[truncated]

Must

Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
Validate with dry-run or targeted structural checks where possible.
Return artifacts, warnings, and follow-up commands honestly.

Must Not

Act like Ricky is a pure code-generation bot that emits workflows without verification.
Stop at "code compiles" as the proof bar.
Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s next actions after a failed workflow**

1. **Capture the failure state**  
   - Log the exact step that timed‑out (including step ID, start/end timestamps, and timeout threshold).  
   - Record the worker that remained alive (worker ID, heartbeat timestamps, last‑known state, and any partial output directories).  
   - Persist the full execution trace (JSON/YAML) to the workflow‑run artifact store and tag it with `status:failed`.

2. **Mark a deterministic blocker**  
   - Create a **Blocker Ticket** in the issue tracker (e.g., GitHub Issues) with the label `workflow‑blocker`.  
   - Title: `Timeout & orphan worker in <workflow‑name>#<run‑id>`  
   - Body must contain:
     * Step that timed out + config (timeout value, retry policy).  
     * Orphan worker details + last heartbeat.  
     * Links to the run logs and the persisted execution trace.  
   - Set the ticket status to **Open** and assign it to the *Workflow Reliability* owner.

3. **Spin up a deterministic verification gate**  
   - Generate a **Gate Definition** (`gate.yaml`) that will be added to the next PR:
     ```yaml
     name: post‑failure‑recovery‑gate
     description: Verify that the timed‑out step e
...[truncated]
     ```

Must

Classify the failure before deciding whether to retry.
Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
Preserve raw evidence and uncertainty when the class is not known.

Must Not

Blindly rerun the whole workflow without a blocker classification.
Treat every failure as a broken workflow definition.
Claim Ricky fixed the workflow before rerun evidence proves it.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
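
The classes named in this case's Must list can be modeled as a closed union so that a retry decision is only ever made on a classified failure. This is a rough sketch: `StepEvidence`, the log patterns, and the "only plain timeouts are retried" policy are assumptions for illustration, not Ricky's rules.

```typescript
type FailureClass =
  | "agent-runtime-opacity"
  | "timeout"
  | "environment-blocker"
  | "workflow-structure-bug"
  | "validation-strategy-mismatch"
  | "unknown";

// Hypothetical evidence record for one step of a failed run.
type StepEvidence = {
  stepId: string;
  timedOut: boolean;
  workerAlive: boolean;
  artifactsProduced: number;
  rawLog: string;
};

type Classification = { stepId: string; failureClass: FailureClass; retryable: boolean; rawLog: string };

function classify(e: StepEvidence): Classification {
  let failureClass: FailureClass = "unknown";
  if (e.timedOut) {
    failureClass = "timeout";
  } else if (e.workerAlive && e.artifactsProduced === 0) {
    // Alive but silent: the runtime is opaque, which says nothing about the workflow definition.
    failureClass = "agent-runtime-opacity";
  } else if (/ENOENT|EACCES|command not found/.test(e.rawLog)) {
    failureClass = "environment-blocker";
  }
  // Assumed policy for the sketch: only a plain timeout is retried automatically.
  // The raw log is carried through untouched so uncertainty stays inspectable.
  return { stepId: e.stepId, failureClass, retryable: failureClass === "timeout", rawLog: e.rawLog };
}
```

Applied to the prompt in this case, the timed-out step and the silent worker land in two different classes, so they are not treated as one generic failure.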

REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale `.agent-relay/`, `.relay/`, and `.trajectories/` state before launching a local workflow.

Ricky output

**Ricky – Stale‑State Detection for Local Workflow Launches**  

Below is the concrete, contract‑driven workflow that Ricky executes **before** any local workflow is started. It guarantees that the three runtime‑state directories (`.agent‑relay/`, `.relay/`, `.trajectories/`) are fresh, or it aborts with a deterministic blocker report.

---

## 1. Pre‑launch Contract

| Item | Requirement |
|------|--------------|
| **Input** | Path to the workflow root (default: current working directory). |
| **Outputs** | - Cleaned state directories (empty or freshly created). <br> - `ricky‑state‑report.json` summarising actions taken. |
| **Success Criteria** | All three directories exist **and** contain **no files older than 5 seconds** (the “freshness window”). |
| **Failure Mode** | If any directory cannot be cleaned (e.g., permission error, open file handle), the launch is **blocked** and a `ricky‑blocker‑report.json` is emitted. |
| **Verification Gate** | Deterministic hash of the directory contents (`sha256`) must equal the hash of an empty directory (`e3b0c44298fc1c149afbf4c8996fb924…`). |

---

## 2. Deterministic Detection & Cleanup Steps

1. **Locate Directories**  
   ```bash
   REL
...[truncated]
   ```

Must

Classify stale local runtime state as an environment contamination issue.
Recommend quarantine or isolated-run guidance before launch.
Record the observed paths and the action taken or recommended.

Must Not

Treat stale runtime state as a workflow logic failure.
Delete or overwrite state without an explicit safe path or user intent.
Continue into execution as if the workspace were clean.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
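
A read-only pre-launch check consistent with this case's rubric might look like the following. It records which state directories are present and recommends an action, but never deletes or moves anything. The staleness rule here (any non-empty directory counts) and the `StaleStateReport` shape are assumptions; the report does not define either.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

const STATE_DIRS = [".agent-relay", ".relay", ".trajectories"];

type StaleStateReport = {
  failureClass: "environment-contamination" | null;
  observedPaths: string[];
  recommendation: string | null;
};

// Inspects the workspace without modifying it.
function detectStaleState(workspace: string): StaleStateReport {
  const observedPaths = STATE_DIRS
    .map((dir) => path.join(workspace, dir))
    .filter((p) => fs.existsSync(p) && fs.statSync(p).isDirectory() && fs.readdirSync(p).length > 0);

  if (observedPaths.length === 0) {
    return { failureClass: null, observedPaths, recommendation: null };
  }
  return {
    failureClass: "environment-contamination",
    observedPaths,
    recommendation: "quarantine the listed directories or launch from an isolated workspace",
  };
}
```

Keeping detection separate from cleanup means any destructive step still needs an explicit safe path or user intent.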

REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**Resolving “Another Ricky or Relay run is already active in this workspace”**

The marker is a lock file that prevents concurrent executions. Follow the deterministic repair contract below.

---

## 1️⃣ Verify the current lock state
```bash
# Show the lock file (if it exists)
cat .ricky/run.lock || echo "No lock file"
# Show the PID recorded in the lock (if any)
grep -Eo '[0-9]+' .ricky/run.lock || echo "No PID recorded"
```

If the file is missing, the error is stale – skip to Step 4.

## 2️⃣ Confirm the recorded process is still alive
```bash
PID=$(grep -Eo '[0-9]+' .ricky/run.lock)
if [ -n "$PID" ]; then
  ps -p $PID -o pid,cmd
else
  echo "Lock file exists but no PID recorded"
fi
```

- If the PID is running → another legitimate Ricky/Relay job is still executing.
  - Wait for it to finish, or
  - If you know it is hung, proceed to Step 3.
- If the PID is not running → the lock is stale. Continue to Step 4.

## 3️⃣ Safely terminate a hung run (only if you own the workspace)
```bash
# Kill the stale process
kill -9 $PID
# Verify termination
ps -p $PID && echo "Failed to kill" || echo "Killed"
```
Blocker note – If the process belongs to
...[truncated]


**Must**

- Report the active marker, run id, or status path when available.
- Ask the user to inspect, wait for, or explicitly clear the active run.
- Avoid launching a competing run that could corrupt evidence.

**Must Not**

- Silently start another run.
- Hide the existing run marker from the user.
- Treat the conflict as a generic failure with no recovery path.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.env-loader-injection-runtime-loadable</code> (runtime-recovery/manual)</summary>

**User message**

> A workflow artifact references a `MISSING_ENV_VAR` value. Ricky's deterministic auto-fix injects the `.env.local` / `.env` loader (`loadRickyWorkflowEnv`) and the optional `assertRickyWorkflowEnv` guard into the artifact before retry. The artifact may be a master-rendered workflow whose `.step({ command: ... })` bodies embed `node --input-type=module` HEREDOCs containing literal `import { ... } from 'node:fs'` / `from 'node:path'` strings.

**Ricky output**

```text
**Deterministic auto‑fix for missing environment variables**

Below is the concrete, reproducible patch you can apply to any workflow artifact that currently fails because it references a `MISSING_ENV_VAR`. The fix does three things:

1. **Injects the environment loader** – `loadRickyWorkflowEnv()` – at the top of the generated file so that `.env.local` (or fallback `.env`) is parsed before any step runs.  
2. **Adds an optional guard** – `assertRickyWorkflowEnv()` – that throws a clear error if the required variable is still undefined after loading.  
3. **Leaves the existing step bodies untouched** – even when they contain `node --input-type=module` HEREDOCs with literal `import { … } from 'node:fs'` or `from 'node:path'` strings.

---

## 1. Patch contract

| Artifact type | Insertion point | Code added |
|---------------|----------------|------------|
| Master‑rendered workflow (`*.ricky.ts` or compiled `.js`) | Immediately after the file’s shebang / top‑level imports (or at line 1 if none) | `loadRickyWorkflowEnv();` |
| Optional guard | After the loader (same block) | `assertRickyWorkflowEnv(['MISSING_ENV_VAR']);` |
| Scope | Only the file that directly references the missing
...[truncated]
```

Must

Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected `loadRickyWorkflowEnv` body references `rickyWorkflowFs.*` and `rickyWorkflowPath.*`, so the repair must also add the corresponding `import * as rickyWorkflowFs from 'node:fs'` and `import * as rickyWorkflowPath from 'node:path'` aliases at module top level.
Detect existing alias imports by matching real top-level `import * as <alias> from '<module>'` statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in `.step({ command: ... })` bodies do not count as imports).
Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.

Must Not

Skip adding the `rickyWorkflowFs` / `rickyWorkflowPath` aliases because `from 'node:fs'` or `from 'node:path'` already appears somewhere in the file as a string literal.
Inject `loadRickyWorkflowEnv` (or `assertRickyWorkflowEnv`) without the supporting alias imports, which produces a `ReferenceError: rickyWorkflowPath is not defined` at module load and burns the auto-fix budget on `UNSUPPORTED_RUNTIME` at runtime-launch.
Rewrite or escape the embedded HEREDOC text in step commands.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
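
The difference between substring matching and matching a real alias import, as required by this case, can be shown with a small sketch. It uses a line-anchored regex that demands the exact namespace alias, which is an approximation: a parser-based check would be stricter, and shebang handling is omitted. `hasAliasImport` and `ensureAliasImports` are hypothetical names, not the functions in Ricky's auto-fix.

```typescript
const ALIASES = [
  { alias: "rickyWorkflowFs", module: "node:fs" },
  { alias: "rickyWorkflowPath", module: "node:path" },
];

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True only for a namespace import of exactly this alias at the start of a line.
// A bare `from 'node:fs'` substring, as found inside a HEREDOC, does not match.
function hasAliasImport(source: string, alias: string, module: string): boolean {
  const pattern = new RegExp(
    `^import\\s*\\*\\s*as\\s+${escapeRegExp(alias)}\\s+from\\s+['"]${escapeRegExp(module)}['"]`,
    "m",
  );
  return pattern.test(source);
}

// Prepends any missing alias imports; the rest of the file, HEREDOC text included, is left intact.
function ensureAliasImports(source: string): string {
  const missing = ALIASES.filter((a) => !hasAliasImport(source, a.alias, a.module));
  const header = missing.map((a) => `import * as ${a.alias} from '${a.module}';\n`).join("");
  return header + source;
}
```

Because the check keys on the alias rather than the module specifier, a step body that embeds `import { readFileSync } from 'node:fs'` no longer suppresses the injection, and running the repair twice adds nothing the second time.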

REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Run**

Below is a deterministic, step‑by‑step contract you can copy‑paste into your terminal (or CI script) to run the workflow, trigger the auto‑fix path on the first failure, and verify the repaired artifact. The contract assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failed step name and the previous run‑ID are known.

---

## 1️⃣ Pre‑flight Checks (Verification Gate)

| Check | Command | Expected outcome |
|-------|---------|-------------------|
| CLI version | `aw --version` | `>=2.4.0` |
| Repo root | `git rev-parse --show-toplevel` | prints absolute path |
| Workflow definition present | `ls .aw/workflows/<workflow‑name>.yml` | file exists |
| Auto‑fix feature enabled in config | `awk '/auto_fix:/ {print $2}' .aw/config.yml` | `true` |
| Docker (or runtime) reachable | `docker ps` | lists containers (or appropriate runtime check) |

If any gate fails, abort and fix the environment before proceeding.

---

## 2️⃣ Input Parameters (Scope)

| Parameter | Source | Example |
|-----------|--------|---------|
| `WORKFLOW_NAME` | Known workflow | `data‑ingest‑pipeline` |
| `FAILED_STEP` | Provided by user | `transform‑
...[truncated]

Must

Use a bounded retry budget and summarize every attempt.
Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
Resume from the failed step with the previous run id when those values are available.

Must Not

Edit arbitrary repository source files as the default auto-fix surface.
Keep retrying after the configured max attempts.
Lose the single Ricky tracking run id across repair/resume attempts.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
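
A bounded repair/resume loop of the kind this case describes could be structured as below. The `run` and `repair` callbacks stand in for the real runner and the Workforce persona repair, and all type and function names are hypothetical.

```typescript
type RunOutcome = { ok: boolean; runId: string; failedStep: string | null };
type Attempt = {
  attempt: number;
  startFromStep: string | null;
  previousRunId: string | null;
  ok: boolean;
  runId: string;
};
type LoopResult = { trackingRunId: string; attempts: Attempt[]; ok: boolean };

function runWithAutoFix(
  trackingRunId: string,
  maxAttempts: number,
  run: (startFromStep: string | null, previousRunId: string | null) => RunOutcome,
  repair: (failedStep: string | null) => boolean,
): LoopResult {
  const attempts: Attempt[] = [];
  let startFromStep: string | null = null;
  let previousRunId: string | null = null;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const outcome = run(startFromStep, previousRunId);
    // Every attempt is summarized, whether it passed or failed.
    attempts.push({ attempt, startFromStep, previousRunId, ok: outcome.ok, runId: outcome.runId });
    if (outcome.ok) return { trackingRunId, attempts, ok: true };

    // Stop when the budget is spent or the artifact could not be repaired.
    if (attempt === maxAttempts || !repair(outcome.failedStep)) break;

    // Resume from the failed step, passing along the run that failed.
    startFromStep = outcome.failedStep;
    previousRunId = outcome.runId;
  }
  return { trackingRunId, attempts, ok: false };
}
```

The single `trackingRunId` is returned unchanged across all attempts, and calling the function with `maxAttempts` set to 1 degenerates to one attempt with no repair call.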

REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs `ricky run workflows/foo.ts --no-auto-fix` and the workflow fails.

Ricky output

**Immediate troubleshooting checklist**

| Step | Command / Action | What to look for |
|------|------------------|------------------|
| 1️⃣  | `ricky run workflows/foo.ts --no-auto-fix --log-level=debug` | Full debug log in `ricky.log`. Identify the exact error line and stack trace. |
| 2️⃣  | Open the workflow file `workflows/foo.ts` and locate the **contract block** at the top (e.g., `export const contract = { … }`). | Verify that the contract matches the current code – especially input schema, required env vars, and output types. |
| 3️⃣  | Run the **deterministic verification gate** manually: <br>`ricky verify contracts workflows/foo.ts` | The gate will surface contract mismatches, missing imports, or type‑checking failures before execution. |
| 4️⃣  | Check the **validation loop count** in the workflow header (e.g., `validationLoops: 80`). | If the loop count is lower than the recommended 80‑100, the workflow may be exiting early. Adjust and re‑run. |
| 5️⃣  | Look for **blocked resources** in the log (e.g., “BLOCKER: missing secret XYZ”). | Blockers must be resolved before the workflow can proceed. See the “Blocker Reporting” section below. |
| 6️⃣  | If the failure is insid
...[truncated]

Must

Preserve one-attempt behavior when auto-fix is disabled.
Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
Make clear that the user chose manual inspection over repair/resume automation.

Must Not

Start a repair loop despite --no-auto-fix.
Suppress the diagnosis because no repair was attempted.
Present the failure as a completed repair attempt.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.in-process-local-runner (runtime-recovery/manual)

User message

Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

Ricky output

**Ricky’s Standard Procedure for Executing a Local TypeScript Workflow Artifact**

| Phase | Action | Details | Expected Outcome |
|-------|--------|---------|-------------------|
| 1️⃣  | **Locate the primary workflow directory** | `cd $REPO_ROOT/workflows` (or the path defined in `workflow.config.json` under `primaryPath`). | Working directory points at the exact location where the `.ts` artifact lives. |
| 2️⃣  | **Validate the artifact contract** | Run the deterministic verification gate: <br>`npm run verify:contract -- --file src/<artifact>.ts` <br>*(the `verify:contract` script checks that the file exports a `Workflow` object that matches the JSON schema in `schemas/workflow.schema.json`.)* | Fails fast if the artifact does not conform to the required contract; otherwise proceeds. |
| 3️⃣  | **Install / refresh dependencies** | `npm ci` (ensures a reproducible lock‑file install). <br>If the workflow uses a local package, run `npm link <local‑pkg>` after the install. | All `node_modules` are present and version‑pinned. |
| 4️⃣  | **Compile‑time type check** | `npx tsc --noEmit src/<artifact>.ts` <br>*(uses the repo’s `tsconfig.json`.)* | Guarantees there are no TypeScript erro
...[truncated]

Must

Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
Record the actual spawn command in execution evidence.

Must Not

Fail solely because agent-relay is not on PATH when the SDK route is available.
Hide the actual runtime command from evidence.
Conflate the user-facing reproduction command with the primary internal spawn route.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
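
The precheck and evidence requirements for this case might be sketched as follows. This assumes the strip-types route means launching the current Node binary with `--experimental-strip-types` (available from Node 22.6); the flag Ricky actually passes is not shown in this report, and `planLocalRun` and `SpawnPlan` are hypothetical.

```typescript
import { createRequire } from "node:module";
import * as path from "node:path";

type SpawnPlan = { command: string; args: string[]; sdkResolvable: boolean; evidence: string };

// Resolves a package relative to the workflow file, not relative to this module.
function isResolvable(pkg: string, workflowPath: string): boolean {
  try {
    createRequire(path.resolve(workflowPath)).resolve(pkg);
    return true;
  } catch {
    return false;
  }
}

function planLocalRun(workflowPath: string): SpawnPlan {
  // process.execPath is the running Node binary, so nothing extra is required on PATH.
  const command = process.execPath;
  // Assumption: type stripping is enabled with this flag on the installed Node version.
  const args = ["--experimental-strip-types", workflowPath];
  return {
    command,
    args,
    sdkResolvable: isResolvable("@agent-relay/sdk", workflowPath),
    // The exact spawn command, recorded verbatim for execution evidence.
    evidence: [command, ...args].join(" "),
  };
}
```

Keeping `evidence` as the literal command line keeps the internal spawn route distinct from whatever reproduction command is shown to the user.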

REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Ricky’s Boundary‑Escalation Workflow for Persistent Structural Failures**

When a structural failure remains after an initial fix attempt, Ricky follows a deterministic, contract‑driven escalation path. The process is bounded, auditable, and designed to surface the blocker to the appropriate owners without dead‑locking the pipeline.

---

## 1. Failure Detection & Initial Fix Attempt

| Step | Action | Artifact |
|------|--------|----------|
| 1.1 | Run the **Structural‑Integrity Gate** (unit‑test, schema‑validation, CI lint) on the affected branch. | `gate‑structural.yaml` (CI config) |
| 1.2 | If the gate fails, automatically generate a **Fix‑Attempt PR** (`fix‑struct‑<hash>`) with a minimal, isolated change (e.g., schema correction, dependency pin). | PR title: `fix: structural integrity – <short‑desc>` |
| 1.3 | Attach the **Verification Contract** (`struct‑verify.json`) to the PR. It enumerates required checks: <br>• `schema‑valid: true` <br>• `type‑check: pass` <br>• `integration‑smoke: pass` | `struct‑verify.json` |

---

## 2. Post‑Fix Validation Loop (80‑100 iterations)

Ricky runs a bounded validation loop to avoid infinite retries.

| Loop | Condition | Action |
|-----
...[truncated]

Must

Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
Distinguish escalation from a generic product failure.
Preserve enough context for a human operator to continue.

Must Not

Retry speculative fixes indefinitely.
Collapse to "something went wrong" without the attempted actions.
Discard evidence from failed repair attempts.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – Summary (Last 30 Days)**  

| Workflow | Total Runs | Success % | Avg Duration | Recent Failures* | Blockers (last 7 d) | Last Review |
|----------|------------|----------|--------------|------------------|----------------------|-------------|
| **Onboard‑New‑Hire** | 124 | 96 % | 3 m 12 s | 2 (run #112, #119) | None | 2026‑05‑10 (peer‑review) |
| **Quarterly‑Payroll‑Run** | 8 | 88 % | 27 m 4 s | 1 (run #7) | Payroll‑API rate‑limit (unresolved) | 2026‑04‑28 (owner sign‑off) |
| **Customer‑Feedback‑Ingest** | 342 | 99 % | 45 s | 1 (run #331) | None | 2026‑05‑13 (auto‑review) |
| **Data‑Lake‑Sync** | 57 | 94 % | 12 m 33 s | 3 (run #54‑#56) | S3 bucket policy drift (open) | 2026‑05‑02 (owner sign‑off) |
| **Incident‑Response‑Triage** | 219 | 100 % | 1 m 8 s | 0 | None | 2026‑05‑14 (peer‑review) |

\*Failures listed are the most recent run IDs that did not reach the **Success** gate.

---

## 1. Health‑Score Overview
- **Overall Success Rate:** 96.2 % (average across all workflows)  
- **Critical‑Blocker Count:** 2 (both in *Quarterly‑Payroll‑Run* and *Data‑Lake‑Sync*)  
- **Trend:** Success rates have been stable (+0.3 % vs. previous 30‑day window). No
...[truncated]

Must

Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

Mutate evidence while analyzing it.
Return generic advice like "improve your workflow" without references.
Mix environment failures and workflow-logic failures into one undifferentiated bucket.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
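
A digest built from normalized evidence rather than raw logs could start from something like the sketch below. The real `WorkflowRunEvidence` shape is not shown in this report, so the fields here are assumed, and `buildDigest` is a hypothetical name.

```typescript
// Assumed, simplified evidence shape.
type WorkflowRunEvidence = {
  workflow: string;
  ok: boolean;
  failureClass: "environment" | "workflow-logic" | null;
  durationMs: number;
};

type WorkflowDigest = {
  workflow: string;
  runs: number;
  successRate: number;
  failuresByClass: Record<string, number>;
};

// Takes a readonly view so analysis cannot mutate the evidence.
function buildDigest(evidence: readonly WorkflowRunEvidence[]): WorkflowDigest[] {
  const grouped = new Map<string, WorkflowRunEvidence[]>();
  for (const run of evidence) {
    grouped.set(run.workflow, [...(grouped.get(run.workflow) ?? []), run]);
  }

  return Array.from(grouped, ([workflow, runs]) => {
    // Environment and workflow-logic failures are counted in separate buckets.
    const failuresByClass: Record<string, number> = {};
    for (const run of runs) {
      if (!run.ok) {
        const key = run.failureClass ?? "unclassified";
        failuresByClass[key] = (failuresByClass[key] ?? 0) + 1;
      }
    }
    const passed = runs.filter((r) => r.ok).length;
    return { workflow, runs: runs.length, successRate: passed / runs.length, failuresByClass };
  });
}
```

Each digest row stays tied to a named workflow, so a recommendation derived from it can cite the workflow and failure class it came from.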

REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

**Ricky Slack Surface – “Workflow Debugger”**  
*Version 0.1 – Design Contract*  

---

## 1. Overview
The **Workflow Debugger** Slack surface lets engineers and product owners inspect, step‑through, and repair Ricky‑managed workflows directly from a Slack channel or DM. It follows the Ricky repository conventions:

| Aspect | Decision |
|--------|----------|
| **Entry point** | `/ricky debug <workflow‑id>` (Slash command) |
| **Interaction model** | Modal‑based wizard + threaded message actions |
| **State persistence** | `ricky/debug-state` branch per session (short‑lived) |
| **Verification gates** | Pre‑run validation, step‑gate, post‑run audit |
| **Review artifacts** | Auto‑generated `debug‑<id>.md` summary, PR diff, test matrix |
| **Loop budget** | 80‑100 validation loops per session (configurable) |
| **Blocker policy** | Immediate “⚠️ BLOCKED” flag posted to thread with escalation link |

---

## 2. User Flow (Deterministic Contract)

| Step | Trigger | Action | Output | Validation |
|------|---------|--------|--------|------------|
| 1 | `/ricky debug <wf‑id>` | Bot acknowledges, creates a **debug session** (UUID) and a temporary branch `debug/<wf‑id>/<session‑id>` from 
...[truncated]

Must

Treat Slack as a co-equal product surface, not a wrapper around CLI.
Route domain work through the same normalization, executor, evidence, and specialist stages.
Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.

Must Not

Put workflow generation or diagnosis domain logic directly in the Slack handler.
Degrade Slack to a developer shortcut with weaker routing than CLI.
Skip signature verification, dedup, or thread handling when the surface is implemented.
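
The layering these rules ask for can be sketched as a thin Slack handler that deduplicates, translates, and delegates. This is a minimal sketch, not Ricky's implementation: `SlackEvent`, `NormalizedRequest`, `Pipeline`, and `createSlackHandler` are hypothetical names, and signature verification is assumed to have happened before `handle` is called.

```typescript
// All names below are hypothetical; they are not taken from the Ricky codebase.
interface SlackEvent {
  eventId: string;
  channel: string;
  ts: string;
  threadTs?: string;
  text: string;
}

// Surface-neutral request handed to the shared pipeline.
interface NormalizedRequest {
  prompt: string;
}

// Stand-in for the shared normalization/executor/evidence/specialist stages.
type Pipeline = (req: NormalizedRequest) => string;

function createSlackHandler(pipeline: Pipeline) {
  const seen = new Set<string>(); // dedup: the same event may be delivered twice

  return function handle(
    event: SlackEvent
  ): { channel: string; threadTs: string; text: string } | null {
    if (seen.has(event.eventId)) return null; // ignore duplicate deliveries
    seen.add(event.eventId);

    // Translation only: no workflow generation or diagnosis logic lives here.
    const result = pipeline({ prompt: event.text.trim() });

    // Thread handling stays in the surface layer: reply in the existing thread,
    // or start one on the triggering message.
    return {
      channel: event.channel,
      threadTs: event.threadTs ?? event.ts,
      text: result,
    };
  };
}
```

Because the handler only receives a `Pipeline`, a CLI surface could pass the same function and get identical routing, which is the "co-equal" property the case is checking for.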

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.normalizer-is-only-translation-boundary (surfaces-ingress/manual)

User message

Add a new web handoff type that submits a workflow spec and mode.

Ricky output

**New Hand‑off Type:** `web_submit_spec`

| Field | Type | Description |
|-------|------|-------------|
| **name** | `string` | Fixed value: `"web_submit_spec"` |
| **payload** | `object` | The data the UI must send to the AgentWorkforce engine. |
| **payload.spec** | `object` | Full workflow specification (JSON‑Schema `WorkflowSpec`). |
| **payload.mode** | `enum` | `"draft"` \| `"activate"` \| `"test"` – determines how the engine treats the spec. |
| **payload.metadata** *(optional)* | `object` | Free‑form key/value pairs for audit/logging. |
| **response** | `object` | Engine reply (see “Response Contract” below). |

---

## 1. Contract Definition (JSON‑Schema)

```json
{
  "$id": "urn:awf:handoff:web_submit_spec",
  "type": "object",
  "required": ["name", "payload"],
  "properties": {
    "name": { "const": "web_submit_spec" },
    "payload": {
      "type": "object",
      "required": ["spec", "mode"],
      "properties": {
        "spec": { "$ref": "urn:awf:schema:WorkflowSpec" },
        "mode": { "enum": ["draft", "activate", "test"] },
        "metadata": {
          "type": "object",
          "additionalProperties": { "type": "string" }
        }
      },
      "additiona
```
...[truncated]

Must

Add a handoff variant that normalizes into LocalInvocationRequest or CloudGenerateRequest.
Keep auth/session validation in the web surface before normalization.
Ensure downstream code does not need to know the request came from web.

Must Not

Short-circuit directly from the web handler to executors.
Create a parallel domain model for web requests.
Make the web surface the owner of local or Cloud routing semantics.
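
The Must / Must Not rules above can be sketched as a web surface that validates the session and then hands off to a single normalizer. This is an illustration under assumptions: the field layouts of `LocalInvocationRequest` and `CloudGenerateRequest` are not shown in this report, so the shapes below are hypothetical, as are `WebHandoff`, `normalizeWebHandoff`, `acceptWebHandoff`, and the `target` hint.

```typescript
// Hypothetical shapes: the real request fields are not shown in this report.
interface LocalInvocationRequest {
  kind: "local";
  spec: string;
  mode: string;
}

interface CloudGenerateRequest {
  kind: "cloud";
  spec: string;
  mode: string;
}

type NormalizedRequest = LocalInvocationRequest | CloudGenerateRequest;

// Hypothetical web handoff variant carrying a spec and a mode.
interface WebHandoff {
  source: "web";
  target: "local" | "cloud"; // assumption: a hint from the caller, not a routing decision
  spec: string;
  mode: string;
}

// The normalizer is the only translation boundary. It drops `source`,
// so downstream code cannot tell the request came from web.
function normalizeWebHandoff(handoff: WebHandoff): NormalizedRequest {
  if (handoff.target === "cloud") {
    return { kind: "cloud", spec: handoff.spec, mode: handoff.mode };
  }
  return { kind: "local", spec: handoff.spec, mode: handoff.mode };
}

// Web surface entry point: auth/session validation happens here, before
// normalization, and the handler never calls an executor directly.
function acceptWebHandoff(handoff: WebHandoff, sessionIsValid: boolean): NormalizedRequest {
  if (!sessionIsValid) {
    throw new Error("web session must be validated before normalization");
  }
  return normalizeWebHandoff(handoff);
}
```

The returned value reuses the existing request types rather than introducing a web-specific domain model, so executors and routing stay unchanged.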

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.mcp-claude-context-is-metadata (surfaces-ingress/manual)

User message

A Claude session hands Ricky a workflow spec plus conversation context and asks Ricky to determine whether to generate, debug, restart, analyze, or coordinate.

Ricky output

**Decision Matrix for the Claude‑to‑Ricky Hand‑off**

| Claude Input | What Ricky Should Do |
|--------------|----------------------|
| **Complete, well‑formed

---

_Comment truncated to stay within GitHub limits. Download the `ricky-eval-run` artifact for the full `human-review.md`._

coderabbitai Bot reviewed May 15, 2026

devin-ai-integration Bot reviewed May 15, 2026

cubic-dev-ai Bot reviewed May 15, 2026

kjgbot merged commit 18d7ee2 into main May 15, 2026
5 checks passed

kjgbot deleted the feat/cli-writer-heartbeat branch May 15, 2026 13:56

kjgbot mentioned this pull request May 15, 2026

fix(auto-fix): preserve workflow intent during persona-driven repair #109

Merged

3 tasks

kjgbot merged 1 commit into main from feat/cli-writer-heartbeat

coderabbitai Bot commented May 15, 2026 (edited)

❌ Failed checks (1 warning)

coderabbitai Bot left a comment

devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

cubic-dev-ai Bot left a comment

github-actions Bot commented May 15, 2026

Ricky Eval Review

Human Review Cases

2️⃣ Confirm the recorded process is still alive

3️⃣ Safely terminate a hung run (only if you own the workspace)

1 participant