
fix(local): recover persona writer output from disk when stdout truncates #112

Merged
kjgbot merged 2 commits into main from fix/persona-writer-file-on-disk-fallback
May 15, 2026


@kjgbot kjgbot commented May 15, 2026

Contributor

Summary

Stop losing 22+ minutes of writer work to LLM output truncation. When the writer subprocess's stdout is cut off mid-JSON (because the LLM hit its max_output_tokens cap before finishing the artifact-content string), the parser now falls back to reading the workflow file from disk — but only when the file's mtime proves it was actually written by THIS writer call, not a stale leftover.

The bug this fixes

Driving 13 per-track sub-specs through ricky --mode local --spec-file <spec>.md --run --workforce-persona produced this failure mode after the bypass-permissions fix (#110) made the writer reliable enough to actually emit real content:

// meta.json from a parse-error debug dump
{
  "reason": "parse-error",
  "selection": { "tier": "best-value", "harness": "claude", "model": "claude-sonnet-4-6" },
  "result": { "status": "completed", "exitCode": 0, "durationMs": 1331747, "stderr": "" }
}
# output.raw.txt (38KB), truncated mid-emit:
Now I have enough context to write the complete workflow. Let me produce the structured JSON response.

```json
{
  "artifact": {
    "path": "workflows/generated/ricky-spec-local-sandbox-runner-sidecar-implementation-parent-spec-specs-mcp-cloud-spa.ts",
    "language": "typescript",
                                            ← cut off here, 22 minutes of work lost
```

claude --print buffers the full response and then emits it. For large workflow artifacts (JSON-escaping a 20KB TypeScript source blows out to 50KB+ of output), the LLM's max_output_tokens cap kicks in and the JSON gets cut off partway through the artifact. There's no claude CLI flag for raising the cap.
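To make the size inflation concrete, here is a minimal sketch (not taken from the Ricky codebase) of how wrapping source text in a JSON envelope grows it, and how a cut in the middle of the string leaves nothing parseable. The sample source, the `hypothetical-workflow-sdk` import inside it, and the artifact path are invented for illustration. It counts characters rather than tokens, so it does not reproduce the 20KB to 50KB figure quoted above.

```typescript
// Illustrative stand-in for a generated workflow source (contents are hypothetical).
const workflowSource: string = [
  'import { workflow } from "hypothetical-workflow-sdk";',
  "",
  'export default workflow("demo", (w) => {',
  '\tw.step("lint", { run: "npm run lint" });',
  "});",
].join("\n");

// The writer's stdout wraps the source in a JSON envelope, so every quote,
// newline, and tab in the source gains an escape character.
const escapedContent: string = JSON.stringify(workflowSource);
const envelope: string = JSON.stringify({
  artifact: { path: "workflows/generated/demo.ts", language: "typescript", content: workflowSource },
});

// Simulate the output cap: cut the envelope partway through the content string.
const truncated: string = envelope.slice(0, envelope.length - 20);

function tryParse(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return undefined; // unparseable, which is the situation the parser faced
  }
}

console.log({
  rawLength: workflowSource.length,
  escapedLength: escapedContent.length,
  fullParses: tryParse(envelope) !== undefined,
  truncatedParses: tryParse(truncated) !== undefined,
});
```

The point of the sketch is only that a truncated envelope has no partial-success mode: once the closing quote and braces are missing, `JSON.parse` rejects the whole payload.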

But the writer almost always did succeed at writing the workflow to disk via the Write tool — that part runs through the tool harness, not stdout. The parser was just throwing away a valid on-disk artifact because it had no parseable stdout wrapper.

What changed

  • PersonaResponseParseOptions.writerInvokedAtMs — caller pins a Date.now()-style timestamp before spawning the writer subprocess.
  • PersonaResponseParseOptions.statFile — stat-by-path override for deterministic tests; defaults to fs.statSync (ENOENT → undefined).
  • recoverArtifactFromTruncatedOutput — new private helper. Fires only after every other parse path has rejected the stdout. Reads expectedPath from disk ONLY when (a) writerInvokedAtMs is set AND (b) the file's mtimeMs is strictly greater than that timestamp — proving the file is the current writer's output, not a stale artifact from a prior attempt.
  • Recovered content still goes through validateArtifactContent (the existing workflow(...) builder check). A junk file on disk cannot smuggle itself past the structural validation. If structural validation fails, the original truncation error is re-raised so the caller sees the real symptom.
  • writeWorkflowWithWorkforcePersona captures writerInvokedAtMs immediately before its sendMessage call and threads it into the parser. Other callers of parsePersonaWorkflowResponse that don't pass the timestamp keep their prior behavior — the fallback never fires for them.
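The gating order described in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged code: `recoverFromDiskSketch`, `hasWorkflowCall`, and the interface names are hypothetical, and the regex is only a stand-in for whatever `validateArtifactContent` actually checks. The option names `writerInvokedAtMs`, `statFile`, and `readFileText`, and the metadata values, are taken from this PR.

```typescript
// Names other than writerInvokedAtMs, statFile, readFileText, recoveredFromDisk,
// and the reason string are hypothetical; this is not the merged implementation.
interface FileStat {
  mtimeMs: number;
}

interface DiskRecoveryOptions {
  writerInvokedAtMs?: number; // omitted by callers that have not opted in
  statFile: (path: string) => FileStat | undefined; // undefined models ENOENT
  readFileText: (path: string) => string;
}

interface RecoveredArtifact {
  path: string;
  content: string;
  metadata: { recoveredFromDisk: true; reason: string };
}

// Stand-in for validateArtifactContent, described in the PR as a workflow(...) builder check.
function hasWorkflowCall(content: string): boolean {
  return /\bworkflow\s*\(/.test(content);
}

// Last-resort recovery: every rejection path re-throws the original parse error.
function recoverFromDiskSketch(
  expectedPath: string,
  originalError: Error,
  options: DiskRecoveryOptions,
): RecoveredArtifact {
  if (options.writerInvokedAtMs === undefined) throw originalError; // opt-in only
  const stat = options.statFile(expectedPath);
  if (stat === undefined) throw originalError; // nothing on disk
  if (!(stat.mtimeMs > options.writerInvokedAtMs)) throw originalError; // stale or equal
  const content = options.readFileText(expectedPath);
  if (!hasWorkflowCall(content)) throw originalError; // junk content
  return {
    path: expectedPath,
    content,
    metadata: { recoveredFromDisk: true, reason: "stdout-truncated-or-unparseable" },
  };
}
```

Injecting `statFile` and `readFileText` keeps the sketch free of filesystem access, which is the same property the PR relies on for deterministic tests.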

Why the freshness check matters

A naive "if a file exists at expectedPath, use it" fallback would let a stale leftover from a crashed prior attempt impersonate the current writer's output. By gating on mtime > writerInvokedAtMs, the fallback proves the file was written during this sendMessage window. Plus the resulting metadata carries { recoveredFromDisk: true, reason: 'stdout-truncated-or-unparseable' } so downstream callers and the auto-fix repair contract (#109) can see exactly why this artifact survived.

Test plan

Five new tests in workforce-persona-writer.test.ts:

  1. Happy path: truncated stdout + fresh file on disk → parser recovers, returns the file's content with recoveredFromDisk: true metadata.
  2. Stale file guard: same truncated stdout, but statFile returns an mtime older than writerInvokedAtMs → parser refuses, re-raises the original truncation error.
  3. Opt-in only: caller omits writerInvokedAtMs → fallback never fires, prior behavior preserved.
  4. Missing file guard: statFile returns undefined (ENOENT) → fallback skipped, original error re-raised.
  5. Junk content guard: file exists with mtime > writerInvokedAtMs but content is export const broken = 1; (no workflow( call) → validateArtifactContent rejects, original error re-raised.
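The five cases above reduce to a small decision table. The sketch below is a hypothetical restatement of that table, not the test file from this PR: `wouldRecover` and `RecoveryCase` are invented names, the timestamps are arbitrary, and the regex again stands in for `validateArtifactContent`.

```typescript
// Hypothetical decision table; field names mirror the PR, the helper does not exist in Ricky.
interface RecoveryCase {
  name: string;
  writerInvokedAtMs?: number; // undefined: caller did not opt in
  mtimeMs?: number; // undefined: statFile reported ENOENT
  content: string;
  expectRecovered: boolean;
}

function wouldRecover(c: RecoveryCase): boolean {
  if (c.writerInvokedAtMs === undefined || c.mtimeMs === undefined) return false;
  if (c.mtimeMs <= c.writerInvokedAtMs) return false; // strict freshness gate
  return /\bworkflow\s*\(/.test(c.content); // stand-in for validateArtifactContent
}

const validSource = "export default workflow('demo');";
const recoveryCases: RecoveryCase[] = [
  { name: "happy path", writerInvokedAtMs: 1000, mtimeMs: 2000, content: validSource, expectRecovered: true },
  { name: "stale file", writerInvokedAtMs: 1000, mtimeMs: 500, content: validSource, expectRecovered: false },
  { name: "opt-in only", mtimeMs: 2000, content: validSource, expectRecovered: false },
  { name: "missing file", writerInvokedAtMs: 1000, content: validSource, expectRecovered: false },
  { name: "junk content", writerInvokedAtMs: 1000, mtimeMs: 2000, content: "export const broken = 1;", expectRecovered: false },
];

// Collect the names of any cases whose outcome differs from the expectation.
const mismatches: string[] = recoveryCases
  .filter((c) => wouldRecover(c) !== c.expectRecovered)
  .map((c) => c.name);
console.log(mismatches.length === 0 ? "all cases match" : `mismatch: ${mismatches.join(", ")}`);
```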

151 of 151 generation-suite tests pass (was 146 + 5 new). Pre-existing scheduled-agent.ts typecheck errors are unrelated.

Stacks with

🤖 Generated with Claude Code

…ates

claude --print buffers a full response then emits it. For large workflow
artifacts the LLM's max_output_tokens cap kicks in and the JSON gets cut
off mid-emit — observed in production as ~38KB of valid JSON ending at
`"language": "typescript",` with no closing braces and no
`artifact.content`. The parser then throws "Workforce persona response
must be structured JSON or include fenced TypeScript artifact and JSON
metadata blocks" and Ricky reports a parse-error, even though the writer
typically *did* succeed at writing the workflow source to disk via the
Write tool (reliable now that bypass-permissions ships through the
resolver — see fix/headless-claude-bypass-permissions).

This commit adds a freshness-gated disk fallback so the parser can
recover from a truncated stdout without trusting stale on-disk leftovers:

- `PersonaResponseParseOptions.writerInvokedAtMs`: caller pins a
  `Date.now()`-style timestamp *before* spawning the writer subprocess.
- `PersonaResponseParseOptions.statFile`: stat-by-path override for
  deterministic tests; defaults to `fs.statSync` (ENOENT → undefined).
- `recoverArtifactFromTruncatedOutput`: only fires after every other
  parse path has rejected the stdout. It reads `expectedPath` from disk
  ONLY when (a) `writerInvokedAtMs` is set AND (b) the file's `mtimeMs`
  is strictly greater than that timestamp — proving the file is the
  current writer's output, not a stale artifact from a prior attempt.
- The recovered content still has to pass `validateArtifactContent`
  (the existing `workflow(...)` builder check). A junk file on disk
  cannot smuggle itself past the structural validation.
- `writeWorkflowWithWorkforcePersona` captures `writerInvokedAtMs`
  before its `sendMessage` call and threads it into the parser, opting
  in. Other callers of `parsePersonaWorkflowResponse` that don't pass
  the timestamp keep their prior behavior — the fallback never fires.

Coverage: 5 new tests in `workforce-persona-writer.test.ts`:
- recovers from truncated stdout when the file on disk is fresh
- does NOT fall back to a stale file (mtime older than writerInvokedAtMs)
- does NOT fall back when writerInvokedAtMs is absent
- does NOT fall back when no file exists at expectedPath
- does NOT fall back when the on-disk file fails structural validation

151 of 151 generation-suite tests pass. Pre-existing typecheck errors
in `scheduled-agent.ts` are unrelated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented May 15, 2026

Contributor


Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 4a9e8596-6257-4ef4-9d2f-18513235f69a

📥 Commits

Reviewing files that changed from the base of the PR and between a24b718 and 4595f61.

📒 Files selected for processing (1)
  • src/product/generation/workforce-persona-writer.ts

📝 Walkthrough


This PR extends parsePersonaWorkflowResponse with a last-resort disk fallback when persona writer stdout is truncated. It captures the writer invocation timestamp, extends the parse options interface, implements freshness-gated artifact recovery from disk, and covers the new behavior with test cases.

Changes

Persona Writer Truncated Stdout Disk Fallback

| Layer / File(s) | Summary |
|-----------------|---------|
| Disk fallback API and imports: src/product/generation/workforce-persona-writer.ts | PersonaResponseParseOptions is extended with optional writerInvokedAtMs and statFile hook; module imports statSync for filesystem stat operations. |
| Timestamp capture and propagation: src/product/generation/workforce-persona-writer.ts | writeWorkflowWithWorkforcePersona captures writerInvokedAtMs = Date.now() before spawning the persona writer, and passes it to parsePersonaWorkflowResponse via the options parameter. |
| Disk fallback logic in parser: src/product/generation/workforce-persona-writer.ts | parsePersonaWorkflowResponse implements last-resort disk recovery: on stdout parse failure, it reads the expected artifact from disk only when writerInvokedAtMs is present and the file's mtimeMs is strictly newer; it validates recovered content before returning it, otherwise re-throws the original parse error. |
| Truncated stdout disk fallback tests: src/product/generation/workforce-persona-writer.test.ts | New test suite validates successful recovery when the on-disk artifact is fresh, and validates non-recovery cases: stale file, missing writerInvokedAtMs, missing file, or failed structural validation. |

Possibly related PRs

  • AgentWorkforce/ricky#77: Both PRs extend the Workforce persona failure path; this PR adds timestamped truncated-stdout disk fallback, while #77 adds reliability-contract tests for malformed persona repair/retry behavior.
  • AgentWorkforce/ricky#102: PR #102 expanded parsePersonaWorkflowResponse to tolerate malformed outputs (e.g., extracting balanced JSON); this PR further extends the same parsing path with a disk fallback mechanism gated by invocation time freshness.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 When stdout cuts short and JSON's half-grown,
A timestamp peeks at disk where answers have flown.
If mtime is newer and structure is sound,
The rabbit hops back with the artifact found.
—Signed, the filesystem-foraging warren 🐇

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title accurately and concisely describes the main change: recovering persona writer output from disk when stdout truncates, which is the core problem and solution in this PR. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, explaining the bug, the solution, implementation details, and test coverage for the stdout truncation recovery mechanism. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/persona-writer-file-on-disk-fallback

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

🧹 Nitpick comments (2)
src/product/generation/workforce-persona-writer.test.ts (2)

297-309: ⚡ Quick win

Assert the recovery reason metadata to lock the contract.

Line 308 only checks recoveredFromDisk; it doesn’t verify the fallback reason string described by this PR. Adding it prevents silent regressions in downstream telemetry/UX expectations.

Suggested test assertion update
-      expect(parsed.metadata).toMatchObject({ recoveredFromDisk: true });
+      expect(parsed.metadata).toMatchObject({
+        recoveredFromDisk: true,
+        reason: 'stdout-truncated-or-unparseable',
+      });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/product/generation/workforce-persona-writer.test.ts` around lines 297 -
309, The test currently only asserts parsed.metadata.recoveredFromDisk; add a
strict assertion on the fallback reason string produced by
parsePersonaWorkflowResponse to lock the contract: after creating parsed, add
expect(parsed.metadata.reason).toBe('stdout-truncated-or-unparseable') (the
fallback reason string described by this PR) so the test validates both
recoveredFromDisk and the specific metadata.reason returned by
parsePersonaWorkflowResponse.

311-359: ⚡ Quick win

Add an explicit equality-boundary test for mtime freshness gating.

The suite checks older/newer files, but not the strict boundary mtimeMs === writerInvokedAtMs. Since behavior is “strictly greater than,” this edge case should be pinned.

Suggested boundary test
+    it('does NOT fall back when mtime equals writerInvokedAtMs (strict > gate)', () => {
+      const artifactPath = 'workflows/generated/persona.ts';
+      const writerInvokedAtMs = 1_000;
+      expect(() =>
+        parsePersonaWorkflowResponse(truncatedJsonOutput, artifactPath, {
+          repoRoot: '/tmp/repo',
+          writerInvokedAtMs,
+          statFile: () => ({ mtimeMs: writerInvokedAtMs }),
+          readFileText: () => workflowSource(),
+        }),
+      ).toThrow(/structured JSON or include fenced TypeScript artifact/);
+    });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/product/generation/workforce-persona-writer.test.ts` around lines 311 -
359, Add a test asserting the strict boundary for mtime freshness: call
parsePersonaWorkflowResponse with writerInvokedAtMs set (e.g., 1_000), have
statFile return { mtimeMs: writerInvokedAtMs } (equal, not greater), use the
existing truncatedJsonOutput and workflowSource helpers, and expect it to throw
the same "structured JSON or include fenced TypeScript artifact" error; this
pins the “strictly greater than” behavior in parsePersonaWorkflowResponse.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/product/generation/workforce-persona-writer.ts`:
- Around line 232-235: The timestamp writerInvokedAtMs is captured too early
(before async resolver/task setup) allowing unrelated writes to outputPath to
appear newer; move the Date.now() assignment so writerInvokedAtMs is set
immediately before the spawn/sendMessage call (i.e., right before the
sendMessage(...) invocation around where the spawn occurs) so the mtime check
reliably proves this run wrote the file; update any references to
writerInvokedAtMs accordingly and ensure no other code path reads it earlier.

---

Nitpick comments:
In `@src/product/generation/workforce-persona-writer.test.ts`:
- Around line 297-309: The test currently only asserts
parsed.metadata.recoveredFromDisk; add a strict assertion on the fallback reason
string produced by parsePersonaWorkflowResponse to lock the contract: after
creating parsed, add
expect(parsed.metadata.reason).toBe('stdout-truncated-or-unparseable') (the
fallback reason string described by this PR) so the test validates both
recoveredFromDisk and the specific metadata.reason returned by
parsePersonaWorkflowResponse.
- Around line 311-359: Add a test asserting the strict boundary for mtime
freshness: call parsePersonaWorkflowResponse with writerInvokedAtMs set (e.g.,
1_000), have statFile return { mtimeMs: writerInvokedAtMs } (equal, not
greater), use the existing truncatedJsonOutput and workflowSource helpers, and
expect it to throw the same "structured JSON or include fenced TypeScript
artifact" error; this pins the “strictly greater than” behavior in
parsePersonaWorkflowResponse.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: d42ca3bb-4cc1-4535-8086-cd7bb756391d

📥 Commits

Reviewing files that changed from the base of the PR and between 31f002f and a24b718.

📒 Files selected for processing (2)
  • src/product/generation/workforce-persona-writer.test.ts
  • src/product/generation/workforce-persona-writer.ts

Comment thread src/product/generation/workforce-persona-writer.ts Outdated

@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 2 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/product/generation/workforce-persona-writer.ts">

<violation number="1" location="src/product/generation/workforce-persona-writer.ts:235">
P2: Capture `writerInvokedAtMs` immediately before spawning `sendMessage`; taking it earlier weakens the freshness check and can recover the wrong on-disk artifact.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
Re-trigger cubic

Comment thread src/product/generation/workforce-persona-writer.ts Outdated

@devin-ai-integration devin-ai-integration Bot left a comment

Contributor


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

Open in Devin Review

Both coderabbit and cubic-dev-ai flagged the same issue: pinning
`writerInvokedAtMs = Date.now()` at the top of
`writeWorkflowWithWorkforcePersona` runs *before* the `await
resolver(...)` and `buildWorkflowPersonaTask(...)` calls. The resolver
loads personas (filesystem + bundle merge) and the task builder reads
spec files, so the window between the original capture and the actual
`sendMessage` spawn can stretch into multiple seconds. A stale leftover
file at `outputPath` whose mtime lands inside that window would falsely
satisfy the freshness check and impersonate the current writer's output.

Moves the capture to the line immediately above `resolved.context.
sendMessage(task, ...)` — after all preceding awaits, before any code
that could plausibly write to outputPath — so the timestamp now
strictly precedes the spawn. The comment is rewritten to be explicit
about why "immediately before" matters.

All 55 writer-suite tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

kjgbot commented May 15, 2026

Contributor Author

Addressed the coderabbit + cubic-dev-ai finding on writerInvokedAtMs placement.

Moved the Date.now() capture from the top of writeWorkflowWithWorkforcePersona (before resolver + task-builder awaits) to the line immediately above resolved.context.sendMessage(task, ...). The original placement opened a multi-second window during persona resolution and task construction in which a stale leftover at outputPath could land its mtime after the captured timestamp and falsely satisfy the freshness check. The capture now strictly precedes the spawn, so any file with mtime > writerInvokedAtMs has provably been written during this sendMessage invocation.
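As a rough illustration of the placement described above, the sketch below captures the timestamp only after the slow setup awaits. The signatures are assumptions: `runWriterSketch`, `WriterDeps`, `resolvePersona`, `buildTask`, and the injected `now` clock are hypothetical stand-ins for the resolver, `buildWorkflowPersonaTask`, and `Date.now()` named in this PR, not the real code.

```typescript
// Hypothetical stand-ins: the PR names resolver, buildWorkflowPersonaTask, and
// sendMessage, but these signatures are assumptions for illustration.
interface WriterDeps {
  resolvePersona: () => Promise<{ sendMessage: (task: string) => Promise<string> }>;
  buildTask: () => Promise<string>;
  now: () => number; // Date.now in production, a fake clock in tests
}

async function runWriterSketch(
  deps: WriterDeps,
): Promise<{ stdout: string; writerInvokedAtMs: number }> {
  const resolved = await deps.resolvePersona(); // slow: persona loading
  const task = await deps.buildTask(); // slow: spec file reads
  // Capture only after the slow awaits, on the line directly above the spawn,
  // so nothing written during setup can count as "fresh".
  const writerInvokedAtMs = deps.now();
  const stdout = await resolved.sendMessage(task);
  return { stdout, writerInvokedAtMs };
}
```

Passing the clock in as `now` is a choice made for this sketch so the ordering can be checked with a fake clock; the PR itself describes a direct `Date.now()` call.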

55 of 55 writer-suite tests still pass. Branch updated at 4595f61.

@kjgbot kjgbot merged commit c1b122c into main May 15, 2026
2 of 3 checks passed
@kjgbot kjgbot deleted the fix/persona-writer-file-on-disk-fallback branch May 15, 2026 15:47

github-actions Bot commented May 15, 2026

Contributor

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-15T15-48-31-485Z-HEAD-provider
Mode: provider
Git SHA: 6c410d9

Passed: 11 | Needs human: 44 | Reviewable: 44 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Ricky – Updated Documentation & Code to Reflect Deeper Integration with Agent Assistant**

Below is a complete, deterministic change package that brings the Ricky repository in line with the new product spec: *“Ricky now leverages the Agent Assistant (AA) for all internal reasoning, validation, and coordination steps.”*  

The package follows **AGENTS.md** conventions, the **workflow standards**, and the **shared authoring rules** (branch naming, PR template, review gates, and validation loops).

---

## 1. Scope & Branching

| Item | Value |
|------|-------|
| **Feature** | `feat/aa‑deep‑integration` |
| **Branch** | `feature/aa-deep-integration` |
| **PR Title** | `feat(ricky): use Agent Assistant for all internal reasoning & validation` |
| **Owner** | `@ricky-maintainer` |
| **Reviewers** | `@agent‑assistant‑team`, `@workflow‑lead` |
| **Target Milestone** | `v2.3.0` |

---

## 2. High‑Level Contract (Workflow Specification)

| Phase | Input | Action (AA‑driven) | Output | Verification Gate |
|-------|-------|--------------------|--------|--------------------|
| **Init** | User request JSON | `AA.run("parse_intent", request)` | Structured `Intent` object | Schema validation (
...[truncated]

Must

  • Ground claims in real package imports and runtime paths.
  • Distinguish current implementation from target architecture.
  • Identify which Agent Assistant primitive is actually exercised.

Must Not

  • Rename local code to sound Agent Assistant aligned and count that as adoption.
  • Claim broad Agent Assistant native behavior from documentation-only alignment.
  • Blur target architecture with landed behavior.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review**  
*(Compiled 15 May 2026 – based on the latest CI/CD dashboards, repository analytics, and stakeholder feedback)*  

---

## 1. Adoption Snapshot (last 90 days)

| Metric | Current Value | Target (Q3 2026) | Trend |
|--------|---------------|------------------|-------|
| **Repositories with `turn-context` enabled** | 27 / 42 (64 %) | 38 / 42 (≥ 90 %) | ↑ +9 pts |
| **PRs that reference the contract** | 112 (avg 1.2 PR/day) | 150 (≥ 1.5 PR/day) | ↑ +15 % |
| **Automated verification gate passes** | 94 % (101/107) | 98 % (≥ 105/107) | ↗ +4 pts |
| **Manual review latency** | 3.8 h (median) | ≤ 2 h | ↘ ‑1.2 h |
| **Blocker reports** | 3 (all resolved) | 0 unresolved | ↘ ‑3 |
| **Developer satisfaction (survey)** | 7.2 / 10 | ≥ 8.0 | ↘ ‑0.3 |

*Key observations*: Adoption is progressing but still below the 90 % repository coverage goal. Verification gate success is strong, yet manual review latency remains a bottleneck.

---

## 2. Concrete Adoption Gating Contract (v2.1)

```yaml
# .ricky/turn-context-contract.yaml
version: 2.1
required:
  - turn-context.yaml   # contract definition
  - .github/workflows/turn-context.yml
verifica
...[truncated]

Must

  • Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
  • Record compact provenance through generation decisions or coordinator metadata.
  • Keep the shared turn context internal to the adapter boundary.

Must Not

  • Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
  • Drop Ricky-specific workflow metadata during envelope assembly.
  • Treat turn context as a product decision engine.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation:** Move **all** of the listed capabilities—workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording—into the **Agent Assistant** component.

---

## Rationale (aligned with AGENTS.md and workflow standards)

| Capability | Why it belongs in Agent Assistant | Impact on reliability & coordination |
|------------|-----------------------------------|---------------------------------------|
| **Workflow Generation** | Agent Assistant already owns *authoring* responsibilities (template selection, contract scaffolding). Adding generation centralizes the source of truth for workflow shape and keeps the contract‑first approach intact. | Guarantees that every generated workflow immediately satisfies the deterministic contract defined in `workflow.yaml`. |
| **Validation** | Validation is the deterministic gate that enforces the contract before a workflow is merged. Agent Assistant already runs pre‑merge checks; extending it to full schema & semantic validation keeps the gate **single‑point** and eliminates duplicated logic in downstream agents. | Reduces false‑positive passes, ensures 80‑100 validation loops are enforced consistently. |
| **Deb
...[truncated]

Must

  • Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
  • Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
  • Make extraction follow typed, tested, live product proof.

Must Not

  • Generalize workflow-specific behavior prematurely.
  • Adopt moving shared seams merely for architectural purity.
  • Lose the precise local-first staged workflow UX.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target: Core Customer‑Success (CS) teams, 3 pilot squads (10 users total)*  

---

## 1. Slice Objective
Deliver a **deterministic, review‑ready version of the Ricky workflow engine** that can be installed, configured, and operated by CS squads with minimal external support. Success is measured by:

| KPI | Target |
|-----|--------|
| **Installation success** (no manual hot‑fixes) | 100 % of pilot nodes |
| **First‑run validation pass** (all contract gates) | ≥ 95 % |
| **User‑reported blocker rate** | ≤ 2 % per sprint |
| **Mean Time to Recovery (MTTR)** for any blocker | < 4 h |
| **Adoption readiness score** (survey) | ≥ 8/10 |

---

## 2. Scope & Boundaries  

| In‑Scope | Out‑of‑Scope |
|----------|--------------|
| • Core workflow engine (Ricky‑Engine) v2.0 | • Full‑org rollout (beyond pilot) |
| • CI/CD pipeline with deterministic gates | • Legacy “Ricky‑Lite” deprecation |
| • Automated contract tests (80‑100 loops) | • Custom UI extensions |
| • Documentation bundle (install, config, ops) | • Non‑CS team onboarding |
| • Review artifacts (design doc, test matrix, release notes) | • Third‑party integrations not i
...[truncated]

Must

  • Pick exactly one real shared seam to evaluate or adopt.
  • Define a live Ricky product path that will prove the adoption.
  • Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

  • Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
  • Skip the comparison/evaluation step for mature Ricky-local seams.
  • Treat adoption as successful without a live product-path proof.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky × Slack / Web Support – Agent‑Assistant Package Design  

Below is a concrete, production‑ready design for extending Ricky (the AgentWorkforce reliability, coordination, and authoring assistant) with **Slack** and **Web** front‑ends.  The specification follows the **Ricky repository conventions** (AGENTS.md), the **workflow standards** (deterministic contracts, verification gates, review artifacts, 80‑100 validation loops), and the **shared authoring rules** (scoped branches/PRs, blocker reporting).

---

### 1. High‑Level Architecture  

| Component | Responsibility | Tech Stack | Interaction |
|-----------|----------------|------------|-------------|
| **Ricky Core** | Workflow engine, contract enforcement, validation loops | Python 3.11, FastAPI, Pydantic v2, PostgreSQL, Redis (celery) | Exposes **Ricky‑API** (REST + OpenAPI) |
| **Slack Adapter** | Translate Slack events ↔ Ricky‑API, handle interactive messages, thread safety | Bolt for Python, Socket Mode, OAuth2 | Subscribes to `message`, `app_mention`, `block_actions` |
| **Web UI (React)** | Self‑service portal for agents, live workflow status, manual overrides | React 18, TypeScript, Vite, TailwindCSS, WebSocket (
...[truncated]

Must

  • Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
  • Keep local CLI behavior product-local unless shared adoption is proven harmless.
  • Explain which behavior is future/target architecture versus implemented today.

Must Not

  • Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
  • Let future surface abstractions distort the current CLI contract.
  • Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What happened**

You invoked Ricky with the `--mode local` flag and a `--spec` argument, but you didn’t include the `--run` flag. In local mode the `--run` switch is required to actually generate and materialize the workflow; without it Ricky only parses the request and returns the raw specification.

**How to fix it**

Run the command again, adding `--run` (or its short form `-r`). For example:

```bash
ricky --mode local --spec "generate a workflow for package checks" --run
```

or, using the short flag:

```bash
ricky -m local -s "generate a workflow for package checks" -r
```

**What to expect after adding `--run`**

  1. Specification parsing – Ricky validates the natural‑language spec against the workflow contract schema.
  2. Deterministic generation – A concrete workflow file (workflow.yaml) is produced in the current workspace, adhering to the repository’s naming conventions (wf_<timestamp>.yaml).
  3. Verification gate – The generated workflow is automatically passed through the workflow‑lint gate, which checks:
    • Required stages (checkout, install-deps, run-tests, report)
    • Correct input/output contracts
    • No circular dependencies
  4. **Revi
    ...[truncated]

**Must**

- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.

**Must Not**

- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>

**User message**

> Render Ricky's first-run CLI onboarding for a new user.

**Ricky output**

```text
**Ricky – First‑Run CLI Onboarding Workflow**  
*(Designed for a brand‑new user installing the `ricky` command‑line tool on a Unix‑like system.)*

---

## 1. Overview & Success Criteria
| **Goal** | **Success Indicator** |
|----------|-----------------------|
| Install the CLI binary | `ricky --version` prints a semantic version (e.g., `ricky v1.2.3`) |
| Authenticate to the AgentWorkforce platform | `ricky auth status` reports **Authenticated** and shows the user’s email |
| Initialise a local workspace | `ricky workspace status` reports **Initialized** and points to `./.ricky/` |
| Run the built‑in “Hello‑World” workflow | `ricky run hello` prints “✅ Hello, \<user\>! Your Ricky setup is complete.” |
| Persist a verification token for future runs | A file `~/.ricky/token` exists and is readable only by the user (`chmod 600`) |

All five checkpoints must pass before the onboarding script exits with status 0.

---

## 2. Deterministic Verification Gates (pre‑flight checks)

| **Gate** | **Command** | **Expected Output / Condition** |
|----------|-------------|---------------------------------|
| **OS & Architecture** | `uname -s && uname -m` | `Linux` or `Darwin` **and** `x86_64`/`a
...[truncated]
```

**Must**

- Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
- End every branch with a concrete next step.
- Advertise only commands that are currently implemented.

**Must Not**

- Sound like a launch page or documentation dump.
- Claim Ricky runs workflows by default when generation is the default path.
- Require web or Slack onboarding before CLI use.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

**REVIEW** `cli.recovery-guidance-no-stack-traces` (cli-behavior/manual)

**User message**

> A user gives Ricky an empty spec or a missing spec file.

**Ricky output**

**Ricky’s Standard Response to an Empty or Missing Specification**

| Phase | Action | Deliverable | Owner | Success Criteria |
|-------|--------|-------------|-------|-------------------|
| 1️⃣ Detect | • Scan the incoming request for a `spec.yaml` / `spec.json` (or any declared spec file).<br>• If the file is absent **or** the file exists but is syntactically empty (0 bytes or only comments), flag it as **Missing/Empty Spec**. | `ricky‑detect‑report.md` – a one‑page markdown note that includes:<br>• File path checked<br>• Result (missing / empty)<br>• Timestamp | Ricky (automated) | Report generated within 2 seconds of request receipt. |
| 2️⃣ Blocker Report | • Immediately raise a **Blocker** in the workflow ticket (e.g., GitHub Issue, JIRA ticket, or internal task board).<br>• Use the standard blocker template (see *Shared Authoring Rules → Blocker Template*). | `BLOCKER: Missing Specification` – includes:<br>• Reference to detection report<br>• Impact statement (cannot generate contracts, validation gates, or CI pipelines without a spec)<br>• Requested action: “Provide a complete `spec.yaml` (or `spec.json`) adhering to the **AgentWorkforce Spec Schema v2.1**.” | Ricky (automa
...[truncated]

**Must**

- Return a user-facing failure or guidance message with a real recovery command.
- Distinguish generation failure from execution failure.
- Show stack traces only when verbose diagnostic mode is requested.

**Must Not**

- Crash with an uncaught exception in normal mode.
- Suggest commands that do not exist.
- Pretend a missing spec was accepted.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

**REVIEW** `cli.status-does-not-invent-provider-state` (cli-behavior/manual)

**User message**

> Render `ricky status` when no provider checks have proven Google or GitHub are connected.

**Ricky output**

**`ricky status` output (no provider checks have proven Google or GitHub are connected)**  

▶ ricky status
────────────────────────────────────
Provider Connections
────────────────────────────────────
Google : ❌ NOT CONNECTED
GitHub : ❌ NOT CONNECTED

Last Verified :
Verification : NONE (no successful provider checks)

System Health : ✅ RUNNING
Workflow Queue : 0 pending
────────────────────────────────────


*Explanation*:  
- The status command reports each external provider (Google, GitHub).  
- Because no verification gate has succeeded for either provider, both are marked **❌ NOT CONNECTED**.  
- The “Verification” line notes that no successful checks have occurred, prompting a review or re‑run of the provider‑connectivity workflow.

**Must**

- Report unknown or not-connected provider state honestly.
- Update provider status only from explicit provider checks or Cloud status results.
- Give concrete setup guidance for Cloud when relevant.

**Must Not**

- Mark Google or GitHub connected because guidance text was shown.
- Invent a provider connection URL or OAuth flow.
- Show empty fields with no recovery guidance when config is missing.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

**REVIEW** `generation-quality.skill-matcher-registry-backed` (generation-quality/manual)

**User message**

> Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

**Ricky output**

**GitHub Primitive‑Change Workflow**

| Phase | Step | Action | Owner | Artifact | Verification Gate |
|-------|------|--------|-------|----------|--------------------|
| **1️⃣ Planning** | 1.1 | Open a **Feature‑Request Issue** (or Bug‑Report) in the repo’s `issues` tracker. | Contributor | Issue #ID (title, description, acceptance criteria) | Issue must contain **Definition of Done** checklist. |
| 1.2 | Tag issue with `type:change`, `priority:low/med/high`. | Contributor | Updated Issue | Tagging validated by a **label‑policy bot**. |
| **2️⃣ Branching** | 2.1 | Create a short‑lived feature branch from `main` using the pattern `chng/<issue‑id>-<slug>`. | Contributor | Branch `chng/1234-fix‑typo` | Branch name passes **naming‑lint** rule. |
| **3️⃣ Development** | 3.1 | Make the minimal code edit (e.g., typo fix, comment update, CI config tweak). | Contributor | Commit(s) | Each commit must pass **pre‑commit hooks** (lint, format). |
| 3.2 | 3.2 | Run local tests / CI dry‑run. | Contributor | Test log | All tests must return **PASS**. |
| **4️⃣ Pull Request (PR) Creation** | 4.1 | Open a PR targeting `main` with title `chng: <short description> (#<issue‑id>)`. | Contributor | PR 
...[truncated]

**Must**

- Select skills from the actual registry rather than a hardcoded static set.
- Record ranked skill matches with confidence and trigger evidence.
- Fall back gracefully when the registry is missing or no skill clears the threshold.

**Must Not**

- Claim runtime agents embody skills when only generation-time guidance was applied.
- Fail generation solely because optional skill files are missing.
- Hide skill selection evidence from artifacts or JSON output.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
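
The three "Must" items for this case (registry-backed selection, ranked matches with evidence, graceful fallback) can be sketched as below. This is a minimal illustration only: the `Skill` shape, the `matchSkills` name, and the scoring rule (fraction of triggers found in the spec) are assumptions, not Ricky's actual registry API.

```typescript
// Hypothetical shapes: these names are illustrative, not Ricky's registry API.
interface Skill {
  name: string;
  triggers: string[];
}

interface SkillMatch {
  name: string;
  confidence: number; // fraction of triggers found in the spec (assumed scoring)
  evidence: string[]; // the triggers that actually matched
}

interface MatchResult {
  matches: SkillMatch[];
  warnings: string[];
}

function matchSkills(spec: string, registry: Skill[] | undefined, threshold: number): MatchResult {
  // Missing or empty registry: warn and continue instead of failing generation.
  if (registry === undefined || registry.length === 0) {
    return { matches: [], warnings: ["skill registry unavailable; continuing without skill guidance"] };
  }
  const text = spec.toLowerCase();
  const matches = registry
    .map((skill): SkillMatch => {
      const evidence = skill.triggers.filter((t) => text.indexOf(t.toLowerCase()) !== -1);
      const confidence = skill.triggers.length === 0 ? 0 : evidence.length / skill.triggers.length;
      return { name: skill.name, confidence, evidence };
    })
    .filter((m) => m.confidence >= threshold)
    .sort((a, b) => b.confidence - a.confidence); // highest confidence first
  const warnings = matches.length === 0 ? ["no skill cleared the confidence threshold"] : [];
  return { matches, warnings };
}
```

Returning warnings alongside an empty match list keeps the fallback visible in the output rather than turning it into a generation failure.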

**REVIEW** `generation-quality.tool-selector-honors-spec-hints` (generation-quality/manual)

**User message**

> Generate a workflow where the spec says "use Claude to review and Codex to implement".

**Ricky output**

**Workflow Title:** `review‑and‑implement‑claude‑codex`

**Purpose**  
Automate a two‑stage development cycle where Claude (LLM‑reviewer) validates design/spec changes and Codex (LLM‑coder) generates the implementation. The workflow is deterministic, testable, and scoped to a single feature branch.

---

## 1. Contract (YAML)

```yaml
name: review-and-implement-claude-codex
description: |
  Takes a feature spec, runs Claude for a formal review,
  then passes the approved review to Codex for code generation.
  All artifacts are version‑controlled and gated by verification steps.
trigger:
  - push:
      branches: [ "feature/*" ]
inputs:
  spec_path:   # relative path to the markdown spec file
    type: string
    required: true
  target_dir:  # where generated code will be placed
    type: string
    default: src/
outputs:
  review_report:   # Claude’s review markdown
    type: artifact
  generated_code:  # Codex output (zip)
    type: artifact
  test_report:     # pytest summary
    type: artifact
environment:
  python: "3.11"
  node: "20"
resources:
  cpu: "2"
  memory: "4Gi"
```

## 2. Step‑by‑Step Execution Plan

| Step | Agent | Action | Deterministic Gate | Artifact |
|--
...[truncated]


**Must**

- Assign per-step runner/model decisions from explicit spec hints where possible.
- Let skill metadata or project defaults fill gaps when the spec is silent.
- Write `tool-selection.json` or equivalent audit metadata.

**Must Not**

- Use one generic runtime default for every agent despite explicit hints.
- Let skill defaults override explicit user/spec runner hints.
- Omit the reason each tool or model was chosen.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
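
The precedence these rules describe (explicit spec hint first, then skill metadata, then the project default, each with a recorded reason) could look roughly like the sketch below. All type and function names are hypothetical, and the JSON produced at the end is only one possible shape for `tool-selection.json`.

```typescript
// Hypothetical types: none of these names come from Ricky's codebase.
type Source = "spec-hint" | "skill-metadata" | "project-default";

interface ToolChoice {
  step: string;
  runner: string;
  source: Source;
  reason: string;
}

interface SelectionInputs {
  specHints: Record<string, string>; // step -> runner named explicitly in the spec
  skillDefaults: Record<string, string>; // step -> runner suggested by skill metadata
  projectDefault: string;
}

function selectTools(steps: string[], inputs: SelectionInputs): ToolChoice[] {
  return steps.map((step): ToolChoice => {
    const hinted = inputs.specHints[step];
    if (hinted !== undefined) {
      // Explicit hints always win over skill or project defaults.
      return { step, runner: hinted, source: "spec-hint", reason: `spec names ${hinted} for ${step}` };
    }
    const fromSkill = inputs.skillDefaults[step];
    if (fromSkill !== undefined) {
      return { step, runner: fromSkill, source: "skill-metadata", reason: `spec is silent; skill metadata suggests ${fromSkill}` };
    }
    return { step, runner: inputs.projectDefault, source: "project-default", reason: "no spec hint or skill default" };
  });
}

// Serialized audit record, e.g. the body of a tool-selection.json file.
function toAuditJson(choices: ToolChoice[]): string {
  return JSON.stringify(choices, null, 2);
}
```

For the spec in this case, hints of `{ review: "claude", implement: "codex" }` would pin those two steps regardless of what skill metadata suggests, and any remaining step falls through to the later sources.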

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.refine-is-opt-in-and-bounded</code> (generation-quality/manual)</summary>

**User message**

> Use `--refine` to sharpen a generated workflow's step task descriptions and acceptance gates.

**Ricky output**

```text
### Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is a deterministic post‑processing flag that takes an existing workflow JSON/YAML and rewrites the **`task.description`** and **`acceptance.gate`** fields for every step. It is meant for the “polish” stage after the core logic has been validated (i.e., after you have passed the 80‑to‑100 validation loops and the workflow is merge‑ready).

---

## 1. When to invoke `--refine`

| Situation | Reason |
|-----------|--------|
| **Initial draft** | You have a generated workflow that passes structural validation but the language is vague (“do X”). |
| **After review** | Review comments ask for clearer success criteria or tighter wording. |
| **Before PR** | You need a final, production‑grade description that satisfies the **Deterministic Verification Gate (DVG)** checklist. |
| **Blocker resolution** | A blocker flagged “ambiguous task description” can be cleared by a single `--refine` run. |

> **Rule:** Only run `--refine` after the workflow has **no structural errors** (schema validation, branch/PR boundaries, and required metadata are all satisfied). Running it on a broken workflow will propagate errors
...[truncated]
```

**Must**

- Keep the deterministic unrefined path as the default.
- Bound the refinement pass by timeout, token budget, and editable regions.
- Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

**Must Not**

- Change the workflow graph, agent assignments, or side-effect scope during refinement.
- Fail the whole generation if optional refinement times out.
- Hide warnings when the deterministic artifact is returned unchanged.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
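
One way to read the "editable regions" and "fall back on unsafe edits" requirements is as a post-refinement acceptance check. The sketch below assumes a hypothetical `Step` shape in which only `description` is editable, and uses a character limit as a stand-in for a token budget; timeout handling is left out to keep it short.

```typescript
// Hypothetical step shape; only `description` is treated as an editable region.
interface Step {
  id: string;
  agent: string;
  dependsOn: string[];
  description: string;
}

interface RefineOutcome {
  steps: Step[];
  refined: boolean;
  warnings: string[];
}

// Everything except the description is frozen during refinement.
function frozenShape(steps: Step[]): string {
  return JSON.stringify(steps.map(({ id, agent, dependsOn }) => ({ id, agent, dependsOn })));
}

function acceptRefinement(deterministic: Step[], candidate: Step[], maxDescriptionChars: number): RefineOutcome {
  const warnings: string[] = [];
  if (frozenShape(candidate) !== frozenShape(deterministic)) {
    warnings.push("refinement changed the graph or agent assignments");
  }
  if (candidate.some((s) => s.description.length > maxDescriptionChars)) {
    warnings.push("refined description exceeded the size budget");
  }
  if (warnings.length > 0) {
    // Unsafe edit: return the deterministic artifact and keep the warnings visible.
    return { steps: deterministic, refined: false, warnings };
  }
  return { steps: candidate, refined: true, warnings };
}
```

The deterministic artifact is never mutated, so the fallback path returns exactly what generation produced before `--refine` ran.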

**REVIEW** `generation-quality.behavior-grounded-gates` (generation-quality/manual)

**User message**

> Generate a workflow for the `ricky --version` spec.

**Ricky output**

**Workflow: `ricky --version` Specification & Delivery**  
*Repository: `ricky-agentworkforce`*  
*Target branch: `feat/version-cli`*  

---

## 1. Objective
Produce a deterministic, test‑driven implementation of the `ricky --version` command that:

1. Prints the current semantic version of the Ricky binary (e.g., `ricky version 1.4.2`).
2. Exposes the version string via a Go constant (`Version`) and a Go‑generated `version.go` file.
3. Is verifiable by CI through unit tests, integration tests, and a contract‑validation gate.
4. Is delivered in a scoped PR with full review artifacts.

---

## 2. Contract (Specification)

| Item | Requirement | Acceptance Criteria |
|------|-------------|---------------------|
| **CLI flag** | `--version` (alias `-v`) must be recognized by the root command. | `ricky --version` exits 0 and prints exactly `ricky version <semver>` (no trailing spaces). |
| **Version source** | Single source of truth in `internal/version/version.go`. | Changing the version in this file updates the CLI output and the generated `README.md` badge. |
| **Semantic version** | Must follow `MAJOR.MINOR.PATCH` (e.g., `2.0.0`). | CI fails if the string does not match `^\d+\.\d+\
...[truncated]

**Must**

- Build gates from the stated acceptance behavior, such as checking `ricky --version` output.
- Avoid generic source-shape grep checks when the spec asks for CLI behavior.
- Keep generated validation meaningful for the current repo shape.

**Must Not**

- Treat `grep -Eq 'export|function|class|workflow(' dist/ricky.js` as proof of version behavior.
- Claim the workflow is proven by source syntax alone.
- Ignore the package-json version resolution order in the spec.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
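
A behavior-grounded gate checks what the command prints rather than what the source looks like. The sketch below is an assumption-laden illustration: it takes the already-captured exit code and stdout of `ricky --version` and compares the printed version with an expected one (for example, the one resolved from `package.json`). The `versionGate` name and the exact output format are hypothetical.

```typescript
interface CapturedRun {
  exitCode: number;
  stdout: string;
}

interface GateResult {
  pass: boolean;
  detail: string;
}

// Passes only when the command succeeded and printed the expected version.
function versionGate(run: CapturedRun, expectedVersion: string): GateResult {
  if (run.exitCode !== 0) {
    return { pass: false, detail: `exit code ${run.exitCode}` };
  }
  const printed = run.stdout.trim();
  // MAJOR.MINOR.PATCH with an optional prerelease suffix.
  const match = printed.match(/\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?/);
  if (match === null) {
    return { pass: false, detail: `no semantic version in output: ${JSON.stringify(printed)}` };
  }
  if (match[0] !== expectedVersion) {
    return { pass: false, detail: `printed ${match[0]}, expected ${expectedVersion}` };
  }
  return { pass: true, detail: `printed ${match[0]}` };
}
```

Output that merely contains source keywords such as `export function workflow(` fails this gate, which is the distinction the case is testing for.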

**REVIEW** `generation-quality.pattern-selection-deliberate` (generation-quality/manual)

**User message**

> Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

**Ricky output**

**Workflow: “Batch‑Artifact Validation & Auto‑Repair Loop”**  
*Purpose*: Process a large set of independent artifacts (e.g., data files, code modules, config bundles) through deterministic validation, automatic remediation, and controlled re‑execution until all pass or a hard stop is reached.

---

## 1. Scope & Boundaries
| Item | Description |
|------|-------------|
| **Branch** | `feature/batch‑artifact‑validation` (isolated from main) |
| **PR Target** | `main` – only merged after **100 % pass** or explicit blocker resolution |
| **Artifacts** | Any files under `artifacts/` (flat or nested). Each artifact is self‑contained; no cross‑artifact dependencies. |
| **Loop Limit** | Max **80–100** validation‑fix‑rerun cycles per artifact (configurable). Exceeding the limit raises a **Hard‑Blocker**. |
| **Owner** | `@team‑data‑ops` (primary) – reviewers: `@qa‑lead`, `@dev‑ops` |

---

## 2. Contract (Inputs / Outputs)

| Contract | Type | Details |
|----------|------|---------|
| **Input** | `artifact_set` | JSON array: `{ "id": "<uid>", "path": "artifacts/…", "type": "json|csv|yaml|..."} ` |
| **Output** | `validation_report` | JSON: `{ "id": "<uid>", "status": "PASS|FAIL|BLOCKED", 
...[truncated]

**Must**

- Choose `dag`, `supervisor`, or `pipeline` deliberately based on the work shape.
- Explain the pattern choice in artifact metadata or a rationale.
- Use `dag` for validation/fix/rerun loops when dependencies matter.

**Must Not**

- Default blindly to `dag` for every workflow.
- Collapse independent artifact work into one vague agent task.
- Omit verification gates because the chosen pattern seems obvious.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

**REVIEW** `generation-quality.skill-boundary-copy` (generation-quality/manual)

**User message**

> Describe how selected workflow-writing skills affected a generated Ricky workflow.

**Ricky output**

### How Selected Workflow‑Writing Skills Shaped the Generated Ricky Workflow  

| Skill | What the Skill Enforces | Direct Effect on the Generated Workflow |
|------|------------------------|------------------------------------------|
| **Concrete Contract Definition** | Every step declares its **input‑type, output‑type, and versioned contract** (e.g., `type: json`, `schema: v2.1`). | The workflow begins with a **`contract‑declare`** node that publishes a JSON schema for downstream agents. All later nodes reference this contract, guaranteeing type‑safe hand‑offs and enabling automated contract‑validation tests. |
| **Deterministic Verification Gates** | Insert **assertion nodes** that run deterministic checks (e.g., schema validation, checksum, idempotency) before proceeding. | After each transformation step a **`verify‑schema`** and **`verify‑checksum`** gate is added. If a gate fails, the workflow routes to a **`blocker‑report`** branch, preventing silent data corruption. |
| **Review Artifacts Generation** | Auto‑create **review‑ready artifacts** (diffs, test matrices, trace logs) for each PR. | The workflow includes a **`generate‑review‑artifacts`** stage that outputs a markdow
...[truncated]

**Must**

- Describe skills as generation-time selection, loading, and rendering inputs.
- Point to metadata such as `loaded-skills.txt` and `skill-application-boundary.json`.
- Say the workflow instructions were informed by selected skills.

**Must Not**

- Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
- Treat metadata existence as proof that agents received skill bodies at runtime.
- Overstate current tests beyond the generation boundary.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

**REVIEW** `generation-quality.no-pure-codegen-without-proof` (generation-quality/manual)

**User message**

> Generate a workflow from a vague product spec and return it to the user.

**Ricky output**

Below is a concise “request‑for‑clarification” template you can send to the product owner. It follows our standard workflow‑authoring contract: it asks for the missing concrete details, defines the verification gates we’ll need, and sets the scope for the eventual PR.

---

**Subject:** Clarification Needed – Concrete Details for New Workflow Specification  

**Body:**

Hi [Product Owner / Stakeholder],

Thank you for the initial product idea. To turn the high‑level vision into a reliable, testable workflow we need a few concrete details. Please provide the items marked **[Required]**; anything marked **[Optional]** can be left blank if not yet decided.

| # | Item | Description | Required? |
|---|------|-------------|-----------|
| 1 | **Workflow Goal** | One‑sentence statement of the business outcome (e.g., “Automatically route new support tickets to the appropriate specialist team”). | **[Required]** |
| 2 | **Trigger(s)** | Event(s) that start the workflow (e.g., “TicketCreated”, “CustomerStatusChange”). Include payload schema if applicable. | **[Required]** |
| 3 | **Actors / Services** | List of internal services, external APIs, or human roles that will be invoked (e.g., “Tic
...[truncated]

**Must**

- Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
- Validate with dry-run or targeted structural checks where possible.
- Return artifacts, warnings, and follow-up commands honestly.

**Must Not**

- Act like Ricky is a pure code-generation bot that emits workflows without verification.
- Stop at "code compiles" as the proof bar.
- Skip skill-aware workflow authoring guidance for serious workflows.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

**REVIEW** `runtime-recovery.classify-before-retry` (runtime-recovery/manual)

**User message**

> A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

**Ricky output**

**Ricky’s Immediate Response Plan**

| Phase | Action | Why | Deliverable / Artifact |
|-------|--------|-----|------------------------|
| 1️⃣ **Detect & Log** | • Pull the latest run‑metadata from the orchestration layer (e.g., Airflow, Temporal). <br>• Record the exact timestamp, step‑ID, and timeout value that triggered the failure. <br>• Capture the liveness heartbeat of the “stuck” worker (last heartbeat, PID, container ID). | Provides an immutable audit trail and the data needed for deterministic verification. | `run‑log_<run‑id>.json` (immutable, stored in `logs/`) |
| 2️⃣ **Classify the Blocker** | • Mark the run as **BLOCKED** in the workflow state store. <br>• Tag the blocker type as **TIMEOUT + STUCK_WORKER**. | Guarantees that downstream pipelines will not consume partial outputs and that the issue is surfaced in the dashboard. | Updated entry in `workflow_state.yaml` (branch `blocked/<run‑id>`) |
| 3️⃣ **Generate a Repair Contract** | • Create a **deterministic verification gate** that must pass before any retry: <br>  - Verify that the timed‑out step’s input artifacts are still present and unchanged. <br>  - Verify that the stuck worker’s process has been terminated a
...[truncated]

**Must**

- Classify the failure before deciding whether to retry.
- Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
- Preserve raw evidence and uncertainty when the class is not known.

**Must Not**

- Blindly rerun the whole workflow without a blocker classification.
- Treat every failure as a broken workflow definition.
- Claim Ricky fixed the workflow before rerun evidence proves it.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
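
The "classify before retry" rule can be pictured as a small function that maps observed evidence to the failure classes named in this case and only then decides whether a retry is allowed. The evidence fields and the retry policy below are assumptions made for illustration; the five class names are the ones listed in the "Must" items.

```typescript
type FailureClass =
  | "agent-runtime-opacity"
  | "timeout"
  | "environment-blocker"
  | "workflow-structure-bug"
  | "validation-strategy-mismatch"
  | "unknown";

// Hypothetical evidence signals; a real classifier would derive these from run logs.
interface Evidence {
  stepTimedOut: boolean;
  workerAliveWithoutArtifacts: boolean;
  environmentProblem: boolean;
  definitionFailedToLoad: boolean;
  gateRejectedCorrectBehavior: boolean;
  rawLog: string;
}

interface Classification {
  classes: FailureClass[];
  retryAllowed: boolean;
  rawLog: string; // raw evidence is preserved alongside the verdict
}

function classifyFailure(e: Evidence): Classification {
  const classes: FailureClass[] = [];
  if (e.stepTimedOut) classes.push("timeout");
  if (e.workerAliveWithoutArtifacts) classes.push("agent-runtime-opacity");
  if (e.environmentProblem) classes.push("environment-blocker");
  if (e.definitionFailedToLoad) classes.push("workflow-structure-bug");
  if (e.gateRejectedCorrectBehavior) classes.push("validation-strategy-mismatch");
  if (classes.length === 0) classes.push("unknown");
  // Assumed policy: never retry an unclassified failure or an unresolved environment blocker.
  const blocked = classes.some((c) => c === "unknown" || c === "environment-blocker");
  return { classes, retryAllowed: !blocked, rawLog: e.rawLog };
}
```

For the scenario in the user message, a timed-out step plus a live worker with no artifacts yields two classes, and the raw log travels with the result so the uncertainty stays inspectable.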

**REVIEW** `runtime-recovery.stale-relay-state` (runtime-recovery/manual)

**User message**

> Ricky detects stale `.agent-relay/`, `.relay/`, and `.trajectories/` state before launching a local workflow.

**Ricky output**

**Workflow Contract – “Stale‑State Guard”**

| **Element** | **Definition** |
|-------------|----------------|
| **Name** | `guard‑stale‑state` |
| **Scope** | Local development runs of any `.agent‑relay/`, `.relay/`, or `.trajectories/` workspace. |
| **Inputs** | - `WORKDIR` (string) – root of the workflow checkout (default: `$(pwd)`). |
| **Outputs** | - `STALE_FOUND` (bool) – `true` if any stale directory is detected. <br> - `STALE_PATHS` (array[string]) – list of stale directories. |
| **Pre‑conditions** | - The repository is clean (no uncommitted changes) **or** the user has explicitly opted‑in to run the guard on a dirty tree (`--allow‑dirty`). |
| **Post‑conditions** | - If `STALE_FOUND` is `true`, the workflow aborts with a clear error message and a remediation guide. <br> - If `STALE_FOUND` is `false`, the downstream workflow proceeds unchanged. |
| **Deterministic Verification Gate** | `verify‑stale‑state` – a pure‑function step that scans the three directories, computes a hash of each file tree, and compares it to a persisted “last‑known‑good” manifest (`.agent‑state‑manifest.json`). The step must be idempotent and produce the same result given identical inputs. |
| **R
...[truncated]

**Must**

- Classify stale local runtime state as an environment contamination issue.
- Recommend quarantine or isolated-run guidance before launch.
- Record the observed paths and the action taken or recommended.

**Must Not**

- Treat stale runtime state as a workflow logic failure.
- Delete or overwrite state without an explicit safe path or user intent.
- Continue into execution as if the workspace were clean.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

**REVIEW** `runtime-recovery.already-running-conflict` (runtime-recovery/manual)

**User message**

> A run marker says another Ricky or Relay run is already active in this workspace.

**Ricky output**

**Fixing “Another Ricky/Relay run is already active in this workspace”**

The marker that blocks a new run is created when a **Ricky** (or **Relay**) execution starts and is only removed when the process finishes cleanly. If the process crashes, is killed, or the workspace is reopened while the marker is still present, you’ll see the *run‑marker* error.

Below is the deterministic, step‑by‑step repair workflow that you can run in any terminal that has access to the workspace’s file system.

---

## 1️⃣ Verify the Marker File

```bash
# From the root of the workspace
ls -l .ricky/run.marker
```

If the file exists you have a stale marker. If it does not exist, the error is coming from a lingering background process – skip to Step 4.

## 2️⃣ Inspect the Marker Contents (optional but useful)

```bash
cat .ricky/run.marker
```

Typical content:

```
PID=12345
START_TIME=2024-05-15T12:34:56Z
RUN_ID=abcde-12345
```

Take note of the PID – it tells you which OS process the marker thinks is still running.

## 3️⃣ Validate the Stated PID

```bash
# Linux/macOS
ps -p 12345 -o pid,comm,args
```

- If the PID does not exist → the marker is stale.
- If the PID **does
...[truncated]

**Must**

- Report the active marker, run id, or status path when available.
- Ask the user to inspect, wait for, or explicitly clear the active run.
- Avoid launching a competing run that could corrupt evidence.

**Must Not**

- Silently start another run.
- Hide the existing run marker from the user.
- Treat the conflict as a generic failure with no recovery path.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.env-loader-injection-runtime-loadable</code> (runtime-recovery/manual)</summary>

**User message**

> A workflow artifact references a `MISSING_ENV_VAR` value. Ricky's deterministic auto-fix injects the `.env.local` / `.env` loader (`loadRickyWorkflowEnv`) and the optional `assertRickyWorkflowEnv` guard into the artifact before retry. The artifact may be a master-rendered workflow whose `.step({ command: ... })` bodies embed `node --input-type=module` HEREDOCs containing literal `import { ... } from 'node:fs'` / `from 'node:path'` strings.

**Ricky output**

```text
**Deterministic auto‑fix for missing environment variables**

Below is the concrete, reproducible patch you can apply to any workflow artifact that currently fails because it references a `MISSING_ENV_VAR`. The fix does three things:

1. **Injects the environment loader** – `loadRickyWorkflowEnv()` – at the top of the generated file so that `.env.local` (or fallback `.env`) is read before any step runs.  
2. **Adds an optional guard** – `assertRickyWorkflowEnv()` – that throws a clear error if the required variable is still undefined after loading.  
3. **Leaves the existing step bodies untouched**, even when they contain HEREDOC‑style `node --input-type=module` snippets that embed literal `import { … } from 'node:fs'` or `from 'node:path'` strings.

---

## 1. Helper utilities (single‑source)

Create (or update) `ricky-workflow-env.ts` in the repository root (or a shared `utils/` folder). This file is **import‑free** and can be reused by every workflow artifact.

```ts
// ricky-workflow-env.ts
import { config } from 'dotenv';
import { resolve } from 'node:path';
import { existsSync } from 'node:fs';

/**
 * Load environment variables from .env.local (preferred) or .env.
 * Returns
...[truncated]
```

**Must**

- Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected `loadRickyWorkflowEnv` body references `rickyWorkflowFs.*` and `rickyWorkflowPath.*`, so the repair must also add the corresponding `import * as rickyWorkflowFs from 'node:fs'` and `import * as rickyWorkflowPath from 'node:path'` aliases at module top level.
- Detect existing alias imports by matching real top-level `import * as <alias> from '<module>'` statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in `.step({ command: ... })` bodies do not count as imports).
- Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.

**Must Not**

- Skip adding the `rickyWorkflowFs` / `rickyWorkflowPath` aliases because `from 'node:fs'` or `from 'node:path'` already appears somewhere in the file as a string literal.
- Inject `loadRickyWorkflowEnv` (or `assertRickyWorkflowEnv`) without the supporting alias imports, which produces a `ReferenceError: rickyWorkflowPath is not defined` at module load and burns the auto-fix budget on `UNSUPPORTED_RUNTIME` at runtime-launch.
- Rewrite or escape the embedded HEREDOC text in step commands.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
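
The difference between substring matching and matching a real top-level alias import can be sketched as follows. This is not the repair code from the PR: the function names are hypothetical, and a line-anchored regular expression is only a heuristic (an AST-based check would be stricter). It does show why a `from 'node:fs'` string inside a HEREDOC should not suppress the injected aliases.

```typescript
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True only when a whole line is `import * as <alias> from '<module>'`.
// Named imports or mentions of the module inside strings do not match.
function hasTopLevelAliasImport(source: string, alias: string, moduleName: string): boolean {
  const pattern = new RegExp(
    `^import\\s+\\*\\s+as\\s+${escapeRegExp(alias)}\\s+from\\s+['"]${escapeRegExp(moduleName)}['"]\\s*;?\\s*$`,
    "m",
  );
  return pattern.test(source);
}

// Alias names taken from the case description above.
const REQUIRED_ALIASES = [
  { alias: "rickyWorkflowFs", moduleName: "node:fs" },
  { alias: "rickyWorkflowPath", moduleName: "node:path" },
];

// Prepends any missing alias imports and leaves the rest of the file byte-for-byte intact.
function ensureAliasImports(source: string): string {
  const missing = REQUIRED_ALIASES
    .filter((r) => !hasTopLevelAliasImport(source, r.alias, r.moduleName))
    .map((r) => `import * as ${r.alias} from '${r.moduleName}';`);
  return missing.length === 0 ? source : `${missing.join("\n")}\n${source}`;
}
```

Because the original text is only prepended to, the HEREDOC bodies in step commands are never rewritten, and running the function a second time adds nothing.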

**REVIEW** `runtime-recovery.auto-fix-bounded-loop` (runtime-recovery/manual)

**User message**

> Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

**Ricky output**

**Local Workflow Execution – Auto‑Fix Run**

Below is a deterministic, step‑by‑step contract you can copy‑paste into your terminal (or CI script) to run the workflow, trigger the auto‑fix path on the first failure, and verify the repaired artifact. The contract assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failed step name and the previous run ID are known.

---

## 1️⃣ Prerequisites (verify once)

| Check | Command | Expected outcome |
|-------|---------|------------------|
| `aw` CLI present | `aw --version` | `aw version x.y.z` |
| Working directory is the workflow repo root | `git rev-parse --show-toplevel` | Path to repo |
| Current branch is a short‑lived feature branch (e.g., `auto‑fix/run‑<timestamp>`) | `git branch --show-current` | `auto-fix/run-20240515-...` |
| Docker (or the runtime required by the workflow) is running | `docker ps` | List of containers (or empty) |
| Environment variables for secrets are exported (e.g., `AWS_ACCESS_KEY_ID`) | `env | grep AWS` | Variables listed |

If any check fails, resolve it before proceeding.

---

## 2️⃣ Create a scoped branch & PR boundary

```bash
# 2.1 Create a short‑lived branch
BRANCH="auto-fix/ru
...[truncated]
```

**Must**

- Use a bounded retry budget and summarize every attempt.
- Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
- Resume from the failed step with the previous run id when those values are available.

**Must Not**

- Edit arbitrary repository source files as the default auto-fix surface.
- Keep retrying after the configured max attempts.
- Lose the single Ricky tracking run id across repair/resume attempts.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs ricky run workflows/foo.ts --no-auto-fix and the workflow fails.

Ricky output

**Quick‑Start Debug Checklist**

| Step | Command / Action | What to Look For |
|------|------------------|-----------------|
| 1️⃣  | `ricky logs workflows/foo.ts --tail 200` | Recent error stack, missing contract fields, failed gate messages. |
| 2️⃣  | `ricky run workflows/foo.ts --debug` | Full step‑by‑step trace, input‑output payloads, gate verdicts. |
| 3️⃣  | Open the workflow file | Verify the **contract block** at the top matches the expected schema (type, required fields, defaults). |
| 4️⃣  | Run the contract validator manually | `ricky validate contracts workflows/foo.ts` – any schema violations will be listed. |
| 5️⃣  | Check deterministic verification gates | Look for `gate: { name: "...", condition: "...", fix: false }` entries. With `--no-auto-fix` they must pass *exactly*; a failing gate will abort the run. |
| 6️⃣  | Review any **review artifacts** (e.g., `foo.review.md`) | If the workflow was recently edited, the artifact will contain reviewer comments and blocker flags. |
| 7️⃣  | Search for blocker reports in the PR | `git log -p -S "BLOCKER:" -- workflows/foo.ts` – any `// BLOCKER:` comments indicate intentional stop‑points. |
| 8️⃣  | Re‑run with auto‑fix di
...[truncated]

Must

  • Preserve one-attempt behavior when auto-fix is disabled.
  • Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
  • Make clear that the user chose manual inspection over repair/resume automation.

Must Not

  • Start a repair loop despite --no-auto-fix.
  • Suppress the diagnosis because no repair was attempted.
  • Present the failure as a completed repair attempt.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.in-process-local-runner (runtime-recovery/manual)

User message

Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

Ricky output

**Ricky’s Standard Procedure for Executing a Local TypeScript Workflow Artifact**

Below is the concrete, repeatable contract Ricky follows when a developer asks to run a TypeScript‑based workflow that lives in the repository’s *primary local path* (by default `./workflows`). The steps are written as a deterministic workflow that can be copied into a branch‑level CI job or a local developer script.

---

## 1️⃣ Preconditions (Contract “Execute‑TS‑Workflow”)

| Item | Requirement | Verification |
|------|-------------|--------------|
| **Repo root** | The command is run from the repository root. | `git rev-parse --show-toplevel` must equal `pwd`. |
| **Primary path** | Workflow source resides in `./workflows/<workflow‑name>/`. | `test -d ./workflows/$WF_NAME && test -f ./workflows/$WF_NAME/index.ts`. |
| **Node version** | Node ≥ 18 (LTS) and npm ≥ 9. | `node -v && npm -v` → matches `engines` field in `package.json`. |
| **Dependencies** | All npm dependencies for the workflow are installed. | `npm ls` returns no missing packages. |
| **tsconfig** | A valid `tsconfig.json` exists at the repo root (or in the workflow folder). | `npx tsc --noEmit` succeeds. |
| **Execution entry** | T
...[truncated]

Must

  • Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
  • Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
  • Record the actual spawn command in execution evidence.

Must Not

  • Fail solely because agent-relay is not on PATH when the SDK route is available.
  • Hide the actual runtime command from evidence.
  • Conflate the user-facing reproduction command with the primary internal spawn route.
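
One way to picture the precheck and evidence rules above is a pure planning function. This is a sketch under assumptions: `planLocalRun`, `PlanResult`, and the injected `canResolve` callback are hypothetical names, and the `ricky run <path>` reproduction string is an assumed user-facing form. The only external fact relied on is that Node 22.6 and later accept `--experimental-strip-types`.

```typescript
// Hypothetical result shape; not taken from Ricky's source.
type PlanResult =
  | { kind: "ready"; spawnCommand: string[]; reproductionCommand: string }
  | { kind: "blocker"; reason: string };

function planLocalRun(
  workflowPath: string,
  nodeVersion: string, // e.g. the value of process.version
  canResolve: (specifier: string) => boolean, // injected so the check stays testable
): PlanResult {
  const [major = 0, minor = 0] = nodeVersion.replace(/^v/, "").split(".").map(Number);
  const supportsStripTypes = major > 22 || (major === 22 && minor >= 6);
  if (!supportsStripTypes) {
    return { kind: "blocker", reason: `Node ${nodeVersion} does not support the strip-types route` };
  }
  if (!canResolve("@agent-relay/sdk")) {
    return { kind: "blocker", reason: "@agent-relay/sdk is not resolvable for the workflow" };
  }
  return {
    kind: "ready",
    // The actual internal spawn route, recorded verbatim in execution evidence.
    spawnCommand: ["node", "--experimental-strip-types", workflowPath],
    // Kept separate so the user-facing command is never mistaken for the spawn route.
    reproductionCommand: `ricky run ${workflowPath}`,
  };
}
```

Note that nothing in this plan inspects `PATH` for an `agent-relay` binary; in a real Node process, `canResolve` could wrap `createRequire(...).resolve` from `node:module` in a try/catch.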

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Ricky’s Boundary‑Handling Protocol for Persistent Structural Failures**

When a structural failure remains after an initial remediation attempt, Ricky follows a deterministic, contract‑driven workflow that isolates the problem, prevents downstream impact, and creates a clear hand‑off for human owners. The protocol is split into **four phases** – Detection, Containment, Escalation, and Recovery – each with explicit contracts, verification gates, and artefacts.

---

## 1. Detection Phase (Already Completed)

| Contract | Detail |
|----------|--------|
| **Failure Signal** | `structural_failure: true` emitted by the runtime monitor (e.g., missing DAG node, broken schema, or dead‑end transition). |
| **Context Snapshot** | `failure_context.json` – includes workflow ID, version tag, input sample, stack trace, and the exact node(s) that failed. |
| **Initial Fix Attempt** | `fix_attempt_{ts}.sh` – script that applied the prescribed corrective action (e.g., schema migration, node re‑wire). |
| **Post‑Fix Verification** | `post_fix_report_{ts}.json` – result of the deterministic verification gate (all unit‑tests, contract checks, and a single end‑to‑end run). |

*Result:* Verification g
...[truncated]

Must

  • Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
  • Distinguish escalation from a generic product failure.
  • Preserve enough context for a human operator to continue.

Must Not

  • Retry speculative fixes indefinitely.
  • Collapse to "something went wrong" without the attempted actions.
  • Discard evidence from failed repair attempts.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – Consolidated Report**  
*Generated: 2026‑05‑15 08:12 UTC*  

---

## 1. Scope & Data Set
| Parameter | Value |
|-----------|-------|
| **Workflow(s) Covered** | `ricky/coordination`, `ricky/reliability`, `ricky/authoring` |
| **Run Window** | 2026‑04‑15 00:00 UTC → 2026‑05‑14 23:59 UTC (30 days) |
| **Total Runs** | 4 872 |
| **Successful Runs** | 4 613 (94.7 %) |
| **Failed Runs** | 259 (5.3 %) |
| **Average Runtime** | 1 m 42 s |
| **Longest Runtime** | 7 m 13 s (run # R-20260512‑B7) |
| **Shortest Runtime** | 12 s (run # R-20260501‑A3) |

---

## 2. Validation‑Gate Summary (80‑100 % Loop Coverage)

| Gate | Description | Pass % | Fail % | Typical Failure Mode |
|------|-------------|--------|--------|----------------------|
| **Contract‑Check** | Input‑/output‑schema conformance (JSON‑Schema v2020‑12) | 99.2 % | 0.8 % | Missing required field `workflow_id` |
| **Determinism‑Check** | Re‑run with cached inputs, compare outputs (hash‑match) | 98.7 % | 1.3 % | Non‑deterministic timestamp injection |
| **Security‑Gate** | Secret‑leak scan (truffleHog) | 100 % | 0 % | — |
| **Performance‑Gate** | Runtime ≤ 3 × median (4 m 30 s) | 96.5 % | 3.5 % | Outl
...[truncated]

Must

  • Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
  • Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
  • Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

  • Mutate evidence while analyzing it.
  • Return generic advice like "improve your workflow" without references.
  • Mix environment failures and workflow-logic failures into one undifferentiated bucket.
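
As an illustration of the rules above, the sketch below groups failures from normalized evidence without mutating it. The `WorkflowRunEvidence` name appears in the rules, but its fields are not shown in this review, so the shape used here, along with `FailureBucket` and `recurringFailures`, is an assumption made for the example.

```typescript
// Assumed shape: the real WorkflowRunEvidence fields are not shown in this review.
interface WorkflowRunEvidence {
  readonly workflow: string;
  readonly durationMs: number;
  readonly failure: {
    readonly step: string;
    readonly kind: "environment" | "workflow-logic";
    readonly failureClass: string;
  } | null;
}

interface FailureBucket {
  workflow: string;
  step: string;
  kind: "environment" | "workflow-logic";
  failureClass: string;
  count: number;
}

// Returns recurring failure classes, most frequent first. The input is read-only,
// and environment failures never share a bucket with workflow-logic failures
// because `kind` is part of the grouping key.
function recurringFailures(runs: readonly WorkflowRunEvidence[]): FailureBucket[] {
  const buckets = new Map<string, FailureBucket>();
  for (const run of runs) {
    if (run.failure === null) continue;
    const { step, kind, failureClass } = run.failure;
    const key = [run.workflow, step, kind, failureClass].join("|");
    const bucket = buckets.get(key) ?? { workflow: run.workflow, step, kind, failureClass, count: 0 };
    bucket.count += 1;
    buckets.set(key, bucket);
  }
  return Array.from(buckets.values()).sort((a, b) => b.count - a.count);
}
```

Each bucket carries a workflow and step name, so any recommendation derived from it can point at a specific location rather than giving generic advice.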

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.preserve-pr-shipping-during-repair (runtime-recovery/manual)

User message

A persona-authored workflow that imports @agent-relay/github-primitive and uses createGitHubStep to open a PR fails its runtime-precheck. The auto-fix repair persona is invoked. Describe the contract the repaired artifact must satisfy.

Ricky output

**Repaired Workflow Contract**

The repaired artifact must be a **self‑contained, deterministic workflow definition** that satisfies every static‑analysis and runtime‑precheck rule enforced by the AgentWorkforce CI pipeline. Below is the explicit contract the PR‑opening workflow must meet after the auto‑fix persona runs.

| # | Contract Element | Requirement | Verification Gate |
|---|------------------|-------------|-------------------|
| 1 | **Module Import** | The workflow must import the **exact** package name `@agent-relay/github-primitive` **once**, using a named import that includes `createGitHubStep`. No stray or duplicate imports are allowed. | `import‑lint` (ESLint rule `no-duplicate-imports`, `no-unresolved-modules`) |
| 2 | **Exported Symbol** | The workflow must export a **single** constant named `openPrWorkflow` (or the persona‑defined name) that conforms to the `WorkflowDefinition` interface. | `type‑check` (TS compiler – `WorkflowDefinition` shape) |
| 3 | **Step Construction** | The workflow must call `createGitHubStep` **exactly once** to produce a step object with the following mandatory fields: <br>• `name: string` – non‑empty, human‑readable. <br>• `action: "pu
...[truncated]

Must

  • Preserve the @agent-relay/github-primitive import, the GitHubStepExecutor reference, and every createGitHubStep(...) invocation that the original artifact declared.
  • Keep the failing workflow runnable from the same path with the same workflow(...) builder and .run({ cwd: process.cwd() }) invocation.
  • Retain at least ceil(N / 2) of the original workflow's .step(...) calls when the original declared four or more steps; a repair that collapses to a 2-3 step placeholder is a regression, not a fix.
  • Reject (do not apply) any repair output whose step list reduces to prepare-context / runtime-precheck: true / final-signoff: echo placeholder; surface the regression diagnostic instead.

Must Not

  • Strip createGitHubStep, GitHubStepExecutor, or @agent-relay/github-primitive because the runtime-precheck failure mentioned PR-shipping or git side effects. The repair contract's "no commit / no push" constraint applies to the REPAIR AGENT's runtime behavior, not to the workflow's step declarations.
  • Emit a "minimal repair-safe master" or "simplified Ricky master" scaffold that passes the builder validator while doing none of the original work.
  • Treat "the workflow now builds and runs" as success when the work it was supposed to ship is gone.
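
The retention rules above lend themselves to a textual regression check on repair output. The sketch below is an approximation under stated assumptions: `checkRepair` and `countMatches` are hypothetical helpers, and counting `.step(` occurrences by regular expression only compares totals; it does not prove that the same steps survived.

```typescript
function countMatches(source: string, pattern: RegExp): number {
  return (source.match(pattern) ?? []).length;
}

// Returns a list of regression diagnostics; an empty list means the repair
// passes these particular checks (not that it is otherwise correct).
function checkRepair(original: string, repaired: string): string[] {
  const problems: string[] = [];

  // Rule: when the original had four or more steps, keep at least ceil(N / 2).
  const originalSteps = countMatches(original, /\.step\(/g);
  const repairedSteps = countMatches(repaired, /\.step\(/g);
  if (originalSteps >= 4 && repairedSteps < Math.ceil(originalSteps / 2)) {
    problems.push(`step regression: ${repairedSteps} of ${originalSteps} .step(...) calls retained`);
  }

  // Rule: PR-shipping references present in the original must survive.
  for (const token of ["@agent-relay/github-primitive", "GitHubStepExecutor"]) {
    if (original.includes(token) && !repaired.includes(token)) {
      problems.push(`missing reference: ${token}`);
    }
  }

  // Rule: every createGitHubStep(...) invocation must be preserved.
  const originalCalls = countMatches(original, /createGitHubStep\(/g);
  const repairedCalls = countMatches(repaired, /createGitHubStep\(/g);
  if (repairedCalls < originalCalls) {
    problems.push(`createGitHubStep calls dropped: ${repairedCalls} of ${originalCalls} retained`);
  }
  return problems;
}
```

A caller would reject the repair and surface these diagnostics whenever the list is non-empty, instead of applying an artifact that builds and runs but no longer ships the PR.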

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

**Ricky Slack Surface – “Workflow Debugger”**  
*Version 0.1 – Design Draft*  

---

## 1. Overview
The **Workflow Debugger** Slack surface gives engineers and product owners a deterministic, low‑latency way to inspect, step‑through, and repair Ricky‑managed workflows directly from a Slack channel. It follows the **Ricky repository conventions** (AGENTS.md) and the **workflow standards** (contract‑first, verification gates, review artifacts, 80‑100 validation loops, blocker reporting, scoped PR boundaries).

---

## 2. User‑Facing Commands (Slack Slash & Message Actions)

| Command | Syntax | Description | Output |
|---------|--------|-------------|--------|
| `/ricky debug list` | `/ricky debug list [--status=active|failed|all]` | Lists recent workflow runs (default last 20). | Table with Run‑ID, Workflow, Trigger, Status, Age. |
| `/ricky debug show` | `/ricky debug show <run‑id>` | Shows detailed execution graph, step logs, and current state. | Interactive block with Mermaid diagram + log pane. |
| `/ricky debug step` | `/ricky debug step <run‑id> <step‑name> [--mode=inspect|replay|skip]` | - **inspect**: fetches step input/output.<br>- **replay**: re‑executes step with same inp
...[truncated]

Must

  • Treat Slack as a co-equal product surface, not a wrapper around CLI.
  • Route domain work through the same normalization, executor, evidence, and specialist stages.
  • Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.

Must Not

  • Put workflow generation or diagnosis domain logic directly in the Slack handler.
  • Degrade Slack to

Comment truncated to stay within GitHub limits. Download the ricky-eval-run artifact for the full human-review.md.
