
fix(persona-writer): recover placeholder ts-fence responses via disk fallback#124

Merged
khaliqgant merged 2 commits into main from fix/writer-parser-disk-fallback on May 18, 2026

kjgbot commented May 17, 2026

Summary

The writer's response parser had a gap in the fenced-response path. When Claude is prompted to use the Write tool and only emit a placeholder comment inside the ```typescript fence (e.g. `// (full source above — file written to disk)`), the existing parser:

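```ts
if (metadataJson && tsFence) {
  // Both fences parsed, so a placeholder ts fence throws here and the
  // truncated-stdout disk fallback further down is never reached.
  return validateFencedResponse(tsFence, metadataJson, expectedPath);
}
```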

threw `"does not call workflow()"` because the placeholder text lacks a `workflow(` call. The downstream truncated-stdout disk-recovery fallback (line ~1070) never ran, because both fences had successfully parsed.
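To make the failure concrete, the check that trips can be approximated as a search for a `workflow(` call in the fenced source. This is a minimal sketch only: `callsWorkflow` and its regex are hypothetical stand-ins, not the validator in `workforce-persona-writer.ts`.

```ts
// Hypothetical approximation of the "does not call workflow()" check;
// the real validator may inspect the source differently.
function callsWorkflow(source: string): boolean {
  // Match the identifier `workflow` followed by an opening parenthesis.
  return /\bworkflow\s*\(/.test(source);
}

// Placeholder text emitted after the model wrote the file via the Write tool.
const placeholderFence = "// (full source above \u2014 file written to disk)";
// A heavily abbreviated real workflow body.
const realFence = "export default workflow({ name: 'pr-08' });";

console.log(callsWorkflow(placeholderFence)); // false: validation throws
console.log(callsWorkflow(realFence)); // true
```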

Concrete regression this fixes

In our 4-spec ricky-cloud-spawn re-run on v0.1.69 (after the master-renderer fix in #123 landed), the writer produced a real 513-line workflow with proper top-level `GitHubStepExecutor`/`createGitHubStep` imports for pr-08 and wrote it to disk via the Write tool in ~25 minutes. Ricky then rejected the response as `parse-error` and burned 3 more 25-minute repair attempts on output it could not parse — none of which had any chance of succeeding, since the only thing wrong was the stdout-format mismatch. The underlying file on disk was always fine.

The fix

Try `validateFencedResponse` first and, only on its specific `"does not call workflow()"` symptom, attempt `recoverArtifactFromTruncatedOutput` (mtime-guarded, same as the existing truncated-stdout path) before giving up. Other failures (metadata path mismatch, malformed metadata, etc.) still surface unchanged. Both fenced-response code paths (json-fence + ts-fence at ~1031, and ts-fence + metadata-fence at ~1042) now flow through the same `tryFencedResponseOrDiskRecovery` helper for consistency.
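The control flow described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the PR's code: the `FencedArtifact` shape, the injected `RecoveryDeps`, and the name `tryFencedOrDiskRecovery` are hypothetical, whereas the real `tryFencedResponseOrDiskRecovery` is called as `(tsFence, metadataJson, expectedPath, options)`.

```ts
// Hypothetical shapes; the real module's types and signatures differ.
interface FencedArtifact {
  content: string;
  metadata: Record<string, unknown>;
  responseFormat: "fenced-artifact";
  recoveredFromDisk?: boolean;
  reason?: "fenced-ts-placeholder";
}

interface RecoveryDeps {
  // Validates a ts fence against its metadata; throws on failure.
  validateFenced: (tsFence: string, metadata: Record<string, unknown>) => FencedArtifact;
  // Returns the on-disk source, or undefined when it is missing or stale.
  recoverFromDisk: () => string | undefined;
}

const PLACEHOLDER_SYMPTOM = "does not call workflow()";

function tryFencedOrDiskRecovery(
  tsFence: string,
  metadata: Record<string, unknown>,
  deps: RecoveryDeps,
): FencedArtifact {
  try {
    return deps.validateFenced(tsFence, metadata);
  } catch (err) {
    // Only the placeholder symptom is eligible for disk recovery;
    // any other validation failure surfaces unchanged.
    if (!(err instanceof Error) || !err.message.includes(PLACEHOLDER_SYMPTOM)) {
      throw err;
    }
    const onDisk = deps.recoverFromDisk();
    if (onDisk === undefined) throw err; // stale or missing file: keep original error
    try {
      // Re-validate the recovered source with the same rules as the fence.
      const artifact = deps.validateFenced(onDisk, metadata);
      return { ...artifact, recoveredFromDisk: true, reason: "fenced-ts-placeholder" };
    } catch {
      throw err; // on-disk file is also invalid: preserve the original failure
    }
  }
}
```

In this sketch a rejected recovery rethrows the original error instead of returning `undefined`, so the caller cannot fall through to another parse path.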

Tests

  • New `recovers from a placeholder typescript fence + complete json metadata fence` — pins the exact regression shape (placeholder text in ts fence + structured json with `writtenToDisk: true` metadata) producing a `fenced-artifact` response sourced from the on-disk file, marked `recoveredFromDisk: true, reason: 'fenced-ts-placeholder'`.
  • New `placeholder fence path respects the freshness guard` — locks the mtime guard: without `writerInvokedAtMs`, recovery short-circuits and the parser falls through to its usual error paths.
  • New `does NOT recover from a placeholder fence when on-disk content also lacks workflow()` — no silent bypass: if the on-disk file is also invalid, the original parser-failure surface is preserved.

All 73 writer tests pass; bundle clean. The 12 pre-existing entrypoint-tempdir flakes (race on `/tmp/ricky-sdk-timeout-repo-*/child.pid` across parallel test files) are unrelated to this change — see CI for re-runs.

Test plan

  • Cut a release (suggest v0.1.70), install globally
  • Re-run `scripts/ricky-cloud-spawn.sh --only pr-08,pr-09,pr-10,pr-11` and confirm writer responses with placeholder ts fences are now accepted via disk recovery instead of triggering the repair loop

🤖 Generated with Claude Code

…a responses by reading freshly-written workflow from disk

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

coderabbitai Bot commented May 17, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 54aaac5e-d19b-4c76-8815-c67a9a8cba8e

📥 Commits

Reviewing files that changed from the base of the PR and between f41c550 and e2b9e3d.

📒 Files selected for processing (2)
  • src/product/generation/workforce-persona-writer.test.ts
  • src/product/generation/workforce-persona-writer.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/product/generation/workforce-persona-writer.test.ts
  • src/product/generation/workforce-persona-writer.ts

📝 Walkthrough

parsePersonaWorkflowResponse now attempts fenced TypeScript + JSON metadata validation and, on the specific "does not call workflow()" failure, tries to recover the workflow source from disk (honoring writerInvokedAtMs). Successful recoveries return a fenced-artifact annotated with recoveredFromDisk and reason.

Changes

Placeholder-fence recovery with disk fallback

| Layer / File(s) | Summary |
|---|---|
| Placeholder-fence recovery helper and callsites (`src/product/generation/workforce-persona-writer.ts`) | Adds tryFencedResponseOrDiskRecovery, intercepts the placeholder "does not call workflow()" error from validateFencedResponse, attempts recoverArtifactFromTruncatedOutput with freshness checks, and, on success, revalidates and returns the recovered fenced-artifact annotated with recoveredFromDisk: true and reason: 'fenced-ts-placeholder'. Two callsites in parsePersonaWorkflowResponse now invoke this helper. |
| Placeholder-fence recovery regression tests (`src/product/generation/workforce-persona-writer.test.ts`) | Adds a placeholderFenceOutput fixture and tests: successful disk recovery when fresh; suppression of recovery when artifact mtime is stale or writerInvokedAtMs is omitted; rejection when recovered on-disk content lacks workflow structure; and validation that fenced metadata mismatches still throw after recovery. |

Sequence Diagram

```mermaid
sequenceDiagram
  participant LLMOutput as Claude Output (fenced-ts + metadata)
  participant Parser as parsePersonaWorkflowResponse
  participant Recovery as tryFencedResponseOrDiskRecovery
  participant Validator as validateFencedResponse
  participant Disk as recoverArtifactFromTruncatedOutput

  LLMOutput->>Parser: provides tsFence + metadata
  Parser->>Recovery: forward tsFence, metadata, expectedPath, options
  Recovery->>Validator: validate fenced response
  Validator-->>Recovery: WorkforcePersonaWriterError 'does not call workflow()'
  Recovery->>Disk: recoverArtifactFromTruncatedOutput (uses writerInvokedAtMs mtime check)
  Disk-->>Recovery: recovered artifact content
  Recovery->>Validator: validate recovered content
  Validator-->>Recovery: valid fenced-artifact
  Recovery-->>Parser: return fenced-artifact with recoveredFromDisk: true, reason: 'fenced-ts-placeholder'
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • AgentWorkforce/ricky#112: Extends existing disk-recovery mechanism (added in #112 via writerInvokedAtMs freshness gating) to specifically handle Claude "placeholder" fenced TypeScript artifacts.
  • AgentWorkforce/ricky#102: Related to earlier persona response parsing and tolerant extraction work in the same writer module.
  • AgentWorkforce/ricky#77: Adds tests and recovery behaviors for malformed or placeholder-style Workforce persona outputs.

Poem

🐰 I saw a fenced TypeScript shell, so meek,
The disk held truth the LLM failed to speak.
Freshness checked, the workflow rose anew,
Recovered, stamped, and flagged with reasons true.
Hop—now tests can prove the rabbit knew.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The pull request title clearly and concisely summarizes the main change: a fix to recover placeholder TypeScript fence responses through a disk fallback mechanism in the persona-writer module. |
| Description check | ✅ Passed | The pull request description is comprehensive and directly related to the changeset, providing context about the regression, the specific fix implemented, and the tests added to verify the behavior. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/product/generation/workforce-persona-writer.test.ts`:
- Around line 411-432: The test title says it should exercise the stale-mtime branch but it never passes writerInvokedAtMs or a statFile result; update the test that calls parsePersonaWorkflowResponse (using placeholderFenceOutput and artifactPath) to either rename the test to reflect the "missing writerInvokedAtMs" behavior OR supply a writerInvokedAtMs and a mocked statFile that returns mtimeMs <= writerInvokedAtMs (e.g. legacy mtime older than the writerInvokedAtMs) so the placeholder-fence freshness guard in parsePersonaWorkflowResponse is triggered; ensure you reference the same artifactPath and preserve the expectation that the parser throws the same parser-failure error.

In `@src/product/generation/workforce-persona-writer.ts`:
- Around line 1347-1363: The code currently swallows errors from validateMetadata after disk recovery, allowing empty or wrong fenced metadata to succeed; change the catch so validation failures abort recovery: after validateArtifactContent(recovered) succeeds, call validateMetadata(metadata) and if it throws return undefined (i.e., treat metadata validation errors as a disk-recovery failure), ensuring the returned object (content/metadata/responseFormat) is only returned when both validateArtifactContent and validateMetadata pass for the recovered content and metadata.
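A minimal sketch of the ordering this comment asks for, where recovered content is accepted only when both the content check and the metadata check pass. The `path` metadata field, the `expectedPath` comparison, and both validator bodies are hypothetical stand-ins for `validateArtifactContent` and `validateMetadata`.

```ts
// Hypothetical metadata shape; the real fenced metadata may differ.
interface ArtifactMetadata {
  path: string;
  writtenToDisk?: boolean;
}

interface RecoveredArtifact {
  content: string;
  metadata: ArtifactMetadata;
  responseFormat: "fenced-artifact";
}

// Stand-in for validateArtifactContent: the source must call workflow().
function checkContent(source: string): void {
  if (!/\bworkflow\s*\(/.test(source)) throw new Error("does not call workflow()");
}

// Stand-in for validateMetadata: the declared path must match the expected one.
function checkMetadata(metadata: ArtifactMetadata, expectedPath: string): void {
  if (metadata.path !== expectedPath) {
    throw new Error(`metadata path ${metadata.path} does not match ${expectedPath}`);
  }
}

function acceptRecovered(
  recovered: string,
  metadata: ArtifactMetadata,
  expectedPath: string,
): RecoveredArtifact | undefined {
  try {
    checkContent(recovered);
    // A metadata failure aborts recovery instead of being swallowed.
    checkMetadata(metadata, expectedPath);
  } catch {
    return undefined;
  }
  return { content: recovered, metadata, responseFormat: "fenced-artifact" };
}
```

Returning `undefined` follows the suggestion as written; whether the caller then falls through to another parse path is the separate concern raised in the Devin and cubic reviews.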

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 2480fb6e-47d1-448b-b7f9-130b6c86d51e

📥 Commits

Reviewing files that changed from the base of the PR and between db12bed and f41c550.

📒 Files selected for processing (2)
  • src/product/generation/workforce-persona-writer.test.ts
  • src/product/generation/workforce-persona-writer.ts

devin-ai-integration Bot left a comment

Devin Review found 1 potential issue.

View 4 additional findings in Devin Review.

Comment on lines 1039 to 1043
```ts
  const recovered = tryFencedResponseOrDiskRecovery(tsFence, metadataJson, expectedPath, options);
  if (recovered) return recovered;
}
if (metadataJson) {
  return validateStructuredResponse(metadataJson, expectedPath, 'structured-json', options);
```

🔴 Placeholder-fence fallthrough to validateStructuredResponse bypasses the mtime freshness guard, silently accepting stale artifacts

When tryFencedResponseOrDiskRecovery returns undefined (because recoverArtifactFromTruncatedOutput correctly rejected a stale on-disk file whose mtimeMs <= writerInvokedAtMs), the code falls through to validateStructuredResponse(metadataJson, ...) at line 1042-1043. That function calls recoverExpectedArtifactContent (workforce-persona-writer.ts:1248), which reads the file from disk without any mtime freshness check (workforce-persona-writer.ts:1276-1297). This silently returns a stale artifact from a prior writer run as the current writer's output.

In the old code, validateFencedResponse would either return or throw, so validateStructuredResponse was unreachable when both metadataJson and tsFence were present. The new code makes it reachable whenever disk recovery is rejected by the freshness guard.

Concrete stale-file scenario
  1. Previous writer run wrote a valid workflow to workflows/generated/persona.ts (mtime = 500ms).
  2. New writer run starts (writerInvokedAtMs = 1000), emits a placeholder ts fence + json metadata.
  3. tryFencedResponseOrDiskRecovery: recoverArtifactFromTruncatedOutput sees mtimeMs(500) <= 1000 → returns undefined (correctly rejects stale file).
  4. tryFencedResponseOrDiskRecovery returns undefined.
  5. Falls through to validateStructuredResponse(metadataJson, ...) → recoverExpectedArtifactContent reads the same stale file (no mtime check) → returns it.
  6. Parser returns the stale workflow content as if it were the current writer's output.
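The scenario hinges on a single mtime comparison. A minimal sketch, assuming the guard treats a missing `writerInvokedAtMs` as "cannot prove freshness" (as the PR's freshness-guard test describes) and rejects `mtimeMs <= writerInvokedAtMs`; `isFreshArtifact` is a hypothetical name, and the timestamps are the ones from the steps above.

```ts
// Hypothetical freshness guard; the real check lives inside
// recoverArtifactFromTruncatedOutput and may differ in detail.
function isFreshArtifact(mtimeMs: number, writerInvokedAtMs?: number): boolean {
  // Without an invocation timestamp, freshness cannot be proven.
  if (writerInvokedAtMs === undefined) return false;
  // A file modified at or before the writer started belongs to an earlier run.
  return mtimeMs > writerInvokedAtMs;
}

console.log(isFreshArtifact(500, 1000)); // false: the stale file from step 1
console.log(isFreshArtifact(1500, 1000)); // true: written during this run
console.log(isFreshArtifact(1500)); // false: guard short-circuits
```

The bypass exists because only one of the two disk reads applies this check; `recoverExpectedArtifactContent` reads the same file without it.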

The existing test at lines 411-431 masks this by omitting readFileText, so recoverExpectedArtifactContent can't find any file. A test providing both writerInvokedAtMs and readFileText with a stale statFile mtime would demonstrate the bypass.

(Refers to lines 1039-1044)

Prompt for agents
The problem is that when tryFencedResponseOrDiskRecovery returns undefined (because the freshness guard in recoverArtifactFromTruncatedOutput correctly rejected a stale file), the code falls through to validateStructuredResponse which calls recoverExpectedArtifactContent — a function that reads from disk without any mtime freshness check. This silently accepts stale artifacts.

The fix should prevent the fallthrough from the placeholder-fence recovery path to validateStructuredResponse when both metadataJson and tsFence are present. In the old code, validateFencedResponse either returned or threw, so validateStructuredResponse was unreachable when both were truthy.

Approach 1 (minimal): Change the if-else structure so that when metadataJson && tsFence is true and tryFencedResponseOrDiskRecovery returns undefined, don't fall through to the metadataJson-only validateStructuredResponse branch. Use else-if instead of two independent ifs.

Approach 2 (broader): Propagate the writerInvokedAtMs freshness check into recoverExpectedArtifactContent so it also respects the mtime guard when provided. This is a more thorough fix but touches the pre-existing recovery path too.

Affected files: src/product/generation/workforce-persona-writer.ts, lines 1031-1044 (first call site) and potentially lines 1050-1053 + 1063-1067 (second call site where extractFirstBalancedJsonObject can find the same JSON and call validateStructuredResponse).

Also add a test in the placeholder fence test group that provides writerInvokedAtMs, a stale statFile mtime, and a readFileText returning valid workflow source — it should throw, not silently return the stale content.
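Approach 1 can be sketched as an `else if` chain that ends in an explicit failure. The helper parameters and the `ParseResult` type are hypothetical; in the real parser, any later fallback that can still reach `validateStructuredResponse` (such as the `extractFirstBalancedJsonObject` path mentioned above) would need the same treatment.

```ts
// Hypothetical result type; the real parser returns richer objects.
type ParseResult = { source: "fenced" | "structured"; content: string };

function parseWriterResponse(
  metadataJson: string | undefined,
  tsFence: string | undefined,
  tryFencedOrRecover: (tsFence: string, metadataJson: string) => ParseResult | undefined,
  validateStructured: (metadataJson: string) => ParseResult,
): ParseResult {
  if (metadataJson && tsFence) {
    const recovered = tryFencedOrRecover(tsFence, metadataJson);
    if (recovered) return recovered;
    // Recovery was rejected (e.g. stale mtime). Because the next branch is
    // an `else if`, control skips the unguarded structured-json disk read.
  } else if (metadataJson) {
    return validateStructured(metadataJson);
  }
  throw new Error("writer response could not be parsed");
}
```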

cubic-dev-ai Bot left a comment

1 issue found across 2 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/product/generation/workforce-persona-writer.ts">

<violation number="1" location="src/product/generation/workforce-persona-writer.ts:1346">
P1: Do not return `undefined` when placeholder disk recovery fails here; it allows fallthrough into unguarded structured-json disk loading and can accept stale artifacts.</violation>
</file>

github-actions Bot commented May 17, 2026

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-18T07-50-08-536Z-HEAD-provider
Mode: provider
Git SHA: 8d0ee68

Passed: 11 | Needs human: 44 | Reviewable: 44 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Ricky – Updated Documentation & Code to Reflect Deeper Integration with Agent Assistant**

Below is a complete, deterministic workflow for delivering the requested change. It follows the **Ricky repository conventions** (AGENTS.md, workflow standards, shared authoring rules) and includes all required artifacts, verification gates, and review checkpoints.

---

## 1. Scope & Objectives
| Item | Description |
|------|-------------|
| **Target** | `README.md`, `docs/architecture.md`, and the `ricky/__init__.py` module. |
| **Goal** | Make it explicit that Ricky **leverages the Agent Assistant (AA) platform for workflow orchestration, verification, and self‑healing**. |
| **Boundaries** | No functional code changes beyond the public constant `RICKY_USES_AGENT_ASSISTANT = True`. No impact on existing CI pipelines. |
| **Exclusions** | UI assets, external packaging scripts, and any non‑Python language bindings. |

---

## 2. Concrete Workflow Contract

| Phase | Deliverable | Deterministic Gate | Owner |
|-------|-------------|--------------------|-------|
| **A – Branch Creation** | `feature/aa‑deep‑integration` from `main` | Branch must be created with prefix `feature/` and include t
...[truncated]

Must

  • Ground claims in real package imports and runtime paths.
  • Distinguish current implementation from target architecture.
  • Identify which Agent Assistant primitive is actually exercised.

Must Not

  • Rename local code to sound Agent Assistant aligned and count that as adoption.
  • Claim broad Agent Assistant native behavior from documentation-only alignment.
  • Blur target architecture with landed behavior.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review (as of 2026‑05‑18)**  

| Metric | Current State | Target / Desired State | Gap / Action |
|--------|---------------|------------------------|--------------|
| **Workflow contracts defined** | 7 of 12 active agents have a formal `turn‑context` contract (JSON‑Schema v1.2). | 100 % of agents should expose a contract. | Identify the 5 agents lacking contracts; create minimal contracts (input `messages[]`, `metadata`, output `augmentedMessages[]`). |
| **Deterministic verification gates** | 4 agents run a pre‑merge gate that validates contract compliance with `ajv` (pass 92 %). | 100 % of agents must pass a deterministic gate before PR merge. | Add the missing verification step to the CI pipeline for the remaining agents. |
| **Review artifacts** | 68 % of PRs include a “Turn‑Context Impact” checklist; 32 % lack it. | 100 % checklist compliance. | Update the PR template (`.github/PULL_REQUEST_TEMPLATE.md`) to make the checklist mandatory and enforce via a CI lint rule. |
| **Validation loops (80‑100)** | Average of 57 validation loops per release cycle (target ≥ 80). | ≥ 80 loops per cycle to ensure robustness. | Schedule additional
...[truncated]

Must

  • Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
  • Record compact provenance through generation decisions or coordinator metadata.
  • Keep the shared turn context internal to the adapter boundary.

Must Not

  • Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
  • Drop Ricky-specific workflow metadata during envelope assembly.
  • Treat turn context as a product decision engine.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation:** Move **all four** capabilities—workflow generation, validation, debugging, and the staged‑CLI UX with blocker/evidence wording—into the **Agent Assistant** (the “AA” component of the AgentWorkforce stack).

Below is a concise justification aligned with our repository conventions, workflow standards, and product specs.

---

## 1. Alignment with Agent Assistant’s Core Mandate
| Capability | Current Owner | Why it belongs in AA |
|------------|---------------|----------------------|
| **Workflow Generation** | Agent Builder (template‑driven) | AA already orchestrates context‑aware prompting and can synthesize full‑YAML definitions on‑the‑fly, reducing hand‑offs. |
| **Workflow Validation** | Validation Service (static) | AA can invoke deterministic verification gates *immediately* after generation, providing instant feedback to the user. |
| **Workflow Debugging** | Debug Engine (post‑mortem) | Embedding debugging hooks in AA enables a “generate‑validate‑debug” loop within a single interaction, satisfying the 80‑to‑100 validation loop requirement. |
| **Staged CLI UX & Blocker/Evidence Wording** | CLI Wrapper (presentation) | AA is the only component that has the 
...[truncated]

Must

  • Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
  • Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
  • Make extraction follow typed, tested, live product proof.

Must Not

  • Generalize workflow-specific behavior prematurely.
  • Adopt moving shared seams merely for architectural purity.
  • Lose the precise local-first staged workflow UX.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target window: 12 weeks (Weeks 1‑12)*  

---

## 1. Slice Goal
Deliver a production‑ready, self‑service adoption package for the **Ricky** AgentWorkforce assistant that enables **10 pilot teams** to onboard, configure, and run reliability‑focused workflows with deterministic verification and continuous improvement loops.

Success criteria (KPIs):

| KPI | Target |
|-----|--------|
| **Onboarded pilot teams** | 10 |
| **Workflow contract compliance** | ≥ 95 % of runs pass all verification gates |
| **Mean Time to Adoption (MTTA)** | ≤ 3 days from request to first successful run |
| **Blocker detection rate** | 100 % of critical blockers reported in the “Blocker Dashboard” within 4 h |
| **Feedback‑loop closure** | ≥ 80 % of validation loops closed per sprint |

---

## 2. Scope & Boundaries  

| Included | Excluded |
|----------|----------|
| • Pre‑built **Ricky‑Core** workflow templates (reliability, coordination, review)  <br>• Deterministic verification gates (schema, contract, idempotency) <br>• 80‑100 validation loops (auto‑generated test cases) <br>• Blocker reporting UI (Dashboard) <br>• Scoped branch `feature/rick
...[truncated]

Must

  • Pick exactly one real shared seam to evaluate or adopt.
  • Define a live Ricky product path that will prove the adoption.
  • Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

  • Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
  • Skip the comparison/evaluation step for mature Ricky-local seams.
  • Treat adoption as successful without a live product-path proof.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky – Future Support Layer (Slack / Web)

### 1. High‑level Goal
Provide a **self‑service, deterministic support channel** for Ricky users (agents, reviewers, ops) that lives in Slack and a minimal web UI. The layer must:

| Requirement | Rationale |
|-------------|-----------|
| **Contract‑first API** | Guarantees stable request/response shapes for downstream agents. |
| **Deterministic verification gates** | Prevents “flaky” support actions (e.g., accidental PR merges). |
| **Review artifacts** | Every support interaction is auditable (log, PR, ticket). |
| **80‑100 validation loops** | Automated regression suite that runs on every change. |
| **Blocker reporting** | Immediate escalation when a support flow cannot complete. |
| **Scoped branch/PR boundaries** | Each new feature lives in its own branch (`support/slack‑<feat>`). |

---

## 2. Architecture Overview

+-------------------+ +-------------------+ +-------------------+
| Slack / Web UI | <---> | Ricky Support API | <---> | Agent Assistant |
| (Bolt, React) | | (FastAPI) | | Packages (core) |
+-------------------+ +-------------------+
...[truncated]


**Must**

- Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
- Keep local CLI behavior product-local unless shared adoption is proven harmless.
- Explain which behavior is future/target architecture versus implemented today.

**Must Not**

- Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
- Let future surface abstractions distort the current CLI contract.
- Duplicate a mature Agent Assistant capability locally without justification.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.generation-default-not-execution</code> (cli-behavior/manual)</summary>

**User message**

> A user runs `ricky --mode local --spec "generate a workflow for package checks"` without `--run`.

**Ricky output**

```text
**What happens when you run**

```bash
ricky --mode local --spec "generate a workflow for package checks"

without the --run flag

Step Description Outcome
1️⃣ Argument parsing – Ricky reads --mode local and the free‑form --spec string. The request is accepted; Ricky knows you want a local‑mode workflow generation.
2️⃣ Spec interpretation – The text "generate a workflow for package checks" is sent to the Spec‑to‑Workflow engine. Ricky builds a workflow contract (YAML/JSON) that describes the series of checks (e.g., lint, unit tests, dependency audit, packaging).
3️⃣ Workflow materialization – The engine writes the contract to ./ricky_workflows/<timestamp>_package-checks.yaml. You receive a static workflow definition on disk, but no tasks are executed.
4️⃣ Verification gate (dry‑run) – Because --run is omitted, Ricky stops after the generation gate. It prints a concise summary:
• Workflow file path
• Number of stages
• Any deterministic validation warnings (e.g., missing optional notify step).
You can review the generated file, edit it, or
...[truncated]
```

**Must**

- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.

**Must Not**

- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>

**User message**

> Render Ricky's first-run CLI onboarding for a new user.

**Ricky output**

```text
**Ricky – First‑Run CLI Onboarding**

Below is the complete, ready‑to‑run onboarding workflow for a brand‑new user of the **Ricky** AgentWorkforce assistant. It is expressed as a deterministic Bash‑compatible script together with the required supporting files, verification gates, and review artifacts. Follow the steps exactly; each gate will abort the run with a clear error message if a pre‑condition is not met.

---

## 1. Repository Layout (branch: `onboarding/first‑run`)

ricky/
├─ .github/
│ └─ workflows/
│ └─ onboarding.yml # CI gate for onboarding script
├─ scripts/
│ └─ onboarding.sh # <‑‑ main entry point (this file)
├─ config/
│ └─ ricky.yaml # default runtime config
├─ docs/
│ └─ onboarding.md # human‑readable guide (generated)
└─ .ricky/
└─ .env.example # template for user secrets


> **Branch policy** – All changes must be made in a feature branch (`onboarding/first‑run`) and opened as a PR against `main`. The PR must pass the CI workflow defined in `.github/workflows/onboarding.yml` and receive at least one approving review before merge.

---

## 2. `scripts/onboarding.sh` – Deterministi
...[truncated]
```

**Must**

- Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
- End every branch with a concrete next step.
- Advertise only commands that are currently implemented.

**Must Not**

- Sound like a launch page or documentation dump.
- Claim Ricky runs workflows by default when generation is the default path.
- Require web or Slack onboarding before CLI use.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
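
The first-run rules are mechanical enough to lint. This sketch assumes a hypothetical `OnboardingChoice` shape and a caller that passes in the set of commands the CLI actually implements. Neither is an existing Ricky API.

```typescript
// Hypothetical first-run menu entry; informational choices may omit a command.
interface OnboardingChoice {
  label: string;
  nextStep: string;
  command?: string;
}

const REQUIRED_CHOICES = ['Local / BYOH', 'Cloud', 'Both', 'Just explore'];

// Returns problems with the onboarding copy; an empty array means it passes.
export function lintOnboarding(choices: OnboardingChoice[], implemented: ReadonlySet<string>): string[] {
  const problems: string[] = [];
  for (const label of REQUIRED_CHOICES) {
    if (!choices.some((c) => c.label === label)) problems.push(`missing choice: ${label}`);
  }
  for (const c of choices) {
    // Every branch must end with a concrete next step.
    if (c.nextStep.trim() === '') problems.push(`${c.label}: no concrete next step`);
    // Only advertise commands that exist today.
    if (c.command !== undefined && !implemented.has(c.command)) {
      problems.push(`${c.label}: advertises unimplemented command "${c.command}"`);
    }
  }
  return problems;
}
```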

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.recovery-guidance-no-stack-traces</code> (cli-behavior/manual)</summary>

**User message**

> A user gives Ricky an empty spec or a missing spec file.

**Ricky output**

**Handling an Empty or Missing Specification File**

When a user submits a workflow request but provides either an empty spec or no spec file at all, follow the deterministic “Missing‑Spec” contract below. This contract guarantees that the gap is surfaced early, the work is scoped cleanly, and downstream verification never proceeds with an undefined contract.

---

## 1. Immediate Blocker Detection (Gate 🔴)

| Condition | Action |
|-----------|--------|
| **Spec file path does not exist** (e.g., `workflow.yaml` missing) | Raise a **MissingSpecError** and abort the current PR/branch. |
| **Spec file exists but is empty** (`size == 0`) | Raise a **EmptySpecError** and abort the current PR/branch. |
| **Spec file exists but is not parse‑able as YAML/JSON** | Raise a **SpecParseError** and abort. |

*All three errors are **hard blockers** – they must be resolved before any further automation (lint, test, CI) runs.*

**Implementation tip** – place this check in the **pre‑commit hook** (`.github/workflows/precheck.yml`) and also in the CI “spec‑validation” job. The job should exit with status 1 and emit a markdown comment on the PR:

```markdown
🚨 **MissingSpecError** – No workflow spe
...[truncated]
```

**Must**

- Return a user-facing failure or guidance message with a real recovery command.
- Distinguish generation failure from execution failure.
- Show stack traces only when verbose diagnostic mode is requested.

**Must Not**

- Crash with an uncaught exception in normal mode.
- Suggest commands that do not exist.
- Pretend a missing spec was accepted.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
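
This is a sketch of the recovery behavior for a missing or empty spec file, built around a hypothetical `checkSpecFile` helper with an injectable reader. In normal mode it returns a one-line message. It appends the stack only when `verbose` is set. It suggests the `ricky --mode local --spec` invocation that appears earlier in this report.

```typescript
import { readFileSync } from 'node:fs';

type SpecCheck = { ok: true; spec: string } | { ok: false; message: string };

// `read` is injectable so the check can run without touching disk.
export function checkSpecFile(
  specPath: string,
  verbose = false,
  read: (p: string) => string = (p) => readFileSync(p, 'utf8'),
): SpecCheck {
  let spec: string;
  try {
    spec = read(specPath);
  } catch (err) {
    // Normal mode: one user-facing line. The stack is appended only in verbose mode.
    const detail = verbose && err instanceof Error ? `\n${err.stack}` : '';
    return {
      ok: false,
      message:
        `Spec file not found or unreadable: ${specPath}. Nothing was generated. ` +
        `Try: ricky --mode local --spec "<describe the workflow>"${detail}`,
    };
  }
  if (spec.trim() === '') {
    return { ok: false, message: `Spec file is empty: ${specPath}. Nothing was generated; add a workflow description and re-run.` };
  }
  return { ok: true, spec };
}
```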

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.status-does-not-invent-provider-state</code> (cli-behavior/manual)</summary>

**User message**

> Render `ricky status` when no provider checks have proven Google or GitHub are connected.

**Ricky output**

**`ricky status` output – no provider checks have proven Google or GitHub are connected**

=== AgentWorkforce Status Report ===

🟡 Provider Connectivity
• Google : UNKNOWN – no successful health‑check recorded
• GitHub : UNKNOWN – no successful health‑check recorded

✅ Core Services
• Scheduler : RUNNING
• Dispatcher : RUNNING
• Logger : ACTIVE

🔧 Configuration
• Google API Key :
• GitHub Token :
• Last Provider Scan : (no successful results)

⚠️ Alerts
• Google provider not verified – workflows that depend on Google services will be blocked.
• GitHub provider not verified – repository sync, PR automation, and code‑review pipelines are disabled.

🗒️ Next Steps (recommended)

  1. Run ricky provider check google and ricky provider check github to trigger health checks.
  2. Verify credentials/permissions for each provider.
  3. Re‑run ricky status after checks complete to see updated connectivity state.

--- End of Report ---

**Must**

- Report unknown or not-connected provider state honestly.
- Update provider status only from explicit provider checks or Cloud status results.
- Give concrete setup guidance for Cloud when relevant.

**Must Not**

- Mark Google or GitHub connected because guidance text was shown.
- Invent a provider connection URL or OAuth flow.
- Show empty fields with no recovery guidance when config is missing.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
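
One way to keep `ricky status` honest is to start every provider at `unknown` and let only explicit evidence change that state. The evidence record below is hypothetical. The rule it encodes, that guidance text never counts as a connection, comes from the Must Not list.

```typescript
type ProviderName = 'google' | 'github';
type ProviderState = 'unknown' | 'connected' | 'not-connected';

// Hypothetical evidence record; only some sources may change provider state.
interface ProviderEvidence {
  provider: ProviderName;
  source: 'provider-check' | 'cloud-status' | 'guidance-shown';
  connected: boolean;
}

// Starts from 'unknown' and changes state only on explicit check or Cloud status results.
export function resolveProviderStates(evidence: ProviderEvidence[]): Record<ProviderName, ProviderState> {
  const states: Record<ProviderName, ProviderState> = { google: 'unknown', github: 'unknown' };
  for (const e of evidence) {
    if (e.source === 'guidance-shown') continue; // showing setup text proves nothing
    states[e.provider] = e.connected ? 'connected' : 'not-connected';
  }
  return states;
}
```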

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.skill-matcher-registry-backed</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

**Ricky output**

**Workflow: “GitHub Primitive Change”**

| Step | Owner | Action | Input | Output | Verification Gate |
|------|-------|--------|-------|--------|--------------------|
| 1️⃣ | **Ricky (Coordination)** | **Create a scoped branch** | `main` (latest) | New branch `feat/<short‑description>` | Branch exists, naming follows `feat/…` pattern |
| 2️⃣ | **Ricky (Branch‑Management)** | **Sync branch with upstream** | `main` | Branch up‑to‑date with `main` | `git merge-base --is-ancestor main feat/...` passes |
| 3️⃣ | **Developer (Implementation)** | **Apply the primitive change** (e.g., edit a config file, bump a version, add a comment) | Branch from step 1 | Modified files committed | `git diff --quiet` → false (changes present) |
| 4️⃣ | **Ricky (Deterministic‑Verification)** | **Run local lint / unit tests** | Commit from step 3 | Test report artifact | All tests pass (exit 0) |
| 5️⃣ | **Ricky (CI‑Trigger)** | **Push branch & open PR** | Local branch | PR `feat/<short‑description>` opened against `main` | PR exists, target branch is `main` |
| 6️⃣ | **Ricky (Review‑Orchestration)** | **Assign reviewers & label** | PR from step 5 | Reviewers added, labels `primitive‑change`, `ci‑ready` |
...[truncated]

**Must**

- Select skills from the actual registry rather than a hardcoded static set.
- Record ranked skill matches with confidence and trigger evidence.
- Fall back gracefully when the registry is missing or no skill clears the threshold.

**Must Not**

- Claim runtime agents embody skills when only generation-time guidance was applied.
- Fail generation solely because optional skill files are missing.
- Hide skill selection evidence from artifacts or JSON output.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
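
Here is a sketch of registry-backed, ranked skill matching with a graceful fallback. The `SkillEntry` shape, the trigger-overlap score, and the 0.5 threshold are assumptions standing in for whatever scoring Ricky actually uses. The point is that matches come from the registry passed in and carry a confidence plus trigger evidence. An absent registry yields an empty list rather than an error.

```typescript
// Hypothetical registry entry: a skill id and the phrases that should trigger it.
interface SkillEntry {
  id: string;
  triggers: string[];
}

interface SkillMatch {
  id: string;
  confidence: number; // fraction of the skill's triggers found in the spec
  evidence: string[]; // the triggers that actually matched
}

export function matchSkills(spec: string, registry: SkillEntry[] | undefined, threshold = 0.5): SkillMatch[] {
  // A missing or empty registry is a fallback case, not a generation failure.
  if (!registry || registry.length === 0) return [];
  const text = spec.toLowerCase();
  return registry
    .map((skill) => {
      const evidence = skill.triggers.filter((t) => text.includes(t.toLowerCase()));
      const confidence = skill.triggers.length === 0 ? 0 : evidence.length / skill.triggers.length;
      return { id: skill.id, confidence, evidence };
    })
    .filter((m) => m.confidence >= threshold)
    .sort((a, b) => b.confidence - a.confidence);
}
```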

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.tool-selector-honors-spec-hints</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow where the spec says "use Claude to review and Codex to implement".

**Ricky output**

**Workflow: Claude‑Review → Codex‑Implementation**

| **Section** | **Content** |
|-------------|-------------|
| **Name** | `review‑then‑implement` |
| **Owner** | `team‑ml‑ops` |
| **Purpose** | Automate a deterministic pipeline where Claude performs a textual/code review of a design spec, then Codex generates the implementation. |
| **Scope** | Handles a single **feature‑spec** file (`*.spec.md`). Produces a **pull‑request** with generated code and an attached review report. |
| **Branch/PR Boundary** | - Input branch: `feature/spec‑<id>` <br> - Output branch: `feature/impl‑<id>` (created by the workflow) |
| **Contract** | <ul><li>**Input Artifact**: `spec.md` (UTF‑8, ≤ 5 KB, markdown, must contain a top‑level `# Specification` header).</li><li>**Output Artifacts**: <ul><li>`review_report.json` – Claude’s structured feedback (status, comments, blockers).</li><li>`implementation.zip` – Codex‑generated source files.</li><li>`PR` – opened against `main` with description linking both artifacts.</li></ul></li></ul> |
| **Pre‑conditions** | - `spec.md` passes schema validation (`spec_schema.yaml`). <br> - No open PR for the same `<id>` in `feature/impl‑<id>` branch. |
| **Steps** | 1
...[truncated]

**Must**

- Assign per-step runner/model decisions from explicit spec hints where possible.
- Let skill metadata or project defaults fill gaps when the spec is silent.
- Write `tool-selection.json` or equivalent audit metadata.

**Must Not**

- Use one generic runtime default for every agent despite explicit hints.
- Let skill defaults override explicit user/spec runner hints.
- Omit the reason each tool or model was chosen.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
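
The precedence the criteria ask for (explicit spec hint, then skill metadata, then project default) fits in one function that also records why each choice was made. The names are hypothetical. Serializing the returned records with `JSON.stringify` is one possible way to produce a `tool-selection.json` audit file.

```typescript
type SelectionSource = 'spec-hint' | 'skill-default' | 'project-default';

interface ToolSelection {
  step: string;
  runner: string;
  source: SelectionSource;
  reason: string;
}

// Precedence: an explicit spec hint wins, then skill metadata, then the project default.
export function selectTool(
  step: string,
  specHints: Record<string, string>,
  skillDefaults: Record<string, string>,
  projectDefault: string,
): ToolSelection {
  const hinted = specHints[step];
  if (hinted) {
    return { step, runner: hinted, source: 'spec-hint', reason: `spec explicitly assigns ${hinted} to "${step}"` };
  }
  const fromSkill = skillDefaults[step];
  if (fromSkill) {
    return { step, runner: fromSkill, source: 'skill-default', reason: 'spec is silent; using skill metadata default' };
  }
  return { step, runner: projectDefault, source: 'project-default', reason: 'spec and skills are silent; using project default' };
}
```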

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.refine-is-opt-in-and-bounded</code> (generation-quality/manual)</summary>

**User message**

> Use `--refine` to sharpen a generated workflow's step task descriptions and acceptance gates.

**Ricky output**

### Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is a deterministic post‑processing flag that takes a **generated workflow** (JSON/YAML) and returns a **tighter, review‑ready version**. It focuses on two things:

| Target | What `--refine` does |
|--------|----------------------|
| **Task descriptions** | • Removes vague verbs (“do”, “handle”). <br>• Adds concrete inputs, outputs, and success criteria. <br>• Enforces the *“who‑does‑what‑by‑when”* pattern. |
| **Acceptance gates** | • Converts informal “check” statements into explicit **verification predicates** (e.g., `assert file_exists(path)` or `assert response.status == 200`). <br>• Adds **deterministic verification steps** (unit test, schema validation, contract check). <br>• Inserts a **review artifact** reference (link to PR, test report, or checklist). |

---

## 1. Invoke the Refine Pass  

```bash
awf generate workflow.yaml --model gpt‑4o --output raw.yaml
awf refine raw.yaml --output refined.yaml
```

- `awf` – the AgentWorkforce CLI.
- `refine` – reads the input workflow, rewrites the `steps[*].task` and `steps[*].gate` fields, and writes a new file.

You can also pipe:

awf
...[truncated]

**Must**

- Keep the deterministic unrefined path as the default.
- Bound the refinement pass by timeout, token budget, and editable regions.
- Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

**Must Not**

- Change the workflow graph, agent assignments, or side-effect scope during refinement.
- Fail the whole generation if optional refinement times out.
- Hide warnings when the deterministic artifact is returned unchanged.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
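
Here is a sketch of an opt-in, bounded refinement pass using hypothetical `refine` and `validate` callbacks. `Promise.race` enforces the timeout. A timeout, a provider error, or a failed re-validation all return the deterministic artifact unchanged, with a visible warning. Token-budget and editable-region limits are assumed to be enforced inside `refine` and `validate`.

```typescript
interface RefineOutcome {
  artifact: string;
  refined: boolean;
  warnings: string[];
}

export async function refineWithFallback(
  artifact: string,
  refine: (a: string) => Promise<string>,
  validate: (a: string) => boolean,
  timeoutMs = 30_000,
): Promise<RefineOutcome> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`refine timed out after ${timeoutMs} ms`)), timeoutMs);
  });
  try {
    const candidate = await Promise.race([refine(artifact), timeout]);
    // Re-run validation; an unsafe edit keeps the deterministic artifact.
    if (!validate(candidate)) {
      return { artifact, refined: false, warnings: ['refined artifact failed validation; kept deterministic artifact'] };
    }
    return { artifact: candidate, refined: true, warnings: [] };
  } catch (err) {
    const msg = err instanceof Error ? err.message : String(err);
    return { artifact, refined: false, warnings: [`refine skipped: ${msg}; kept deterministic artifact`] };
  } finally {
    clearTimeout(timer);
  }
}
```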

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.behavior-grounded-gates</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow for the `ricky --version` spec.

**Ricky output**

**Workflow: `ricky --version` Feature Implementation & Release**  
*(Ricky AgentWorkforce – v0.9.0)*  

---

## 1. Objective
Add a deterministic `ricky --version` CLI flag that prints the current Ricky version (semantic‑version string) and exits with status 0. The workflow must guarantee:

* **Correct version source** – derived from `package.json` (or `pyproject.toml` for Python) at build time.  
* **Zero‑runtime side‑effects** – no network, file‑system, or external process calls.  
* **Deterministic output** – exactly `ricky version <MAJOR>.<MINOR>.<PATCH>` (trailing newline optional).  
* **CI gate** – automated verification that the flag works on all supported platforms (Linux, macOS, Windows).  

---

## 2. Contract (Input → Output)

| Input (CLI)                | Expected Output (stdout)                | Exit Code |
|----------------------------|------------------------------------------|-----------|
| `ricky --version`          | `ricky version X.Y.Z` (newline)         | 0 |
| `ricky -v` (alias)         | Same as above                            | 0 |
| Any other flag combination | Normal command behavior (unchanged)     | — |

*`X.Y.Z`* must match the version string defined 
...[truncated]

**Must**

- Build gates from the stated acceptance behavior, such as checking `ricky --version` output.
- Avoid generic source-shape grep checks when the spec asks for CLI behavior.
- Keep generated validation meaningful for the current repo shape.

**Must Not**

- Treat `grep -Eq 'export|function|class|workflow(' dist/ricky.js` as proof of version behavior.
- Claim the workflow is proven by source syntax alone.
- Ignore the package-json version resolution order in the spec.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
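
A behavior-grounded gate executes the CLI rather than grepping the bundle. In this sketch, the `node dist/ricky.js --version` invocation and the accepted output forms are assumptions. `readPackageVersion` reads only `package.json`, so it does not model whatever resolution order the spec defines.

```typescript
import { execFileSync } from 'node:child_process';
import { readFileSync } from 'node:fs';

// The version the gate expects, read from package.json only (simplifying assumption).
export function readPackageVersion(packageJsonPath = 'package.json'): string {
  return (JSON.parse(readFileSync(packageJsonPath, 'utf8')) as { version: string }).version;
}

// Behavior gate: execute the CLI and compare its output with the expected version.
// The default `run` invokes the built CLI; the dist path is an assumption.
export function checkVersionGate(
  expected: string,
  run: () => string = () => execFileSync(process.execPath, ['dist/ricky.js', '--version'], { encoding: 'utf8' }),
): { pass: boolean; detail: string } {
  const out = run().trim();
  // Accept a bare version or one preceded by a label; the exact format is assumed.
  const pass = out === expected || out.endsWith(` ${expected}`);
  return { pass, detail: `expected ${expected}, got "${out}"` };
}
```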

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.pattern-selection-deliberate</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

**Ricky output**

**Workflow: “Batch‑Artifact Validation‑Fix‑Rerun Loop”**  
*Purpose*: Process a large collection of independent artifacts (e.g., data files, code modules, model checkpoints) through a deterministic validation pipeline, automatically apply fix‑scripts where possible, and rerun until all artifacts are either **PASS** or **BLOCKED**. The workflow is scoped to a single branch (`batch‑validation`) and produces a single PR that aggregates all results.

---  

## 1. Contract (YAML)

```yaml
name: batch_artifact_validation
description: |
  Validate, auto‑fix, and rerun a set of independent artifacts until they
  reach a stable PASS state or are marked BLOCKED.
scope:
  branch: batch-validation
  pr_title: "Batch Artifact Validation – <timestamp>"
  pr_body: |
    This PR aggregates the results of the validation/fix/rerun loop for
    all artifacts in the current batch.  Each artifact entry contains:
    - status: PASS | FAIL | BLOCKED
    - attempts: number of validation cycles
    - fix_applied: true/false
    - logs: link to CI logs
inputs:
  artifacts_path: "artifacts/"          # root folder containing artifacts
  validator: "./tools/validate.py"      # must exit 0 on success, non‑0 on
...[truncated]
```

**Must**

- Choose `dag`, `supervisor`, or `pipeline` deliberately based on the work shape.
- Explain the pattern choice in artifact metadata or a rationale.
- Use `dag` for validation/fix/rerun loops when dependencies matter.

**Must Not**

- Default blindly to `dag` for every workflow.
- Collapse independent artifact work into one vague agent task.
- Omit verification gates because the chosen pattern seems obvious.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
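
Here is a sketch of deliberate pattern selection that returns its rationale alongside the choice. Only one rule comes from the criteria: use `dag` for validation/fix/rerun loops when dependencies matter. The `WorkShape` fields and the `supervisor` / `pipeline` heuristics are illustrative assumptions.

```typescript
type Pattern = 'dag' | 'supervisor' | 'pipeline';

// Hypothetical summary of the work shape, derived from the spec.
interface WorkShape {
  dependencies: boolean; // some steps consume other steps' outputs
  fixRerunLoop: boolean; // validate -> fix -> rerun cycles
  dynamicWork: boolean; // work items are only known at runtime
}

// The rationale is returned so it can be written into artifact metadata.
export function choosePattern(shape: WorkShape): { pattern: Pattern; rationale: string } {
  if (shape.fixRerunLoop && shape.dependencies) {
    return { pattern: 'dag', rationale: 'validation/fix/rerun loop whose steps depend on each other' };
  }
  if (shape.dynamicWork) {
    return { pattern: 'supervisor', rationale: 'work items are discovered at runtime and need delegation' };
  }
  if (shape.dependencies) {
    return { pattern: 'dag', rationale: 'explicit dependencies between steps' };
  }
  return { pattern: 'pipeline', rationale: 'linear sequence with no cross-step dependencies' };
}
```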

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.skill-boundary-copy</code> (generation-quality/manual)</summary>

**User message**

> Describe how selected workflow-writing skills affected a generated Ricky workflow.

**Ricky output**

### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow  

| Skill | What the Skill Does | Direct Effect on the Generated Workflow | Example of the Resulting Artifact |
|------|----------------------|----------------------------------------|-----------------------------------|
| **Deterministic Contract Definition** | Forces the author to write explicit input‑output schemas, versioned contracts, and type‑checked data models. | • Every step declares the exact shape of its payload (`jsonschema` or `pydantic` model). <br>• The workflow engine can validate messages before routing, preventing “shape‑mismatch” runtime errors. | `contract: { name: "UserCreated", schema: { id: "string", email: "string" } }` |
| **Verification Gate Insertion** | Inserts pre‑ and post‑condition checks (e.g., schema validation, idempotency tokens, external health‑checks). | • Each transition is guarded by a deterministic gate that must return `OK` before the next node executes. <br>• Failures are captured as **Blocker** events, automatically routed to a “triage” sub‑workflow. | `gate: { type: "http‑health", url: "https://svc/v1/health", timeout: "2s" }` |
| **Review‑Artifact Generation** | G
...[truncated]

**Must**

- Describe skills as generation-time selection, loading, and rendering inputs.
- Point to metadata such as `loaded-skills.txt` and `skill-application-boundary.json`.
- Say the workflow instructions were informed by selected skills.

**Must Not**

- Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
- Treat metadata existence as proof that agents received skill bodies at runtime.
- Overstate current tests beyond the generation boundary.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
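
The Must Not list names specific overclaiming phrases, so generated copy can be checked for them. This hypothetical lint flags those phrases unless the caller states that runtime skill loading is implemented and tested.

```typescript
// Phrases the Must Not list rules out until runtime skill loading is implemented and tested.
const RUNTIME_SKILL_CLAIMS: RegExp[] = [
  /skill-embedded/i,
  /skill-powered at runtime/i,
  /embod(y|ies) (the )?skills?/i,
];

// Returns one finding per overclaiming pattern found in the generated copy.
export function lintSkillCopy(copy: string, runtimeSkillLoadingTested = false): string[] {
  if (runtimeSkillLoadingTested) return [];
  return RUNTIME_SKILL_CLAIMS.filter((re) => re.test(copy)).map((re) => `overclaims runtime skills: /${re.source}/`);
}
```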

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.no-pure-codegen-without-proof</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow from a vague product spec and return it to the user.

**Ricky output**

**Ricky – Workflow Generation Assistant**

---

### 📋 Quick Summary  
To turn a product specification into a reliable AgentWorkforce workflow we need a **clear, scoped contract** that defines:

1. **Inputs** – data types, sources, and validation rules.  
2. **Outputs** – artifacts, APIs, or side‑effects the workflow must produce.  
3. **Core Steps** – ordered actions, required agents, and any external services.  
4. **Verification Gates** – deterministic checks that each step must pass before the next step runs.  
5. **Review & Release Artifacts** – test suites, CI checks, and PR boundaries.  

Because the spec you provided is vague, I’ll ask for the missing pieces needed to create a concrete, testable workflow. Please answer the questions below (feel free to skip any that don’t apply).

---

### ❓ Clarifying Questions

| # | Topic | What I Need to Know |
|---|-------|----------------------|
| 1 | **Business Goal** | What problem does the product solve? What is the success metric (e.g., “process 10k orders/day with <2 % error”)?. |
| 2 | **Primary Actors** | Which agents (human or AI) are involved? Example: `order‑ingestor`, `fraud‑checker`, `notification‑bot`. |
| 3 | **Inputs** 
...[truncated]

**Must**

- Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
- Validate with dry-run or targeted structural checks where possible.
- Return artifacts, warnings, and follow-up commands honestly.

**Must Not**

- Act like Ricky is a pure code-generation bot that emits workflows without verification.
- Stop at "code compiles" as the proof bar.
- Skip skill-aware workflow authoring guidance for serious workflows.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
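
This sketch applies the proof bar the criteria describe. Before a generated workflow is presented as ready, it checks for verification, review, and signoff steps and confirms that a dry run was attempted. The step kinds and the `dryRun` field are hypothetical.

```typescript
type StepKind = 'agent' | 'verify' | 'review' | 'signoff';

interface GeneratedWorkflow {
  steps: { name: string; kind: StepKind }[];
  dryRun?: 'passed' | 'failed'; // undefined: no dry run was attempted
}

// Returns warnings to surface to the user instead of presenting unproven output as ready.
export function proofWarnings(wf: GeneratedWorkflow): string[] {
  const warnings: string[] = [];
  for (const kind of ['verify', 'review', 'signoff'] as const) {
    if (!wf.steps.some((s) => s.kind === kind)) warnings.push(`no ${kind} step`);
  }
  if (wf.dryRun === undefined) warnings.push('not validated with a dry run');
  else if (wf.dryRun === 'failed') warnings.push('dry run failed');
  return warnings;
}
```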

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.classify-before-retry</code> (runtime-recovery/manual)</summary>

**User message**

> A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

**Ricky output**

**Ricky’s post‑failure protocol**

1. **Capture the failure context**  
   - Pull the run‑log for the failed execution.  
   - Record the exact step that timed‑out (`step‑X`) and the worker that remained alive (`worker‑Y`).  
   - Snapshot any partial state (environment variables, temporary files, heartbeat timestamps) and store it as an immutable artifact in the run’s *debug bundle*.

2. **Mark a deterministic verification gate**  
   - Insert a **“Post‑Failure Validation Gate”** into the workflow definition (if not already present).  
   - The gate’s contract:
     - **Input:** `step‑X.status = timeout`, `worker‑Y.alive = true`, `debug‑bundle.id`.  
     - **Output:** `gate.result ∈ {RETRY, ABORT, ESCALATE}`.  
   - This gate guarantees that every failure follows the same decision path and is auditable.

3. **Run the gate logic**  
   - **Timeout detection:** Verify that the step exceeded its SLA (e.g., > 5 min).  
   - **Zombie‑worker detection:** Check heartbeat logs; if the worker’s last heartbeat is > 2 × heartbeat‑interval, classify it as a zombie.  
   - **Artifact check:** Confirm that no downstream artifacts were produced (`artifact‑set.empty`).  

   Based on these check
...[truncated]

**Must**

- Classify the failure before deciding whether to retry.
- Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
- Preserve raw evidence and uncertainty when the class is not known.

**Must Not**

- Blindly rerun the whole workflow without a blocker classification.
- Treat every failure as a broken workflow definition.
- Claim Ricky fixed the workflow before rerun evidence proves it.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
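
Here is a sketch of classify-then-decide. The class names follow the Must list; the evidence fields and the retry policy are assumptions. All matching classes are kept so uncertainty is not collapsed. A `retry` result means one bounded rerun after any stuck worker has been stopped.

```typescript
type FailureClass =
  | 'agent-runtime-opacity'
  | 'timeout'
  | 'environment-blocker'
  | 'workflow-structure-bug'
  | 'validation-strategy-mismatch'
  | 'unknown';

// Hypothetical evidence gathered from logs and artifacts before any retry.
interface FailureEvidence {
  timedOut: boolean;
  workerAliveWithoutArtifacts: boolean;
  environmentError?: string;
  structureError?: string;
  gateMismatch?: boolean;
}

// Every matching class is kept; no evidence at all is reported as 'unknown'.
export function classifyFailure(e: FailureEvidence): FailureClass[] {
  const classes: FailureClass[] = [];
  if (e.environmentError) classes.push('environment-blocker');
  if (e.structureError) classes.push('workflow-structure-bug');
  if (e.gateMismatch) classes.push('validation-strategy-mismatch');
  if (e.timedOut) classes.push('timeout');
  if (e.workerAliveWithoutArtifacts) classes.push('agent-runtime-opacity');
  return classes.length > 0 ? classes : ['unknown'];
}

// Assumed policy: only transient classes may be retried; others need a fix or a human first.
export function retryDecision(classes: FailureClass[]): 'retry' | 'fix-first' | 'escalate' {
  if (classes.includes('unknown')) return 'escalate';
  const needsFix: FailureClass[] = ['environment-blocker', 'workflow-structure-bug', 'validation-strategy-mismatch'];
  return classes.some((c) => needsFix.includes(c)) ? 'fix-first' : 'retry';
}
```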

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.stale-relay-state</code> (runtime-recovery/manual)</summary>

**User message**

> Ricky detects stale `.agent-relay/`, `.relay/`, and `.trajectories/` state before launching a local workflow.

**Ricky output**

**Ricky – Stale‑State Detection for Local Workflow Launches**  
*(Implementation contract, verification gates, and review checklist)*  

---

## 1. Overview
Before a local workflow is started, Ricky must ensure that the three runtime‑state directories used by the AgentWorkforce stack are **fresh**:

| Directory | Purpose | Stale‑indicator |
|-----------|---------|-----------------|
| `.agent-relay/` | Stores the relay server’s socket files and temporary auth tokens. | • Socket file older than `RELAY_MAX_AGE` (default 5 min).<br>• Presence of a `*.lock` file without a matching live process. |
| `.relay/` | Holds the local relay’s configuration snapshots and cached messages. | • Any file modification time > `RELAY_CACHE_TTL` (default 10 min). |
| `.trajectories/` | Persists step‑by‑step execution traces for debugging. | • Trace files > `TRAJ_MAX_AGE` (default 30 min) **or** incomplete JSON (missing closing brace). |

If any of these conditions are true, the workflow launch is blocked and a **Stale‑State Report** is emitted.

---

## 2. Detection Contract

```yaml
# file: .ricky/contracts/stale_state_detection.yaml
name: stale_state_detection
description: Detects stale runtime artefac
...[truncated]
```

**Must**

- Classify stale local runtime state as an environment contamination issue.
- Recommend quarantine or isolated-run guidance before launch.
- Record the observed paths and the action taken or recommended.

**Must Not**

- Treat stale runtime state as a workflow logic failure.
- Delete or overwrite state without an explicit safe path or user intent.
- Continue into execution as if the workspace were clean.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
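
This is a detection-only sketch for the three runtime-state directories named in this case. It records the observed paths and recommends quarantine or an isolated run, but never deletes anything. Treating any leftover directory as suspect, rather than checking ages or live owners, is a simplifying assumption.

```typescript
import { existsSync } from 'node:fs';
import { join } from 'node:path';

// Runtime-state directories named in this review case.
const RUNTIME_STATE_DIRS = ['.agent-relay', '.relay', '.trajectories'];

interface RuntimeStateReport {
  classification: 'clean' | 'environment-contamination';
  paths: string[];
  recommendation: string;
}

// Detection only: nothing is deleted or moved. `exists` is injectable for testing.
export function detectRuntimeState(cwd: string, exists: (p: string) => boolean = existsSync): RuntimeStateReport {
  const paths = RUNTIME_STATE_DIRS.map((dir) => join(cwd, dir)).filter((p) => exists(p));
  if (paths.length === 0) {
    return { classification: 'clean', paths, recommendation: 'none' };
  }
  return {
    classification: 'environment-contamination',
    paths,
    recommendation: 'Quarantine these directories or launch in an isolated workspace before running; this is not a workflow logic failure.',
  };
}
```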

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.already-running-conflict</code> (runtime-recovery/manual)</summary>

**User message**

> A run marker says another Ricky or Relay run is already active in this workspace.

**Ricky output**

**Fixing “Another Ricky or Relay run is already active in this workspace”**

The marker is a safety lock that prevents two Ricky/Relay executions from stepping on each other. It is created in the workspace’s hidden state folder (`.ricky/run‑lock`) and cleared only when the owning run finishes cleanly. When the lock is stale (e.g., a crash, a killed terminal, or a CI abort) you’ll see the error you described.

Below is the **deterministic repair contract** you can run step‑by‑step. Each step has an explicit verification gate so you can stop immediately if something goes wrong.

---

## 1️⃣ Verify the lock state

```bash
# Show lock file (if any)
cat .ricky/run-lock 2>/dev/null || echo "No lock file"
```

Gate 1 – Expected output: either “No lock file” or a JSON payload that includes:

- `pid`: the process ID that created the lock
- `started_at`: ISO‑8601 timestamp
- `owner`: “ricky” or “relay”

If the file is missing, the error is coming from a stale UI cache—skip to Step 4.


## 2️⃣ Check the owning process

```bash
# Extract PID from lock (jq is a dependency of Ricky)
PID=$(jq -r .pid .ricky/run-lock)

# Verify the process still exists
if ps -p $PID > /dev/null; then

...[truncated]
```

**Must**

- Report the active marker, run id, or status path when available.
- Ask the user to inspect, wait for, or explicitly clear the active run.
- Avoid launching a competing run that could corrupt evidence.

**Must Not**

- Silently start another run.
- Hide the existing run marker from the user.
- Treat the conflict as a generic failure with no recovery path.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
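
Here is a sketch of the launch decision when a run marker exists, using a hypothetical `RunMarker` shape. The marker always blocks a new launch and is reported back to the user. `process.kill(pid, 0)` is the standard Node way to probe whether a pid exists without signaling it.

```typescript
// Hypothetical contents of a run marker.
interface RunMarker {
  runId: string;
  pid: number;
  startedAt: string;
  markerPath: string;
}

type LaunchDecision = { launch: true } | { launch: false; message: string };

// Signal 0 checks that a pid exists without delivering a signal;
// EPERM means the process exists but belongs to another user.
function pidIsAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    return (err as { code?: string }).code === 'EPERM';
  }
}

// Never starts a competing run: an existing marker always blocks and is reported.
export function decideLaunch(marker: RunMarker | undefined, isAlive: (pid: number) => boolean = pidIsAlive): LaunchDecision {
  if (!marker) return { launch: true };
  const state = isAlive(marker.pid) ? 'is still active' : 'looks stale (its process is gone)';
  return {
    launch: false,
    message:
      `Run ${marker.runId} (pid ${marker.pid}, started ${marker.startedAt}) ${state}. ` +
      `Marker: ${marker.markerPath}. Inspect it, wait for it to finish, or clear the marker explicitly, then retry.`,
  };
}
```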

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.env-loader-injection-runtime-loadable</code> (runtime-recovery/manual)</summary>

**User message**

> A workflow artifact references a `MISSING_ENV_VAR` value. Ricky's deterministic auto-fix injects the `.env.local` / `.env` loader (`loadRickyWorkflowEnv`) and the optional `assertRickyWorkflowEnv` guard into the artifact before retry. The artifact may be a master-rendered workflow whose `.step({ command: ... })` bodies embed `node --input-type=module` HEREDOCs containing literal `import { ... } from 'node:fs'` / `from 'node:path'` strings.

**Ricky output**

**Deterministic auto‑fix for missing environment variables**

Below is the concrete, reproducible patch you can apply to any workflow artifact that currently fails because it references a `MISSING_ENV_VAR`. The fix injects the **Ricky environment loader** (`loadRickyWorkflowEnv`) and the **optional guard** (`assertRickyWorkflowEnv`) **once**, at the top of the generated workflow file, and then re‑exports the original step definitions unchanged. This keeps the master‑rendered `.step({ command: … })` bodies (including the `node --input-type=module` HEREDOCs) intact.

---

## 1. Patch contract

| Item | Description |
|------|-------------|
| **Target** | Any `*.workflow.ts` (or `.js`) file that is the output of the master renderer and contains a reference to `process.env.MISSING_ENV_VAR`. |
| **Pre‑condition** | The file does **not** already import `loadRickyWorkflowEnv` or `assertRickyWorkflowEnv`. |
| **Post‑condition** | The file begins with a deterministic import of the loader, calls the loader, optionally calls the guard, and then exports the original workflow object unchanged. |
| **Verification gate** | After patch, running `node -e "require('./<file>').default"` (or the approp
...[truncated]

**Must**

- Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected `loadRickyWorkflowEnv` body references `rickyWorkflowFs.*` and `rickyWorkflowPath.*`, so the repair must also add the corresponding `import * as rickyWorkflowFs from 'node:fs'` and `import * as rickyWorkflowPath from 'node:path'` aliases at module top level.
- Detect existing alias imports by matching real top-level `import * as <alias> from '<module>'` statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in `.step({ command: ... })` bodies do not count as imports).
- Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal `import` lines it expects.

**Must Not**

- Skip adding the `rickyWorkflowFs` / `rickyWorkflowPath` aliases because `from 'node:fs'` or `from 'node:path'` already appears somewhere in the file as a string literal.
- Inject `loadRickyWorkflowEnv` (or `assertRickyWorkflowEnv`) without the supporting alias imports, which produces a `ReferenceError: rickyWorkflowPath is not defined` at module load and burns the auto-fix budget on `UNSUPPORTED_RUNTIME` at runtime-launch.
- Rewrite or escape the embedded HEREDOC text in step commands.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
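
The alias rules above come down to telling real top-level imports apart from `import` text that only appears inside HEREDOC strings. This minimal sketch assumes the artifact keeps its imports in a leading block; a more robust version would walk the AST, for example with the TypeScript compiler API. The function names are hypothetical. The `rickyWorkflowFs` / `rickyWorkflowPath` aliases come from the criteria.

```typescript
// Matches one complete namespace import, e.g. `import * as fs from 'node:fs';`
const NAMESPACE_IMPORT = /^import\s+\*\s+as\s+([A-Za-z_$][\w$]*)\s+from\s+['"]([^'"]+)['"];?$/;

// Returns alias -> module specifier for namespace imports in the leading import
// block. Scanning stops at the first non-import statement, so `import` text in
// later HEREDOCs or template literals is never counted.
export function findTopLevelNamespaceImports(source: string): Map<string, string> {
  const found = new Map<string, string>();
  let inMultiLineImport = false;
  for (const raw of source.split('\n')) {
    const line = raw.trim();
    if (inMultiLineImport) {
      // A multi-line `import { ... }` ends on the line carrying `from '<module>'`.
      if (/\bfrom\s*['"][^'"]+['"];?$/.test(line)) inMultiLineImport = false;
      continue;
    }
    if (line === '' || /^(\/\/|\/\*|\*|#!)/.test(line)) continue; // blanks, comments, shebang
    if (!/^import[\s{*'"]/.test(line)) break; // first non-import statement
    const match = NAMESPACE_IMPORT.exec(line);
    if (match) found.set(match[1], match[2]);
    else if (!/['"][^'"]+['"];?$/.test(line)) inMultiLineImport = true;
  }
  return found;
}

const REQUIRED_ALIASES: ReadonlyArray<readonly [alias: string, specifier: string]> = [
  ['rickyWorkflowFs', 'node:fs'],
  ['rickyWorkflowPath', 'node:path'],
];

// Prepends any missing alias imports; the rest of the file, HEREDOCs included,
// is returned byte-for-byte. Assumes the file has no shebang line.
export function ensureAliasImports(source: string): string {
  const existing = findTopLevelNamespaceImports(source);
  const missing = REQUIRED_ALIASES.filter(([alias]) => !existing.has(alias));
  if (missing.length === 0) return source;
  const header = missing.map(([alias, spec]) => `import * as ${alias} from '${spec}';`).join('\n');
  return `${header}\n${source}`;
}
```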

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.auto-fix-bounded-loop</code> (runtime-recovery/manual)</summary>

**User message**

> Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

**Ricky output**

**Local Workflow Execution – Auto‑Fix Run**

Below is a deterministic, step‑by‑step contract you can copy‑paste into your terminal (or CI script) to run the workflow, trigger the auto‑fix path on the first failure, and verify the repaired artifact. The contract assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failed step name and the previous run‑ID are known.

---

## 1️⃣ Pre‑flight Checks

| Check | Command | Expected outcome |
|-------|---------|------------------|
| `aw` version | `aw --version` | `>=2.4.0` (auto‑fix introduced in v2.4) |
| Working directory | `pwd` | Root of the repository containing `.aw/workflows/` |
| Git clean state | `git status --porcelain` | No uncommitted changes (auto‑fix will create a new branch) |
| Docker daemon (if workflow uses containers) | `docker info` | Daemon running |

If any check fails, resolve it before proceeding.

---

## 2️⃣ Define Execution Variables

```bash
# ---- USER‑SUPPLIED -------------------------------------------------
FAILED_STEP="my_step_name"          # exact name as shown in the failure log
PREV_RUN_ID="run-2023-11-07-001"    # ID of the run that produced the artifact
WORKFLOW_PATH=".aw/workflow
...[truncated]
```

**Must**

- Use a bounded retry budget and summarize every attempt.
- Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
- Resume from the failed step with the previous run id when those values are available.

**Must Not**

- Edit arbitrary repository source files as the default auto-fix surface.
- Keep retrying after the configured max attempts.
- Lose the single Ricky tracking run id across repair/resume attempts.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
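
Here is a sketch of the bounded repair/resume loop using hypothetical `run` and `repair` hooks. It makes at most `maxAttempts` attempts and records each one under a single tracking run id. Each retry resumes from the failed step with the previous run id. With `maxAttempts = 1`, the loop reduces to the single-attempt behavior expected under `--no-auto-fix`.

```typescript
interface ResumePoint {
  fromStep: string;
  previousRunId: string;
}

interface AttemptResult {
  ok: boolean;
  runId: string;
  failedStep?: string;
}

// Hypothetical hooks: `run` launches (or resumes) the workflow; `repair` asks the
// workflow persona to fix the artifact for the failed step and reports success.
export async function runWithAutoFix(
  trackingRunId: string,
  run: (resume?: ResumePoint) => Promise<AttemptResult>,
  repair: (failedStep: string) => Promise<boolean>,
  maxAttempts = 3,
): Promise<{ ok: boolean; trackingRunId: string; attempts: AttemptResult[] }> {
  const attempts: AttemptResult[] = [];
  let resume: ResumePoint | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await run(resume);
    attempts.push(result); // every attempt is summarized under one tracking id
    if (result.ok) return { ok: true, trackingRunId, attempts };
    // Stop at the budget, or when there is no step to resume from.
    if (attempt === maxAttempts || !result.failedStep) break;
    if (!(await repair(result.failedStep))) break; // artifact not repairable
    resume = { fromStep: result.failedStep, previousRunId: result.runId };
  }
  return { ok: false, trackingRunId, attempts };
}
```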

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.no-auto-fix-preserves-single-attempt</code> (runtime-recovery/manual)</summary>

**User message**

> A user runs `ricky run workflows/foo.ts --no-auto-fix` and the workflow fails.

**Ricky output**

**Quick‑Start Failure Diagnosis**

When `ricky run workflows/foo.ts --no-auto-fix` exits with a non‑zero status, the engine has hit a **deterministic verification gate** that could not be auto‑repaired. Follow the steps below to isolate the root cause, produce the required review artifacts, and get the workflow back into a green state.

---

## 1️⃣ Capture the Run Report

```bash
ricky run workflows/foo.ts --no-auto-fix --report ./tmp/foo-run-report.json
```

- The JSON report contains:
  - `gateResults[]` – each verification gate outcome (PASS/FAIL)
  - `errorStack` – full stack trace of the failure
  - `artifactDiffs` – any diff that would have been auto‑fixed (shown for visibility)
  - `runtimeMetrics` – timing, memory, and external‑service latency

Save this file and attach it to the next PR or blocker ticket.


## 2️⃣ Identify the Failing Verification Gate

Open the report and look for the first entry with `"status":"FAIL"`.

Typical gate categories (see AGENTS.md → Verification Gates):

Gate What it checks Common failure reasons
SchemaGate Input/Output TypeScript interfaces match the declared contract
...[truncated]

**Must**

- Preserve one-attempt behavior when auto-fix is disabled.
- Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
- Make clear that the user chose manual inspection over repair/resume automation.

**Must Not**

- Start a repair loop despite `--no-auto-fix`.
- Suppress the diagnosis because no repair was attempted.
- Present the failure as a completed repair attempt.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
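A minimal sketch of the no-auto-fix contract above, assuming a hypothetical `FailureReport` shape and helper names (`attemptBudget`, `reportFailure`) that are not Ricky's real API: disabling auto-fix pins the attempt budget to one, and the failure result still carries the classified blocker, diagnosis, recovery steps, and a non-zero exit code.

```typescript
// Hypothetical report shape for a failed run; field names are illustrative.
interface FailureReport {
  blocker: string; // classified blocker, e.g. the failing gate
  diagnosis: string;
  recoverySteps: string[];
  exitCode: number; // always non-zero for a failure
  attemptsMade: number;
  mode: "manual-inspection" | "auto-fix";
}

// Auto-fix disabled means exactly one attempt, regardless of the configured max.
function attemptBudget(autoFix: boolean, configuredMax: number): number {
  return autoFix ? Math.max(1, configuredMax) : 1;
}

function reportFailure(
  autoFix: boolean,
  blocker: string,
  diagnosis: string,
  recoverySteps: string[],
  attemptsMade: number,
): FailureReport {
  return {
    blocker,
    diagnosis, // still reported even though no repair ran
    recoverySteps,
    exitCode: 1,
    attemptsMade,
    // Makes explicit that the user chose manual inspection, not a completed repair.
    mode: autoFix ? "auto-fix" : "manual-inspection",
  };
}
```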

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.in-process-local-runner</code> (runtime-recovery/manual)</summary>

**User message**

> Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

**Ricky output**

```text
**Ricky’s Standard Procedure for Executing a Local TypeScript Workflow Artifact**

Below is the concrete, repeatable contract that Ricky follows whenever a developer asks to run a TypeScript‑based workflow that lives in the *primary local path* (by default `./workflows`). The contract is expressed as a deterministic series of steps, verification gates, and required artifacts so that the execution is reliable, auditable, and reproducible.

---

## 1. Preconditions (Input Contract)

| Item | Requirement | Source |
|------|-------------|--------|
| **Workflow source** | A single entry‑point file `*.workflow.ts` (or `index.ts`) located under `./workflows` (or the path supplied via `RICKY_WORKFLOW_ROOT`). | Developer |
| **Node runtime** | Node ≥ 18.x installed and available on `$PATH`. | System |
| **Package manager** | `npm` ≥ 9.x (or `pnpm`/`yarn` if the repo is locked to one). | System |
| **TypeScript compiler** | `typescript` listed in `devDependencies`. | `package.json` |
| **ts-node** | `ts-node` listed in `devDependencies`. | `package.json` |
| **Workflow config** | Optional `workflow.config.json` next to the entry file (defines env vars, secrets, input schema). | Developer |
|
...[truncated]

Must

  • Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
  • Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
  • Record the actual spawn command in execution evidence.

Must Not

  • Fail solely because agent-relay is not on PATH when the SDK route is available.
  • Hide the actual runtime command from evidence.
  • Conflate the user-facing reproduction command with the primary internal spawn route.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
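A sketch of the preferred local route described above, assuming a hypothetical `planLocalRun` helper with an injected module resolver and an optional caller-supplied CLI fallback (no real `agent-relay` CLI arguments are assumed). `--experimental-strip-types` is Node's own flag, added in Node 22.6; everything else here is illustrative.

```typescript
// Hypothetical evidence shape; field names are illustrative.
interface ExecutionEvidence {
  route: "node-strip-types" | "cli-fallback";
  spawnCommand: string[]; // the argv actually spawned, recorded verbatim
}

// Injected resolver so the sketch stays runtime-agnostic; in Node it could wrap require.resolve.
type CanResolve = (specifier: string) => boolean;

function planLocalRun(
  nodePath: string | undefined,
  workflowFile: string,
  canResolve: CanResolve,
  cliCommand: string[] | undefined, // caller-supplied fallback argv, if any
): ExecutionEvidence {
  // Precheck: the primary route needs a Node binary and a resolvable @agent-relay/sdk.
  if (nodePath !== undefined && canResolve("@agent-relay/sdk")) {
    return {
      route: "node-strip-types",
      spawnCommand: [nodePath, "--experimental-strip-types", workflowFile],
    };
  }
  // Only fall back to a CLI binary when the SDK route is unavailable.
  if (cliCommand !== undefined) {
    return { route: "cli-fallback", spawnCommand: [...cliCommand] };
  }
  throw new Error(
    "precheck failed: Node with @agent-relay/sdk is not resolvable and no CLI fallback was given",
  );
}
```

In Node, the caller could pass `process.execPath` as `nodePath` and a `require.resolve`-based check as `canResolve`; the returned `spawnCommand` is what would be written into execution evidence.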
REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Ricky – Boundary‑Handling Workflow for Persistent Structural Failures**

When a structural failure survives an initial remediation attempt, Ricky follows a deterministic, contract‑driven workflow that isolates the problem, escalates the blocker, and protects the broader pipeline. The steps below are expressed as **workflow contracts**, **verification gates**, and **review artifacts** that can be copied into a repository’s `.ricky/` directory or added to a CI/CD pipeline.

---

## 1. Contract Definition – `structural‑failure‑boundary.yml`

```yaml
# .ricky/contracts/structural-failure-boundary.yml
name: Structural Failure Boundary
description: >
  Guarantees that any structural defect that remains after a first‑pass fix
  triggers a controlled escalation path, preventing downstream corruption.
scope:
  - affected_modules: ["core/*", "infra/*"]
  - environments: ["dev", "staging"]
trigger:
  - event: "post‑fix‑verification-failure"
    condition: "failure persists > 2 verification cycles"
actions:
  - create_issue
  - rollback_changes
  - notify_owners
  - block_merge_until_resolved
gates:
  - name: blocker‑report
    type: artifact
    required: true
  - name: rollback‑commit
    
...[truncated]

Must

  • Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
  • Distinguish escalation from a generic product failure.
  • Preserve enough context for a human operator to continue.

Must Not

  • Retry speculative fixes indefinitely.
  • Collapse to "something went wrong" without the attempted actions.
  • Discard evidence from failed repair attempts.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
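The escalation contract above can be sketched as a bounded repair loop whose failure result is a distinct `escalation` variant rather than a bare error string. The types and `repairOrEscalate` are hypothetical; the point is the shape: attempted fixes, every validation error, the classified blocker, and a recommendation travel together.

```typescript
// Hypothetical outcome types; the discriminant keeps escalation distinct from a generic failure.
interface Escalation {
  kind: "escalation";
  attemptedFixes: string[];
  validationErrors: string[]; // evidence from every failed attempt, not just the last
  blocker: string;
  recommendation: string;
}

// Generic failures (e.g. crashes unrelated to repair) would use the separate "failure" variant.
type Outcome = { kind: "fixed"; fix: string } | Escalation | { kind: "failure"; message: string };

// A candidate fix: applying it returns a validation error, or null if validation passes.
interface CandidateFix {
  name: string;
  apply: () => string | null;
}

function repairOrEscalate(
  fixes: CandidateFix[],
  maxFixes: number,
  blocker: string,
  recommendation: string,
): Outcome {
  const attemptedFixes: string[] = [];
  const validationErrors: string[] = [];
  // Bounded: never try more than `maxFixes` speculative fixes.
  for (const fix of fixes.slice(0, maxFixes)) {
    attemptedFixes.push(fix.name);
    const error = fix.apply();
    if (error === null) return { kind: "fixed", fix: fix.name };
    validationErrors.push(error);
  }
  // The failure persisted: hand a human operator everything needed to continue.
  return { kind: "escalation", attemptedFixes, validationErrors, blocker, recommendation };
}
```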
REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – Summary (Last 24 h)**  

| Metric | Value | Target / SLA |
|--------|-------|--------------|
| **Total Runs** | 1 842 | — |
| **Successful Runs** | 1 679 (91.2 %) | ≥ 90 % |
| **Failed Runs** | 112 (6.1 %) | ≤ 5 % |
| **Blocked Runs** | 51 (2.8 %) | 0 % |
| **Avg. Runtime** | 3 m 27 s | ≤ 5 m |
| **95‑th‑pctile Runtime** | 5 m 12 s | ≤ 7 m |
| **Deterministic‑Gate Passes** | 1 815 (98.5 %) | ≥ 98 % |
| **Review Artifacts Produced** | 1 742 | — |
| **Validation Loops Executed** | 1 842 (100 %) | — |
| **Critical Blockers** | 2 | — |

---

### 1. Success Highlights
- **Deterministic verification gates** passed on 98.5 % of runs, confirming contract compliance.
- **Review artifacts** (diffs, coverage reports, lint summaries) were generated for every successful run.
- **Avg. runtime** improved by 12 % vs. previous 24 h window after the recent cache‑warmup optimization.

### 2. Failure Analysis
| Failure Reason | Runs | % of Total | Typical Trigger |
|----------------|------|-----------|-----------------|
| Contract mismatch (output schema) | 48 | 2.6 % | Upstream schema change not reflected in `contract.yaml` |
| Timeout on external API call | 34 | 1.8 
...[truncated]

Must

  • Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
  • Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
  • Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

  • Mutate evidence while analyzing it.
  • Return generic advice like "improve your workflow" without references.
  • Mix environment failures and workflow-logic failures into one undifferentiated bucket.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
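A sketch of the grouping step this analysis implies, assuming a hypothetical `WorkflowRunEvidence` shape (the real normalized type may have different fields). Keying buckets on both `origin` and `failureClass` keeps environment failures and workflow-logic failures apart, and the input is typed `readonly` so the analysis cannot mutate it.

```typescript
// Hypothetical normalized record; Ricky's real WorkflowRunEvidence type may differ.
interface WorkflowRunEvidence {
  readonly workflow: string;
  readonly step: string;
  readonly status: "passed" | "failed";
  readonly failureClass?: string; // e.g. "timeout", "schema-mismatch"
  readonly origin?: "environment" | "workflow-logic";
  readonly durationMs: number;
}

type Origin = "environment" | "workflow-logic" | "unknown";

interface FailureBucket {
  origin: Origin;
  failureClass: string;
  count: number;
  workflows: string[]; // concrete workflows to cite in recommendations
}

// Groups failed runs by (origin, failureClass); reads the evidence without mutating it.
function recurringFailures(
  evidence: readonly WorkflowRunEvidence[],
  minCount: number,
): FailureBucket[] {
  const buckets = new Map<string, FailureBucket>();
  for (const run of evidence) {
    if (run.status !== "failed") continue;
    // Environment and workflow-logic failures land in separate buckets.
    const origin: Origin = run.origin ?? "unknown";
    const failureClass = run.failureClass ?? "unclassified";
    const key = `${origin}:${failureClass}`;
    const bucket: FailureBucket = buckets.get(key) ?? { origin, failureClass, count: 0, workflows: [] };
    bucket.count += 1;
    if (!bucket.workflows.includes(run.workflow)) bucket.workflows.push(run.workflow);
    buckets.set(key, bucket);
  }
  // Most frequent classes first, dropping one-off noise below `minCount`.
  return Array.from(buckets.values())
    .filter((b) => b.count >= minCount)
    .sort((a, b) => b.count - a.count);
}
```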
REVIEW runtime-recovery.preserve-pr-shipping-during-repair (runtime-recovery/manual)

User message

A persona-authored workflow that imports @agent-relay/github-primitive and uses createGitHubStep to open a PR fails its runtime-precheck. The auto-fix repair persona is invoked. Describe the contract the repaired artifact must satisfy.

Ricky output

**Repaired Workflow Contract**

| **Section** | **Requirement** | **Verification Gate** | **Pass‑Criteria** |
|-------------|----------------|-----------------------|-------------------|
| **1. Imports** | The workflow must import the **exact** package name `@agent‑relay/github-primitive` and **only** the symbols that are used. | *Static import lint* – `npm run lint:imports` | No unresolved or unused imports; the import line resolves to a published package version. |
| **2. `createGitHubStep` Usage** | The step that opens a PR must be created with `createGitHubStep` **once** and must supply a **complete** `GitHubStepConfig` object: <br>• `action: "createPullRequest"` <br>• `repo`, `baseBranch`, `headBranch`, `title`, `body` (all non‑empty strings) <br>• Optional `reviewers` array (if present, each entry is a valid GitHub username). | *Schema validation* – `npm run validate:step-schema` (JSON‑Schema generated from `GitHubStepConfig`) | The step object matches the schema; no missing required fields; all string fields pass `nonEmptyString` test; `reviewers` entries match `/^[a-zA-Z0-9-]+$/`. |
| **3. Runtime‑Precheck Compatibility** | The workflow’s **precheck** (`npm run precheck`) m
...[truncated]

Must

  • Preserve the @agent-relay/github-primitive import, the GitHubStepExecutor reference, and every createGitHubStep(...) invocation that the original artifact declared.
  • Keep the failing workflow runnable from the same path with the same workflow(...) builder and .run({ cwd: process.cwd() }) invocation.
  • Retain at least ceil(N / 2) of the original workflow's .step(...) calls when the original declared four or more steps; a repair that collapses to a 2-3 step placeholder is a regression, not a fix.
  • Reject (do not apply) any repair output whose step list reduces to prepare-context / runtime-precheck: true / final-signoff: echo placeholder; surface the regression diagnostic instead.

Must Not

  • Strip createGitHubStep, GitHubStepExecutor, or @agent-relay/github-primitive because the runtime-precheck failure mentioned PR-shipping or git side effects. The repair contract's "no commit / no push" constraint applies to the REPAIR AGENT's runtime behavior, not to the workflow's step declarations.
  • Emit a "minimal repair-safe master" or "simplified Ricky master" scaffold that passes the builder validator while doing none of the original work.
  • Treat "the workflow now builds and runs" as success when the work it was supposed to ship is gone.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
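The step-retention and placeholder rules above can be checked mechanically. This is a text-matching heuristic, not Ricky's validator: `checkRepairPreservesWork` is hypothetical, and it assumes steps are declared as `.step("name", ...)` and that the placeholder scaffold uses exactly the step names listed in the Must item. For N = 4 original steps, the repair must keep at least ceil(4 / 2) = 2.

```typescript
// Heuristic regression guard: it matches source text, so it is a sketch, not a TypeScript parser.
const REQUIRED_MARKERS = ["@agent-relay/github-primitive", "GitHubStepExecutor", "createGitHubStep("];
// Assumed names of the placeholder steps a "repair-safe" scaffold collapses to.
const PLACEHOLDER_STEPS = new Set(["prepare-context", "runtime-precheck", "final-signoff"]);

function countOccurrences(source: string, needle: string): number {
  return source.split(needle).length - 1;
}

// Assumes steps are declared as .step("name", ...); other call shapes are not recognized.
function stepNames(source: string): string[] {
  const names: string[] = [];
  const re = /\.step\(\s*["'`]([^"'`]+)["'`]/g;
  let match: RegExpExecArray | null;
  while ((match = re.exec(source)) !== null) names.push(match[1] ?? "");
  return names;
}

// Returns the reasons to reject a repair; an empty array means it may be applied.
function checkRepairPreservesWork(original: string, repaired: string): string[] {
  const problems: string[] = [];
  for (const marker of REQUIRED_MARKERS) {
    // Each import, reference, and invocation in the original must survive the repair.
    if (countOccurrences(repaired, marker) < countOccurrences(original, marker)) {
      problems.push(`repair dropped ${marker}`);
    }
  }
  const n = stepNames(original).length;
  const repairedNames = stepNames(repaired);
  const required = Math.ceil(n / 2);
  if (n >= 4 && repairedNames.length < required) {
    problems.push(`repair kept ${repairedNames.length} of ${n} steps; at least ${required} required`);
  }
  if (repairedNames.length > 0 && repairedNames.every((name) => PLACEHOLDER_STEPS.has(name))) {
    problems.push("repair collapsed to a placeholder scaffold");
  }
  return problems;
}
```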
REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

## Ricky Slack Surface – “Workflow Debugger”

A dedicated Slack app (Ricky Debug) that lets engineers **inspect, step‑through, and repair** any Ricky‑managed workflow directly from a channel or DM. The surface is built as a **modal‑driven UI** with deterministic contracts and verification gates so that every interaction is auditable, testable, and safe to merge.

---

### 1. High‑Level Interaction Flow

| Step | Slack UI | Ricky Backend Action | Verification Gate |
|------|----------|----------------------|-------------------|
| 1️⃣ | `/ricky debug <workflow‑id>` (slash command) | Resolve workflow definition, fetch latest run metadata | **Resolve‑Gate** – confirm workflow exists & user has `debug` permission |
| 2️⃣ | Bot posts an **ephemeral message** with a *Run Selector* (dropdown of last 10 runs) | No state change | **Selector‑Gate** – validate run IDs belong to the workflow |
| 3️⃣ | User selects a run → **modal** opens showing a **Step Timeline** (chronological list of nodes, status, timestamps) | Load step‑level logs & state snapshots | **Load‑Gate** – ensure logs are retrievable, snapshot size < 5 MB |
| 4️⃣ | User clicks a step → **Step Detail Modal** (inputs, outputs, env
...[truncated]

Must

  • Treat Slack as a co-equal product surface, not a wrapper around CLI.
  • Route domain work through the same normali

Comment truncated to stay within GitHub limits. Download the ricky-eval-run artifact for the full human-review.md.

…ta after disk recovery

Addresses PR #124 review feedback:

- Devin/Cubic (P1): when tryFencedResponseOrDiskRecovery returned undefined
  for a stale on-disk file, the parser fell through to validateStructuredResponse
  → recoverExpectedArtifactContent, which has no mtime check and would silently
  surface a stale artifact from a prior writer run. The helper now always
  returns or throws (never undefined) and throws the original
  "does not call workflow()" error when disk recovery fails, preserving the
  freshness guarantee recoverArtifactFromTruncatedOutput enforces.

- CodeRabbit: re-run the full validateFencedResponse(recovered, metadata, ...)
  pipeline after disk recovery so metadata-level issues (missing fields,
  mismatched path) still surface; previously they were swallowed silently.

- CodeRabbit: rename and rewrite the "respects the freshness guard" test
  that wasn't actually exercising the stale-mtime branch. Add an explicit
  stale-mtime regression test that proves the bypass is closed, plus a
  new test verifying that a mismatched fenced-metadata path still throws
  after disk recovery.

All 75 writer tests pass (was 73, +2 new regression guards).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
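A minimal sketch of the helper contract this commit describes. Only the names `tryFencedResponseOrDiskRecovery`, `validateFencedResponse`, and `recoverArtifactFromTruncatedOutput` come from the PR; their signatures, the `ParsedArtifact` shape, and passing the two collaborators in as parameters are assumptions made so the example runs on its own.

```typescript
// Hypothetical signatures: only the function names come from the PR description.
interface Metadata {
  path: string;
}
interface ParsedArtifact {
  content: string;
  metadata: Metadata;
}

type ValidateFenced = (tsFence: string, metadataJson: string, expectedPath: string) => ParsedArtifact;
// Returns file content only if the on-disk file is fresh (mtime-guarded), else undefined.
type RecoverFromDisk = (expectedPath: string) => string | undefined;

const MISSING_WORKFLOW_CALL = "does not call workflow()";

// Always returns or throws; never returns undefined, so callers cannot fall through
// to a recovery path that lacks the freshness guard.
function tryFencedResponseOrDiskRecovery(
  tsFence: string,
  metadataJson: string,
  expectedPath: string,
  validateFencedResponse: ValidateFenced,
  recoverArtifactFromTruncatedOutput: RecoverFromDisk,
): ParsedArtifact {
  try {
    return validateFencedResponse(tsFence, metadataJson, expectedPath);
  } catch (error) {
    // Only the placeholder-fence symptom triggers disk recovery; other errors surface unchanged.
    if (!(error instanceof Error) || !error.message.includes(MISSING_WORKFLOW_CALL)) throw error;
    const recovered = recoverArtifactFromTruncatedOutput(expectedPath);
    // Stale or missing file: rethrow the original error instead of returning undefined.
    if (recovered === undefined) throw error;
    // Re-run the full pipeline so metadata problems (e.g. a path mismatch) still throw.
    return validateFencedResponse(recovered, metadataJson, expectedPath);
  }
}
```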
khaliqgant merged commit 484dc25 into main on May 18, 2026
4 checks passed
khaliqgant deleted the fix/writer-parser-disk-fallback branch on May 18, 2026 at 08:06