diff --git a/codev-skeleton/protocols/air/builder-prompt.md b/codev-skeleton/protocols/air/builder-prompt.md index 8a44aede1..963d5e5fe 100644 --- a/codev-skeleton/protocols/air/builder-prompt.md +++ b/codev-skeleton/protocols/air/builder-prompt.md @@ -26,6 +26,12 @@ You are running in STRICT mode. This means: ## Protocol Follow the AIR protocol: `codev/protocols/air/protocol.md` +## Baked Decisions + +If the issue body contains a section named "Baked Decisions" (any heading level, case-insensitive), treat its contents as fixed architectural decisions baked in by the architect. Do not autonomously override them in your spec, plan, or implementation. If you discover a serious reason to question a baked decision, surface that concern to the architect via `afx send` rather than relitigating it inside the spec/plan/review. + +If the architect's baked-decisions section contains internal contradictions (e.g., two different language choices), do not pick one — pause, flag the contradiction to the architect via `afx send`, and wait for resolution before proceeding. + {{#if issue}} ## Issue #{{issue.number}} **Title**: {{issue.title}} diff --git a/codev-skeleton/protocols/air/consult-types/impl-review.md b/codev-skeleton/protocols/air/consult-types/impl-review.md index aacdefbdb..b382faedc 100644 --- a/codev-skeleton/protocols/air/consult-types/impl-review.md +++ b/codev-skeleton/protocols/air/consult-types/impl-review.md @@ -10,6 +10,12 @@ Before requesting changes for missing configuration, incorrect patterns, or fram 2. **Read the actual config files** (or confirm their deliberate absence) before flagging missing configs 3. **Do not assume** your training data reflects the version in use — verify against project files +## Baked Decisions + +If the issue body includes content under a "Baked Decisions" heading, the architect has marked those choices as fixed. 
Do not autonomously challenge them: do not propose alternative languages, frameworks, deployment shapes, or dependencies that contradict a baked decision. You may `COMMENT` with concerns about a baked decision (the architect decides whether to rescind it); reserve `REQUEST_CHANGES` for the case where the implementation **fails to honor** a stated baked decision — that is a real defect. + +If the baked decisions themselves contain contradictions, do not pick one — `REQUEST_CHANGES` and ask the architect to clarify before proceeding. + ## Focus Areas 1. **Issue Adherence** diff --git a/codev-skeleton/protocols/air/consult-types/pr-review.md b/codev-skeleton/protocols/air/consult-types/pr-review.md index 915903934..8c0f6c552 100644 --- a/codev-skeleton/protocols/air/consult-types/pr-review.md +++ b/codev-skeleton/protocols/air/consult-types/pr-review.md @@ -3,6 +3,12 @@ ## Context You are performing a review of a pull request created under the AIR protocol. The builder implemented a small feature directly from a GitHub issue — there are no spec, plan, or review files. The review is embedded in the PR body. +## Baked Decisions + +If the issue body includes content under a "Baked Decisions" heading, the architect has marked those choices as fixed. Do not autonomously challenge them: do not propose alternative languages, frameworks, deployment shapes, or dependencies that contradict a baked decision. You may `COMMENT` with concerns about a baked decision (the architect decides whether to rescind it); reserve `REQUEST_CHANGES` for the case where the code **fails to honor** a stated baked decision — that is a real defect. + +If the baked decisions themselves contain contradictions, do not pick one — `REQUEST_CHANGES` and ask the architect to clarify before proceeding. + ## Focus Areas 1. 
**Completeness** diff --git a/codev-skeleton/protocols/air/prompts/implement.md b/codev-skeleton/protocols/air/prompts/implement.md index 52767f21a..301641962 100644 --- a/codev-skeleton/protocols/air/prompts/implement.md +++ b/codev-skeleton/protocols/air/prompts/implement.md @@ -6,6 +6,12 @@ You are executing the **IMPLEMENT** phase of the AIR protocol. Read the GitHub issue, implement the feature, and add tests. Keep it focused and under 300 LOC. +## Baked Decisions + +Check the issue body for a section named "Baked Decisions" (any heading level, case-insensitive). If present, treat each listed decision as fixed during implementation. Do not autonomously substitute alternate languages, frameworks, or dependencies. If you discover a serious problem with a baked decision, raise it via `afx send architect` rather than working around it. + +If two baked decisions contradict each other, do not pick one — pause, flag the contradiction via `afx send`, and wait for resolution before implementing. + ## Context - **Issue**: #{{issue.number}} — {{issue.title}} diff --git a/codev-skeleton/protocols/air/protocol.md b/codev-skeleton/protocols/air/protocol.md index 7c78bf7e3..74609fd29 100644 --- a/codev-skeleton/protocols/air/protocol.md +++ b/codev-skeleton/protocols/air/protocol.md @@ -37,6 +37,10 @@ AIR is a minimal protocol for implementing small features (< 300 LOC) where the - Architectural changes → use **SPIR** - Complex features with multiple phases → use **SPIR** or **ASPIR** +## Baked Decisions (Optional) + +When filing an issue for AIR, you can pin architectural decisions you don't want the builder or CMAP reviewers to re-litigate. Include a `## Baked Decisions` section (any heading level is fine) anywhere in the issue body. Useful categories: language, framework, deployment shape, key dependencies, decisions deferred to a later spec. 
The builder will treat each listed item as fixed during implementation; CMAP reviewers will not propose alternatives unless the implementation itself fails to honor a stated decision. Leave the section out for issues where you want the builder to explore freely — absence is the no-op default. You can amend or rescind a baked decision at any time by updating the issue and respawning, or by sending the builder a direct instruction via `afx send`. + ## Protocol Phases ### I - Implement diff --git a/codev-skeleton/protocols/aspir/builder-prompt.md b/codev-skeleton/protocols/aspir/builder-prompt.md index 48ec8a425..dd22b316b 100644 --- a/codev-skeleton/protocols/aspir/builder-prompt.md +++ b/codev-skeleton/protocols/aspir/builder-prompt.md @@ -30,6 +30,12 @@ You are running in STRICT mode. This means: Follow the ASPIR protocol: `codev/protocols/aspir/protocol.md` Read and internalize the protocol before starting any work. +## Baked Decisions + +If the issue body contains a section named "Baked Decisions" (any heading level, case-insensitive), treat its contents as fixed architectural decisions baked in by the architect. Do not autonomously override them in your spec, plan, or implementation. If you discover a serious reason to question a baked decision, surface that concern to the architect via `afx send` rather than relitigating it inside the spec/plan/review. + +If the architect's baked-decisions section contains internal contradictions (e.g., two different language choices), do not pick one — pause, flag the contradiction to the architect via `afx send`, and wait for resolution before proceeding. 
+ {{#if spec}} ## Spec Read the specification at: `{{spec.path}}` diff --git a/codev-skeleton/protocols/aspir/consult-types/plan-review.md b/codev-skeleton/protocols/aspir/consult-types/plan-review.md index 585085dec..485ff3183 100644 --- a/codev-skeleton/protocols/aspir/consult-types/plan-review.md +++ b/codev-skeleton/protocols/aspir/consult-types/plan-review.md @@ -3,6 +3,12 @@ ## Context You are reviewing an implementation plan during the Plan phase. The spec has been approved - now you must evaluate whether the plan adequately describes HOW to implement it. +## Baked Decisions + +If the issue body or the approved spec's Constraints section includes content under a "Baked Decisions" heading, the architect has marked those choices as fixed (this extends the existing "don't re-litigate spec decisions" rule with explicit baked-decision language). Do not autonomously challenge them: do not propose alternative languages, frameworks, deployment shapes, or dependencies that contradict a baked decision. You may `COMMENT` with concerns; reserve `REQUEST_CHANGES` for the case where the plan **fails to honor** a stated baked decision — that is a real defect. + +If the baked decisions themselves contain contradictions, do not pick one — `REQUEST_CHANGES` and ask the architect to clarify before proceeding. + ## Focus Areas 1. **Spec Coverage** diff --git a/codev-skeleton/protocols/aspir/consult-types/spec-review.md b/codev-skeleton/protocols/aspir/consult-types/spec-review.md index 7c9c1579b..b537e7cc6 100644 --- a/codev-skeleton/protocols/aspir/consult-types/spec-review.md +++ b/codev-skeleton/protocols/aspir/consult-types/spec-review.md @@ -3,6 +3,12 @@ ## Context You are reviewing a feature specification during the Specify phase. Your role is to ensure the spec is complete, correct, and feasible before it moves to human approval. 
+## Baked Decisions + +If the issue body or the spec's Constraints section includes content under a "Baked Decisions" heading, the architect has marked those choices as fixed. Do not autonomously challenge them: do not propose alternative languages, frameworks, deployment shapes, or dependencies that contradict a baked decision. You may `COMMENT` with concerns about a baked decision (the architect decides whether to rescind it); reserve `REQUEST_CHANGES` for the case where the spec **fails to honor** a stated baked decision — that is a real defect. + +If the baked decisions themselves contain contradictions (e.g., two different language choices), do not pick one — `REQUEST_CHANGES` and ask the architect to clarify before proceeding. + ## Focus Areas 1. **Completeness** diff --git a/codev-skeleton/protocols/aspir/prompts/specify.md b/codev-skeleton/protocols/aspir/prompts/specify.md index 6da1868f8..3716f7969 100644 --- a/codev-skeleton/protocols/aspir/prompts/specify.md +++ b/codev-skeleton/protocols/aspir/prompts/specify.md @@ -31,6 +31,12 @@ ls codev/specs/{{project_id}}-*.md **If no spec exists:** Proceed to Step 1 below. +### 0.5 Baked Decisions + +Before exploring solution approaches, check the issue body for a section named "Baked Decisions" (any heading level, case-insensitive). If present, copy its content verbatim into the spec's Constraints section and treat each item as fixed. Do not autonomously relitigate the architect's choices in your Solution Exploration. If you discover a serious problem with a baked decision, raise it via `afx send architect` rather than overriding it in the spec. + +If two baked decisions contradict each other (e.g., two different language choices), do not pick one — pause, flag the contradiction via `afx send`, and wait for resolution before drafting. + ### 1. 
Clarifying Questions (ONLY IF NO SPEC EXISTS) Before writing anything, ask clarifying questions to understand: diff --git a/codev-skeleton/protocols/aspir/protocol.md b/codev-skeleton/protocols/aspir/protocol.md index b79391f32..18a423a30 100644 --- a/codev-skeleton/protocols/aspir/protocol.md +++ b/codev-skeleton/protocols/aspir/protocol.md @@ -41,6 +41,10 @@ Use SPIR instead when: - The work is **high-risk** — security-sensitive, user-facing, or broadly impactful - You want to **review and adjust** the plan before implementation starts +## Baked Decisions (Optional) + +When filing an issue for ASPIR, you can pin architectural decisions you don't want the builder or CMAP reviewers to re-litigate. Include a `## Baked Decisions` section (any heading level is fine) anywhere in the issue body. Useful categories: language, framework, deployment shape, key dependencies, decisions deferred to a later spec. The builder will copy the section verbatim into the spec's Constraints and treat each item as fixed; CMAP reviewers will not propose alternatives unless the spec itself fails to honor a stated decision. Leave the section out for issues where you want the builder to explore freely — absence is the no-op default. You can amend or rescind a baked decision at any time by updating the issue and respawning, or by sending the builder a direct instruction via `afx send`. + ## Protocol Phases ASPIR follows the same four phases as SPIR. For full phase documentation, see the [SPIR protocol](../spir/protocol.md). diff --git a/codev-skeleton/protocols/spir/builder-prompt.md b/codev-skeleton/protocols/spir/builder-prompt.md index 1abf4b289..28efde4af 100644 --- a/codev-skeleton/protocols/spir/builder-prompt.md +++ b/codev-skeleton/protocols/spir/builder-prompt.md @@ -30,6 +30,12 @@ You are running in STRICT mode. This means: Follow the SPIR protocol: `codev/protocols/spir/protocol.md` Read and internalize the protocol before starting any work. 
+## Baked Decisions + +If the issue body contains a section named "Baked Decisions" (any heading level, case-insensitive), treat its contents as fixed architectural decisions baked in by the architect. Do not autonomously override them in your spec, plan, or implementation. If you discover a serious reason to question a baked decision, surface that concern to the architect via `afx send` rather than relitigating it inside the spec/plan/review. + +If the architect's baked-decisions section contains internal contradictions (e.g., two different language choices), do not pick one — pause, flag the contradiction to the architect via `afx send`, and wait for resolution before proceeding. + {{#if spec}} ## Spec Read the specification at: `{{spec.path}}` diff --git a/codev-skeleton/protocols/spir/consult-types/plan-review.md b/codev-skeleton/protocols/spir/consult-types/plan-review.md index 585085dec..485ff3183 100644 --- a/codev-skeleton/protocols/spir/consult-types/plan-review.md +++ b/codev-skeleton/protocols/spir/consult-types/plan-review.md @@ -3,6 +3,12 @@ ## Context You are reviewing an implementation plan during the Plan phase. The spec has been approved - now you must evaluate whether the plan adequately describes HOW to implement it. +## Baked Decisions + +If the issue body or the approved spec's Constraints section includes content under a "Baked Decisions" heading, the architect has marked those choices as fixed (this extends the existing "don't re-litigate spec decisions" rule with explicit baked-decision language). Do not autonomously challenge them: do not propose alternative languages, frameworks, deployment shapes, or dependencies that contradict a baked decision. You may `COMMENT` with concerns; reserve `REQUEST_CHANGES` for the case where the plan **fails to honor** a stated baked decision — that is a real defect. 
+ +If the baked decisions themselves contain contradictions, do not pick one — `REQUEST_CHANGES` and ask the architect to clarify before proceeding. + ## Focus Areas 1. **Spec Coverage** diff --git a/codev-skeleton/protocols/spir/consult-types/spec-review.md b/codev-skeleton/protocols/spir/consult-types/spec-review.md index 7c9c1579b..b537e7cc6 100644 --- a/codev-skeleton/protocols/spir/consult-types/spec-review.md +++ b/codev-skeleton/protocols/spir/consult-types/spec-review.md @@ -3,6 +3,12 @@ ## Context You are reviewing a feature specification during the Specify phase. Your role is to ensure the spec is complete, correct, and feasible before it moves to human approval. +## Baked Decisions + +If the issue body or the spec's Constraints section includes content under a "Baked Decisions" heading, the architect has marked those choices as fixed. Do not autonomously challenge them: do not propose alternative languages, frameworks, deployment shapes, or dependencies that contradict a baked decision. You may `COMMENT` with concerns about a baked decision (the architect decides whether to rescind it); reserve `REQUEST_CHANGES` for the case where the spec **fails to honor** a stated baked decision — that is a real defect. + +If the baked decisions themselves contain contradictions (e.g., two different language choices), do not pick one — `REQUEST_CHANGES` and ask the architect to clarify before proceeding. + ## Focus Areas 1. **Completeness** diff --git a/codev-skeleton/protocols/spir/prompts/specify.md b/codev-skeleton/protocols/spir/prompts/specify.md index 6da1868f8..3716f7969 100644 --- a/codev-skeleton/protocols/spir/prompts/specify.md +++ b/codev-skeleton/protocols/spir/prompts/specify.md @@ -31,6 +31,12 @@ ls codev/specs/{{project_id}}-*.md **If no spec exists:** Proceed to Step 1 below. +### 0.5 Baked Decisions + +Before exploring solution approaches, check the issue body for a section named "Baked Decisions" (any heading level, case-insensitive). 
If present, copy its content verbatim into the spec's Constraints section and treat each item as fixed. Do not autonomously relitigate the architect's choices in your Solution Exploration. If you discover a serious problem with a baked decision, raise it via `afx send architect` rather than overriding it in the spec. + +If two baked decisions contradict each other (e.g., two different language choices), do not pick one — pause, flag the contradiction via `afx send`, and wait for resolution before drafting. + ### 1. Clarifying Questions (ONLY IF NO SPEC EXISTS) Before writing anything, ask clarifying questions to understand: diff --git a/codev-skeleton/protocols/spir/protocol.md b/codev-skeleton/protocols/spir/protocol.md index 475316c13..188c2bda1 100644 --- a/codev-skeleton/protocols/spir/protocol.md +++ b/codev-skeleton/protocols/spir/protocol.md @@ -79,6 +79,10 @@ Each phase follows a build-verify loop: build the artifact, then verify with 3-w - Dependency updates - Emergency hotfixes (but do a lightweight retrospective after) +## Baked Decisions (Optional) + +When filing an issue for SPIR, you can pin architectural decisions you don't want the builder or CMAP reviewers to re-litigate. Include a `## Baked Decisions` section (any heading level is fine) anywhere in the issue body. Useful categories: language, framework, deployment shape, key dependencies, decisions deferred to a later spec. The builder will copy the section verbatim into the spec's Constraints and treat each item as fixed; CMAP reviewers will not propose alternatives unless the spec itself fails to honor a stated decision. Leave the section out for issues where you want the builder to explore freely — absence is the no-op default. You can amend or rescind a baked decision at any time by updating the issue and respawning, or by sending the builder a direct instruction via `afx send`. 
+ ## Protocol Phases ### S - Specify (Collaborative Design Exploration) diff --git a/codev/plans/746-spir-architect-s-baked-archite.md b/codev/plans/746-spir-architect-s-baked-archite.md new file mode 100644 index 000000000..057a5e4c2 --- /dev/null +++ b/codev/plans/746-spir-architect-s-baked-archite.md @@ -0,0 +1,389 @@ +# Plan: Baked Architectural Decisions in SPIR Issue Body + +--- +approved: 2026-05-17 +validated: [gemini, codex, claude] +--- + +## Metadata +- **ID**: plan-2026-05-17-baked-decisions +- **Status**: approved +- **Specification**: [codev/specs/746-spir-architect-s-baked-archite.md](../specs/746-spir-architect-s-baked-archite.md) +- **Created**: 2026-05-17 + +## Executive Summary + +**Pure prompt-and-documentation change.** No code surface touched. No parser, no `TemplateContext` field, no template-engine changes. The LLM finds the `## Baked Decisions` section in the issue body (which is already passed verbatim via `{{issue.body}}`) and honors it because the prompt tells it to. + +This is the simplest design that satisfies the spec. It also handles variant section names (e.g., "Constraints (fixed)", "Architectural Choices") and inline baked decisions in prose more gracefully than a regex parser would — the LLM recognizes intent, the regex would not. Adding parser infrastructure for what is fundamentally a prompt-discipline question would be against Codev's core ethos. + +**4 phases.** Each phase is independently committable, valuable, and testable. The tests are exclusively grep-based content assertions plus pure-addition diffs against pre-change baselines — no template-rendering snapshots, no parser unit tests. + +## Success Metrics + +Copied from the spec's Success Criteria. Cross-reference: spec section "Success Criteria" lists 14 deterministic pass/fail checks. The phase-level Acceptance Criteria below say which spec criteria each phase closes. 
+ +- [ ] All specification criteria met +- [ ] Test coverage: every touched prompt file has a grep regression test asserting the required instruction language; every touched file has a pure-addition diff test against its pre-change baseline +- [ ] No regression: every touched static markdown file's diff vs. its pre-change baseline contains zero removed lines and zero modified lines (only additions) +- [ ] Documentation discoverability: `grep -l "Baked Decisions" codev/protocols/*/protocol.md` returns three files (SPIR / ASPIR / AIR) +- [ ] Skeleton parity: `diff -r codev/protocols/ codev-skeleton/protocols/` shows no substantive differences for touched files + +## Phases (Machine Readable) + +```json +{ + "phases": [ + {"id": "phase_1", "title": "Builder-prompt instruction (SPIR/ASPIR/AIR + skeleton)"}, + {"id": "phase_2", "title": "Drafting prompts: specify.md (SPIR/ASPIR) + implement.md (AIR) + skeleton"}, + {"id": "phase_3", "title": "Reviewer prompts: spec-review / plan-review / impl-review / pr-review + skeleton"}, + {"id": "phase_4", "title": "Protocol documentation + final regression sweep"} + ] +} +``` + +## Phase Breakdown + +### Phase 1: Builder-Prompt Instruction +**Dependencies**: None + +#### Objectives +- Add a uniform instruction paragraph to the SPIR / ASPIR / AIR `builder-prompt.md` templates (and their `codev-skeleton/` mirrors) telling the builder to recognize a `## Baked Decisions` section in the issue body and treat its contents as fixed. +- Capture pre-change baselines of all touched files (used in this phase and later phases for the no-regression assertion). + +#### Deliverables +- [ ] Pre-change baseline snapshots of the 12 prompt files touched across this and subsequent phases (3 builder-prompts + 3 drafting prompts + 6 reviewer prompts), committed under `packages/codev/src/agent-farm/__tests__/fixtures/baselines/`. Captured up-front in Phase 1 so subsequent phases can assert against them. 
+- [ ] Edits to `codev/protocols/spir/builder-prompt.md`, `codev/protocols/aspir/builder-prompt.md`, `codev/protocols/air/builder-prompt.md` +- [ ] Identical edits mirrored to `codev-skeleton/protocols/{spir,aspir,air}/builder-prompt.md` +- [ ] Grep regression test in a new `packages/codev/src/agent-farm/__tests__/baked-decisions.test.ts` (or extension of an existing test file) asserting each builder-prompt contains the required strings +- [ ] Pure-addition diff test against the pre-change baselines + +#### Implementation Details + +**Instruction paragraph** (uniform across all three builder-prompts, final wording TBD during implementation): + +```markdown +## Baked Decisions + +If the issue body contains a section named "Baked Decisions" (any heading level, case-insensitive), treat its contents as fixed architectural decisions baked in by the architect. Do not autonomously override them in your spec, plan, or implementation. If you discover a serious reason to question a baked decision, surface that concern to the architect via `afx send` rather than relitigating it inside the spec/plan/review. + +If the architect's baked-decisions section contains internal contradictions (e.g., two different language choices), do not pick one — pause, flag the contradiction to the architect via `afx send`, and wait for resolution before proceeding. +``` + +**Placement**: Insert near the top of each builder-prompt, after the `## Protocol` section and before `{{#if spec}}` / `{{#if issue}}` blocks. This ensures the builder reads the instruction before encountering the issue body. 
+ +**Files touched**: +- `codev/protocols/spir/builder-prompt.md` +- `codev/protocols/aspir/builder-prompt.md` +- `codev/protocols/air/builder-prompt.md` +- `codev-skeleton/protocols/spir/builder-prompt.md` +- `codev-skeleton/protocols/aspir/builder-prompt.md` +- `codev-skeleton/protocols/air/builder-prompt.md` +- `packages/codev/src/agent-farm/__tests__/fixtures/baselines/spir-builder-prompt.md.baseline` +- `packages/codev/src/agent-farm/__tests__/fixtures/baselines/aspir-builder-prompt.md.baseline` +- `packages/codev/src/agent-farm/__tests__/fixtures/baselines/air-builder-prompt.md.baseline` +- `packages/codev/src/agent-farm/__tests__/fixtures/baselines/`: also pre-populate baselines for the 9 other prompt files touched in Phases 2-3 (capture once, use across all phases) +- `packages/codev/src/agent-farm/__tests__/baked-decisions.test.ts` (new test file) + +#### Acceptance Criteria +Closes spec criteria: *SPIR/ASPIR/AIR builder-prompt surface baked decisions*. +- [ ] Each of the 3 codev + 3 skeleton builder-prompts contains the literal string `Baked Decisions` +- [ ] Each contains the carveout phrase (`do not autonomously`, matched case-insensitively because the prompts capitalize it at the start of a sentence) +- [ ] Each contains contradiction-handling vocabulary (`contradict` AND `pause`) +- [ ] Diff of each post-edit file vs. its pre-change baseline is **pure addition**: zero removed lines, zero modified lines +- [ ] For each protocol `p` in `spir`, `aspir`, `air`: `diff codev/protocols/$p/builder-prompt.md codev-skeleton/protocols/$p/builder-prompt.md` shows no differences + +#### Test Plan +- **Baseline capture script**: a small helper (can be a one-line shell loop or part of the test file setup) that reads each of the 12 currently-touched files and writes them to `__tests__/fixtures/baselines/`. Run **once** at the start of Phase 1, before any edits. +- **Grep regression test** (vitest): reads each builder-prompt file and asserts the literal strings above. 
+- **Pure-addition diff test** (vitest): reads each builder-prompt and its baseline, computes a line diff (using the `diff` npm package or a 30-line hand-rolled function), and asserts that the diff contains no removed lines. A modified line surfaces as a removal plus an addition, so this check also rules out modifications. + +#### Rollback Strategy +Per-file paragraph revert. No code surface touched, so no rollback complexity. + +#### Risks +- **Risk**: Paragraph wording drifts between SPIR / ASPIR / AIR because they're edited independently. + - **Mitigation**: Author a single canonical paragraph; copy verbatim to all three. The grep test enforces keyword consistency. +- **Risk**: Baseline capture happens after an unintended pre-Phase-1 edit, polluting the baseline. + - **Mitigation**: First commit of Phase 1 is **only** the baseline capture (a separate "[Spec 746][Phase: 1] chore: capture pre-change baselines" commit) before any prompt edits. + +--- + +### Phase 2: Drafting Prompts (specify.md + implement.md) +**Dependencies**: Phase 1 (so the baseline-capture infrastructure is in place; the baselines for Phase 2's files were captured in Phase 1) + +#### Objectives +- Update SPIR / ASPIR `prompts/specify.md` (+ skeleton mirrors) so the builder, when drafting a spec, reads the baked-decisions section first and writes its content verbatim into the spec's Constraints section. +- Update AIR `prompts/implement.md` (+ skeleton mirror) with an analogous "honor baked decisions from the issue body" clause; AIR skips the spec phase, so its baked-decision discipline lives in the implement prompt. +- All prompt language uses the architect-override carveout framing (spec Resolved Decision #12).
+ +#### Deliverables +- [ ] Edit `codev/protocols/spir/prompts/specify.md`: add clause instructing the builder to look for the baked-decisions section first and copy it into Constraints verbatim +- [ ] Edit `codev/protocols/aspir/prompts/specify.md`: same edit +- [ ] Edit `codev/protocols/air/prompts/implement.md`: analogous clause near the implementation instructions +- [ ] Mirror all three to `codev-skeleton/protocols/{spir,aspir,air}/prompts/` +- [ ] Extend the grep regression test from Phase 1 to cover these 6 files +- [ ] Pure-addition diff test against pre-change baselines for these 6 files + +#### Implementation Details + +**Clause text for SPIR/ASPIR `specify.md`** (final wording TBD during implementation): + +> **Baked Decisions.** Before exploring solution approaches, check the issue body for a section named "Baked Decisions" (any heading level, case-insensitive). If present, copy its content verbatim into the spec's Constraints section and treat each item as fixed. Do not autonomously relitigate the architect's choices in your Solution Exploration. If you discover a serious problem with a baked decision, raise it via `afx send architect` rather than overriding it in the spec. If two baked decisions contradict each other (e.g., two different language choices), do not pick one — pause, flag the contradiction via `afx send`, and wait for resolution before drafting. + +**Clause text for AIR `implement.md`**: + +> **Baked Decisions.** Check the issue body for a section named "Baked Decisions" (any heading level, case-insensitive). If present, treat each listed decision as fixed during implementation. Do not autonomously substitute alternate languages, frameworks, or dependencies. If you discover a serious problem with a baked decision, raise it via `afx send architect` rather than working around it. If two baked decisions contradict each other, do not pick one — pause, flag the contradiction via `afx send`, and wait for resolution before implementing. 
+ +**Placement**: Near the top of the operative section (in specify.md, right after the "Check for Existing Spec" block; in implement.md, right after the "Goal" block) so the builder reads the rule before starting drafting/implementation. + +**Files touched**: +- `codev/protocols/spir/prompts/specify.md` +- `codev/protocols/aspir/prompts/specify.md` +- `codev/protocols/air/prompts/implement.md` +- Skeleton mirrors of each +- Extension of `packages/codev/src/agent-farm/__tests__/baked-decisions.test.ts` + +#### Acceptance Criteria +Closes spec criteria: *SPIR/ASPIR `prompts/specify.md` instructs the builder...*, *AIR `prompts/implement.md` has an analogous clause*, *contradiction-handling (spec Resolved Decision #7) for drafting prompts*. +- [ ] All 3 codev + 3 skeleton files contain the literal string `Baked Decisions` +- [ ] All contain the carveout phrase (`do not autonomously`) +- [ ] All contain contradiction-handling vocabulary (`contradict` AND `pause` AND `flag`) +- [ ] Each post-edit file's diff vs. its pre-change baseline is pure addition +- [ ] Diff between codev/ and skeleton copies shows no substantive differences + +#### Test Plan +- **Grep regression test** (extending Phase 1's test): reads each of the 6 files and asserts the literal strings + carveout + contradiction vocabulary. +- **Pure-addition diff test** (extending Phase 1's test): runs the same line-diff function against each of the 6 files vs. its baseline. +- **Manual reading**: post-edit, read each file end-to-end to confirm the clause flows in context. + +#### Rollback Strategy +Per-file paragraph revert. + +#### Risks +- **Risk**: The clause lands somewhere a builder would skim past (e.g., buried in the "What NOT to Do" footer). + - **Mitigation**: Place near the top of the operative section as specified above. 
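The two checks named in the Phase 2 test plan (required-phrase assertions and the pure-addition diff) could be sketched roughly as below. This is a minimal illustration, not the contents of `baked-decisions.test.ts`: the helper names `missingPhrases` and `isPureAddition` and the sample strings are hypothetical, and the phrase match is assumed to be case-insensitive because the prompts write "Do not autonomously" with a capital letter.

```typescript
// Hypothetical helpers for illustration; names and sample text are not from the repository.

/** Returns the required phrases that do not occur in `text` (case-insensitive). */
function missingPhrases(text: string, required: string[]): string[] {
  const haystack = text.toLowerCase();
  return required.filter((phrase) => !haystack.includes(phrase.toLowerCase()));
}

/**
 * True when `edited` can be produced from `baseline` by inserting lines only.
 * That holds exactly when the baseline lines appear in `edited` as an in-order
 * subsequence, so a line diff would report no removed or modified lines.
 */
function isPureAddition(baseline: string, edited: string): boolean {
  const baseLines = baseline.split("\n");
  let next = 0; // index of the next baseline line still to be matched
  for (const line of edited.split("\n")) {
    if (next < baseLines.length && line === baseLines[next]) {
      next += 1;
    }
  }
  return next === baseLines.length;
}

// Made-up sample content standing in for a prompt file and its baseline.
const baseline = "## Protocol\nFollow the protocol.\n## Issue";
const edited = [
  "## Protocol",
  "Follow the protocol.",
  "## Baked Decisions",
  "Do not autonomously override them. If two decisions contradict, pause and flag it.",
  "## Issue",
].join("\n");
const required = ["Baked Decisions", "do not autonomously", "contradict", "pause", "flag"];

console.log(missingPhrases(edited, required));
console.log(isPureAddition(baseline, edited));
console.log(isPureAddition(baseline, edited.replace("Follow the protocol.", "Follow it.")));
```

In a vitest file these helpers would presumably sit behind `expect` calls, one per touched prompt file; the subsequence check is one way to write the "hand-rolled function" option without pulling in a diff library.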
+ +--- + +### Phase 3: Reviewer Prompts (spec-review / plan-review / impl-review / pr-review) +**Dependencies**: Phase 1 (baselines) + +#### Objectives +- Add anti-relitigation language with architect-override carveouts and contradiction-handling to the 6 consult-type prompt files (+ 6 skeleton mirrors). + +#### Deliverables +- [ ] Edits to: + - `codev/protocols/spir/consult-types/spec-review.md` + - `codev/protocols/aspir/consult-types/spec-review.md` + - `codev/protocols/spir/consult-types/plan-review.md` + - `codev/protocols/aspir/consult-types/plan-review.md` + - `codev/protocols/air/consult-types/impl-review.md` + - `codev/protocols/air/consult-types/pr-review.md` +- [ ] Mirrors of each in `codev-skeleton/` +- [ ] Extension of the grep regression test to cover these 12 files +- [ ] Pure-addition diff test against pre-change baselines for these 6 files (the canonical codev/ copies — skeleton parity is asserted separately) + +#### Implementation Details + +**Clause text** (template — adapt per consult-type, final wording TBD during implementation): + +> **Baked Decisions.** If the spec's Constraints section (or the issue body in AIR's case) includes content under a "Baked Decisions" heading, the architect has marked those choices as fixed. Do not autonomously challenge them: do not propose alternative languages, frameworks, deployment shapes, or dependencies that contradict a baked decision. You may **`COMMENT`** with concerns about a baked decision (the architect will decide whether to rescind it); reserve **`REQUEST_CHANGES`** for the case where the spec/plan/code **fails to honor** a stated baked decision — that is a real defect. If the baked decisions themselves contain contradictions (e.g., two different language choices), do not pick one — `REQUEST_CHANGES` and ask the architect to clarify before proceeding. 
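As a reading aid only, the verdict rule in the clause above can be written out as a small decision function. The plan implements this rule as prompt wording, not code, so everything here is hypothetical: the type and function names are invented for illustration, and checking contradictions before the other cases is an assumption, since the clause does not state an order.

```typescript
// Illustrative only: the rule lives in the reviewer prompts, not in code.
type Verdict = "REQUEST_CHANGES" | "COMMENT";

interface BakedDecisionFindings {
  decisionsContradictEachOther: boolean; // e.g. two different language choices
  artifactViolatesDecision: boolean; // the spec/plan/code fails to honor a baked decision
  reviewerHasConcern: boolean; // the reviewer doubts a decision that was honored
}

/** Returns the verdict the clause calls for, or null when the clause imposes none. */
function bakedDecisionVerdict(f: BakedDecisionFindings): Verdict | null {
  if (f.decisionsContradictEachOther) return "REQUEST_CHANGES"; // ask the architect to clarify
  if (f.artifactViolatesDecision) return "REQUEST_CHANGES"; // a real defect
  if (f.reviewerHasConcern) return "COMMENT"; // the architect decides whether to rescind
  return null; // baked decisions do not constrain the verdict
}

console.log(
  bakedDecisionVerdict({
    decisionsContradictEachOther: false,
    artifactViolatesDecision: false,
    reviewerHasConcern: true,
  }),
);
```

The point of the sketch is the asymmetry: a reviewer's disagreement with a baked decision never escalates beyond `COMMENT`, while a violation or a contradiction does.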
+ +For `plan-review.md` specifically, the existing "don't re-litigate spec decisions" line stays; the new paragraph supplements it with explicit baked-decision language. + +**Placement**: Insert near the top of the "Notes" or "Focus Areas" section (above existing content), not at the bottom. Keep the paragraph to 3-4 sentences. + +**Files touched** (6 codev + 6 skeleton = 12): +- `codev/protocols/spir/consult-types/spec-review.md` +- `codev/protocols/aspir/consult-types/spec-review.md` +- `codev/protocols/spir/consult-types/plan-review.md` +- `codev/protocols/aspir/consult-types/plan-review.md` +- `codev/protocols/air/consult-types/impl-review.md` +- `codev/protocols/air/consult-types/pr-review.md` +- Skeleton mirrors of each +- Extension of `packages/codev/src/agent-farm/__tests__/baked-decisions.test.ts` + +#### Acceptance Criteria +Closes spec criteria: *SPIR/ASPIR `spec-review.md` contains a "do not autonomously override baked decisions" instruction*, *SPIR/ASPIR `plan-review.md` extends its existing language*, *AIR `impl-review.md` / `pr-review.md` have analogous instructions*, *contradiction-handling (spec Resolved Decision #7) for reviewer prompts*. +- [ ] All 6 codev + 6 skeleton files contain the literal string `Baked Decisions` +- [ ] All contain the carveout phrase (`do not autonomously`) +- [ ] All explicitly distinguish `COMMENT` from `REQUEST_CHANGES` +- [ ] All contain contradiction-handling vocabulary (`contradict` AND `clarify`) +- [ ] Each post-edit codev/ file's diff vs. its pre-change baseline is pure addition +- [ ] Diff between codev/ and skeleton copies shows no substantive differences + +#### Test Plan +- **Grep regression test** (extending Phase 1/2's test): reads each of the 12 files and asserts the literal strings. +- **Pure-addition diff test** (extending Phase 1/2's test): runs the line-diff against each of the 6 codev/ files vs. its baseline. 
+- **Read-through**: post-edit, read each file in full to confirm the new paragraph fits the existing structure. + +#### Rollback Strategy +Per-file paragraph revert. + +#### Risks +- **Risk**: Reviewer prompts collectively grow long enough that LLMs skim past the new clause. + - **Mitigation**: Place clause near the top of Notes / Focus Areas; keep to 3-4 sentences. + +--- + +### Phase 4: Protocol Documentation Paragraphs + Final Regression Sweep +**Dependencies**: Phases 1-3 (so all prompt edits are in place; this phase verifies them collectively) + +#### Objectives +- Add a discoverability paragraph to each `protocol.md` (SPIR, ASPIR, AIR) + skeleton mirrors. Per spec Resolved Decision #11, this is the primary discoverability surface. +- Run the final cross-phase regression sweep: full grep suite + `diff -r` skeleton parity check + a manual smoke confirming a real spawn renders the new content. + +#### Deliverables +- [ ] Paragraph in `codev/protocols/spir/protocol.md` +- [ ] Paragraph in `codev/protocols/aspir/protocol.md` +- [ ] Paragraph in `codev/protocols/air/protocol.md` +- [ ] Skeleton mirrors of all three +- [ ] Grep regression test for the keyword "Baked Decisions" in each protocol.md +- [ ] Final cross-phase grep sweep test (re-runs every Phase 1-3 grep) +- [ ] Skeleton-parity assertion: `diff -r codev/protocols/ codev-skeleton/protocols/` clean for touched files +- [ ] Manual smoke: spawn a builder against a fixture issue containing a `## Baked Decisions` section, confirm the rendered prompt the builder receives includes both the issue's baked-decisions content (via `{{issue.body}}`) and the instruction paragraph telling them how to handle it (added in Phase 1) + +#### Implementation Details + +**Paragraph text** (final wording TBD during implementation): + +```markdown +### Baked Decisions (Optional) + +When filing an issue for SPIR / ASPIR / AIR, you can pin architectural decisions you don't want the builder or CMAP reviewers to re-litigate. 
Include a `## Baked Decisions` section (any heading level is fine) anywhere in the issue body. Useful categories: language, framework, deployment shape, key dependencies, decisions deferred to a later spec. The builder will copy the section verbatim into the spec's Constraints and treat each item as fixed; CMAP reviewers will not propose alternatives unless the spec itself fails to honor a stated decision. Leave the section out for issues where you want the builder to explore freely — absence is the no-op default. You can amend or rescind a baked decision at any time by updating the issue and respawning, or by sending the builder a direct instruction via `afx send`. +``` + +**Placement**: Insert as a sub-section after the protocol's "Overview" or "When to Use" section — somewhere an architect reading top-down will encounter it before invoking the protocol. + +**Files touched**: +- `codev/protocols/spir/protocol.md` +- `codev/protocols/aspir/protocol.md` +- `codev/protocols/air/protocol.md` +- `codev-skeleton/protocols/{spir,aspir,air}/protocol.md` +- Extension of `packages/codev/src/agent-farm/__tests__/baked-decisions.test.ts` for the docs grep + the cross-phase final sweep + +#### Acceptance Criteria +Closes spec criterion: *Documentation — each protocol.md contains a paragraph instructing architects how to declare baked decisions*. Also closes *Skeleton parity* and the cross-phase no-regression sweep. +- [ ] `grep -l "Baked Decisions" codev/protocols/{spir,aspir,air}/protocol.md` returns three files +- [ ] Same for `codev-skeleton/protocols/{spir,aspir,air}/protocol.md` +- [ ] Each paragraph mentions the category hints (language / framework / etc.) 
+- [ ] Each paragraph documents the rescind/amend escape hatch +- [ ] codev/ and skeleton diff clean +- [ ] Cross-phase grep sweep: all required strings present in every file touched in Phases 1-3 +- [ ] Manual smoke: spawned builder's rendered prompt visibly includes the instruction paragraph + the issue's baked-decisions content; confirmed by reading the rendered prompt file or watching the builder reference it + +#### Test Plan +- **Grep regression test**: vitest assertion on the keyword + category hint words in each protocol.md. +- **Cross-phase grep sweep**: single test that re-runs every grep assertion from Phases 1-3 in one pass. +- **Skeleton-parity test** (vitest, optional but recommended): walk all touched files and assert codev/ and skeleton copies match. +- **Manual reading**: confirm each paragraph reads naturally in surrounding protocol prose. +- **Manual smoke**: spawn a builder against `__tests__/fixtures/issue-with-baked.md` (the existing fixture from earlier plans — re-purposed here as a sanity check) and inspect the resulting `.builder-prompt.txt` for the expected content. + +#### Rollback Strategy +Per-file paragraph revert. + +#### Risks +- **Risk**: Paragraph wording drifts between SPIR / ASPIR / AIR. + - **Mitigation**: Single canonical paragraph copied to all three with minor name adjustments; grep test enforces keyword consistency. +- **Risk**: Manual smoke is skipped; subtle integration issue ships. + - **Mitigation**: The smoke is explicitly listed as an acceptance criterion above; PR review can ask for evidence (e.g., a screenshot or pasted excerpt). + +## Dependency Map +``` +Phase 1 (builder-prompts + baselines) ──→ Phase 2 (drafting prompts) + ├──→ Phase 3 (reviewer prompts) + └──→ Phase 4 (docs + final sweep) +``` + +Phases 2 and 3 are independent of each other and can be done in either order after Phase 1. Phase 4 depends on Phases 2 and 3 because its cross-phase grep sweep needs their edits to be present. 
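
The ordering constraint stated above can be checked mechanically. The following is a small sketch under the assumption that the dependencies are exactly those named in the phase headers (Phases 2 and 3 depend on Phase 1; Phase 4 depends on Phases 1-3); `phaseDeps` and `isValidOrder` are hypothetical names used only for illustration.

```typescript
// Phase number -> phases that must be complete first (taken from the phase headers).
const phaseDeps: Record<number, number[]> = {
  1: [],
  2: [1],
  3: [1],
  4: [1, 2, 3],
};

// Returns true when `order` runs every phase exactly once and never starts
// a phase before all of its dependencies have finished.
function isValidOrder(order: number[], deps: Record<number, number[]>): boolean {
  const done = new Set<number>();
  for (const phase of order) {
    const required = deps[phase];
    // Reject unknown phases and repeats.
    if (required === undefined || done.has(phase)) return false;
    if (!required.every((dep) => done.has(dep))) return false;
    done.add(phase);
  }
  return done.size === Object.keys(deps).length;
}
```

Under these assumptions both `[1, 2, 3, 4]` and `[1, 3, 2, 4]` are accepted, while any order that runs Phase 4 before Phase 2 or Phase 3 is rejected.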
+ +## Resource Requirements +### Development Resources +- **Engineers**: One builder (this one), comfortable with prompt-template editing and vitest +- **Environment**: standard Codev dev environment; `pnpm install` + `pnpm --filter @cluesmith/codev test` + +### Infrastructure +- None new. + +## Integration Points +### External Systems +None. + +### Internal Systems +- **Tower / spawn pipeline**: unchanged. No code surface touched. `{{issue.body}}` continues to carry the issue body verbatim, including any `## Baked Decisions` section. +- **CMAP reviewer pipeline (`consult` CLI)**: consumes the consult-type prompts as static markdown. The added paragraphs flow through the existing pipeline; no consult-tooling change. +- **Skeleton-sync**: the standard rule — every edit in `codev/protocols/` mirrored to `codev-skeleton/protocols/` — applies. + +## Risk Analysis +### Technical Risks +| Risk | Probability | Impact | Mitigation | Owner | +|------|-------------|--------|------------|-------| +| Paragraph wording drifts across the three protocols | Medium | Low | Single canonical paragraph copied to all; grep test enforces keywords | Builder | +| Builder-prompt addition changes whitespace and breaks unrelated snapshot tests | Low | Low | Pure-addition diff assertion catches this; the template's existing trim/dedup post-processing handles minor spacing | Builder | +| Skeleton mirrors drift from codev/ | Low | Low | Skeleton-parity assertion baked into Phase 4's final sweep | Builder | +| Reviewer prompts grow long enough that the new clause is skimmed | Medium | Low | Place clause near top of Notes / Focus Areas; keep to 3-4 sentences | Builder | +| Baseline capture happens after an unintended edit, polluting the baseline | Low | Medium | First commit of Phase 1 is exclusively the baseline capture, before any prompt edits | Builder | +| LLM misses a baked-decisions section because the heading is unusual ("Architectural Givens") | Medium | Low | The prompt instruction names 
"Baked Decisions" specifically; architects who use the convention will use that name. Unusual variants are explicitly out of scope — the spec requires recognition of the literal "Baked Decisions" section name. | Architect | + +### Schedule Risks +| Risk | Probability | Impact | Mitigation | Owner | +|------|-------------|--------|------------|-------| +| CMAP iter on plan exposes a hole | Low | Low | Plan is minimal; smallest reasonable scope | Builder | + +## Validation Checkpoints +1. **After Phase 1**: Baselines captured before edits. Three builder-prompts contain the instruction paragraph. Grep + pure-addition diff tests green. +2. **After Phase 2**: Drafting/implement prompts contain the carveout clause and contradiction-handling. Grep + pure-addition diff tests green. +3. **After Phase 3**: All six reviewer prompts contain the anti-relitigation language with COMMENT/REQUEST_CHANGES distinction and contradiction-handling. Grep + pure-addition diff tests green. +4. **After Phase 4**: Three protocol.md files documented. Cross-phase grep sweep green. Skeleton parity clean. Manual smoke confirmed. +5. **Before PR**: Full `pnpm --filter @cluesmith/codev test` green. + +## Monitoring and Observability +Not applicable — this is a prompt-and-documentation change with no runtime behavior. 
+ +## Documentation Updates Required +- [ ] `codev/protocols/spir/protocol.md`: discoverability paragraph (Phase 4) +- [ ] `codev/protocols/aspir/protocol.md`: discoverability paragraph (Phase 4) +- [ ] `codev/protocols/air/protocol.md`: discoverability paragraph (Phase 4) +- [ ] `codev-skeleton/protocols/{spir,aspir,air}/protocol.md`: mirrors (Phase 4) +- [ ] Review document (`codev/reviews/746-spir-architect-s-baked-archite.md`) per SPIR's Review phase + +## Post-Implementation Tasks +- [ ] (Optional, deferred) Consider whether `afx spawn` should warn when it detects `## Baked Decisions` in an issue body but the section is empty — listed as a Nice-to-Know in the spec; not in this plan's scope. + +## Expert Review + +**Iteration 1 — 2026-05-17**: Reviewed by Gemini, Codex, Claude. Verdicts: Gemini `APPROVE`, Codex `REQUEST_CHANGES`, Claude `APPROVE`. Plan was then revised per iter-2 to address Codex's three issues (spawn.ts wiring, consult-prompt no-regression, contradiction-handling). + +**Architect Feedback — 2026-05-17** (post iter-2 plan-approval gate): + +- **Dropped the parser entirely.** No `extractBakedDecisions()`, no `TemplateContext.baked_decisions`, no `{{#if baked_decisions}}` template block, no code surface. The LLM finds the section in the issue body (already passed via `{{issue.body}}`) and honors it because the prompt tells it to. Reasoning: (1) builder-prompts and reviewer-prompts (which are static markdown) were going to get instruction-only treatment regardless — splitting them across two paradigms (templated vs. instruction-driven) added asymmetry without benefit; (2) LLM-driven recognition is more robust to variant section names than a regex parser; (3) prompt-driven discipline is Codev's core ethos. +- **Reduced from 5 phases to 4.** Phase 1 (parser) and Phase 5 (e2e fixtures + sweep) are gone; their valid parts (cross-phase grep sweep, manual smoke) are folded into the new Phase 4. 
+- **Reduced from 15 baselines to 12.** No template-rendering snapshots needed; just raw-file pre-change baselines for the 12 prompt files (3 builder + 3 drafting + 6 reviewer). Protocol.md files don't need baselines because their additions are entirely new sub-sections (grep + manual readthrough is enough). +- **Contradiction handling stays** as instruction text in both drafting and reviewer prompts (per Codex iter-1 #3) — already integrated. +- **Test infrastructure simplifies** — no parser unit tests, no template-rendering snapshots, no fixture-issue files. Just grep tests and pure-addition diff tests, all in one new `baked-decisions.test.ts` file. + +## Approval +- [ ] Technical Lead Review +- [ ] Engineering Manager Approval +- [ ] Resource Allocation Confirmed +- [ ] Expert AI Consultation Complete + +## Change Log +| Date | Change | Reason | Author | +|------|--------|--------|--------| +| 2026-05-17 | Initial plan draft | Spec 746 approved by architect | Builder | +| 2026-05-17 | iter-2: spawn.ts wiring clarification, consult-prompt no-regression, contradiction-handling | CMAP feedback (Codex REQUEST_CHANGES) | Builder | +| 2026-05-17 | iter-3: dropped parser entirely; reduced to 4 phases; pure prompt+docs change | Architect feedback (over-engineering) | Builder | + +## Notes + +- This plan is intentionally minimal — pure prompt-and-documentation change with grep + pure-addition diff tests as the only verification. No code surface means no rollback complexity and no maintenance burden. +- The architect-override carveout (spec Resolved Decision #12) is the most important framing constraint. Every prompt addition uses "do not autonomously …" rather than absolute prohibition. PR reviewer should grep for and verify this in every touched file. +- Phases 2 and 3 are highly parallelizable. The plan orders them 1→2→3→4 for readability; the actual implementation can interleave them as long as Phase 1's baseline capture is the first action. 
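
One way to express the pure-addition diff check mentioned in the notes above is a subsequence test over lines. This is a sketch under stated assumptions, not the plan's actual line-diff function: `isPureAddition` is a hypothetical name, and "pure addition" is taken to mean that every baseline line survives unchanged and in its original order, with new lines allowed anywhere in between.

```typescript
// Returns true when `edited` can be produced from `baseline` by only inserting lines.
// Equivalent to checking that the baseline's lines form a subsequence of the edited lines.
function isPureAddition(baseline: string, edited: string): boolean {
  const before = baseline.split("\n");
  const after = edited.split("\n");
  let matched = 0; // index of the next baseline line still waiting to be found
  for (const line of after) {
    if (matched < before.length && line === before[matched]) {
      matched++;
    }
  }
  // Every baseline line was found, in order, somewhere in the edited file.
  return matched === before.length;
}
```

Because lines are compared exactly, a whitespace-only change to an existing line counts as a modification and fails the check, which is the behavior the risk table relies on for catching unintended spacing edits.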
+ +--- + +## Amendment History + + diff --git a/codev/projects/746-spir-architect-s-baked-archite/status.yaml b/codev/projects/746-spir-architect-s-baked-archite/status.yaml new file mode 100644 index 000000000..db2dfe423 --- /dev/null +++ b/codev/projects/746-spir-architect-s-baked-archite/status.yaml @@ -0,0 +1,86 @@ +id: '746' +title: spir-architect-s-baked-archite +protocol: spir +phase: review +plan_phases: + - id: phase_1 + title: Builder-prompt instruction (SPIR/ASPIR/AIR + skeleton) + status: complete + - id: phase_2 + title: 'Drafting prompts: specify.md (SPIR/ASPIR) + implement.md (AIR) + skeleton' + status: complete + - id: phase_3 + title: 'Reviewer prompts: spec-review / plan-review / impl-review / pr-review + skeleton' + status: complete + - id: phase_4 + title: Protocol documentation + final regression sweep + status: complete +current_plan_phase: null +gates: + spec-approval: + status: approved + requested_at: '2026-05-14T21:09:15.512Z' + approved_at: '2026-05-17T19:02:30.947Z' + plan-approval: + status: approved + requested_at: '2026-05-17T19:15:07.461Z' + approved_at: '2026-05-17T20:34:01.445Z' + pr: + status: approved + requested_at: '2026-05-17T21:18:48.450Z' + approved_at: '2026-05-18T01:19:45.563Z' + verify-approval: + status: pending +iteration: 1 +build_complete: true +history: + - iteration: 1 + plan_phase: phase_1 + build_output: '' + reviews: + - model: gemini + verdict: APPROVE + file: >- + /Users/mwk/Development/cluesmith/codev/.builders/spir-746/codev/projects/746-spir-architect-s-baked-archite/746-phase_1-iter1-gemini.txt + - model: codex + verdict: REQUEST_CHANGES + file: >- + /Users/mwk/Development/cluesmith/codev/.builders/spir-746/codev/projects/746-spir-architect-s-baked-archite/746-phase_1-iter1-codex.txt + - model: claude + verdict: APPROVE + file: >- + /Users/mwk/Development/cluesmith/codev/.builders/spir-746/codev/projects/746-spir-architect-s-baked-archite/746-phase_1-iter1-claude.txt + - iteration: 1 + plan_phase: phase_3 + 
build_output: '' + reviews: + - model: gemini + verdict: APPROVE + file: >- + /Users/mwk/Development/cluesmith/codev/.builders/spir-746/codev/projects/746-spir-architect-s-baked-archite/746-phase_3-iter1-gemini.txt + - model: codex + verdict: REQUEST_CHANGES + file: >- + /Users/mwk/Development/cluesmith/codev/.builders/spir-746/codev/projects/746-spir-architect-s-baked-archite/746-phase_3-iter1-codex.txt + - model: claude + verdict: APPROVE + file: >- + /Users/mwk/Development/cluesmith/codev/.builders/spir-746/codev/projects/746-spir-architect-s-baked-archite/746-phase_3-iter1-claude.txt + - iteration: 1 + plan_phase: phase_4 + build_output: '' + reviews: + - model: gemini + verdict: APPROVE + file: >- + /Users/mwk/Development/cluesmith/codev/.builders/spir-746/codev/projects/746-spir-architect-s-baked-archite/746-phase_4-iter1-gemini.txt + - model: codex + verdict: REQUEST_CHANGES + file: >- + /Users/mwk/Development/cluesmith/codev/.builders/spir-746/codev/projects/746-spir-architect-s-baked-archite/746-phase_4-iter1-codex.txt + - model: claude + verdict: APPROVE + file: >- + /Users/mwk/Development/cluesmith/codev/.builders/spir-746/codev/projects/746-spir-architect-s-baked-archite/746-phase_4-iter1-claude.txt +started_at: '2026-05-14T20:59:40.415Z' +updated_at: '2026-05-18T01:19:45.563Z' diff --git a/codev/protocols/air/builder-prompt.md b/codev/protocols/air/builder-prompt.md index 8a44aede1..963d5e5fe 100644 --- a/codev/protocols/air/builder-prompt.md +++ b/codev/protocols/air/builder-prompt.md @@ -26,6 +26,12 @@ You are running in STRICT mode. This means: ## Protocol Follow the AIR protocol: `codev/protocols/air/protocol.md` +## Baked Decisions + +If the issue body contains a section named "Baked Decisions" (any heading level, case-insensitive), treat its contents as fixed architectural decisions baked in by the architect. Do not autonomously override them in your spec, plan, or implementation. 
If you discover a serious reason to question a baked decision, surface that concern to the architect via `afx send` rather than relitigating it inside the spec/plan/review. + +If the architect's baked-decisions section contains internal contradictions (e.g., two different language choices), do not pick one — pause, flag the contradiction to the architect via `afx send`, and wait for resolution before proceeding. + {{#if issue}} ## Issue #{{issue.number}} **Title**: {{issue.title}} diff --git a/codev/protocols/air/consult-types/impl-review.md b/codev/protocols/air/consult-types/impl-review.md index aacdefbdb..b382faedc 100644 --- a/codev/protocols/air/consult-types/impl-review.md +++ b/codev/protocols/air/consult-types/impl-review.md @@ -10,6 +10,12 @@ Before requesting changes for missing configuration, incorrect patterns, or fram 2. **Read the actual config files** (or confirm their deliberate absence) before flagging missing configs 3. **Do not assume** your training data reflects the version in use — verify against project files +## Baked Decisions + +If the issue body includes content under a "Baked Decisions" heading, the architect has marked those choices as fixed. Do not autonomously challenge them: do not propose alternative languages, frameworks, deployment shapes, or dependencies that contradict a baked decision. You may `COMMENT` with concerns about a baked decision (the architect decides whether to rescind it); reserve `REQUEST_CHANGES` for the case where the implementation **fails to honor** a stated baked decision — that is a real defect. + +If the baked decisions themselves contain contradictions, do not pick one — `REQUEST_CHANGES` and ask the architect to clarify before proceeding. + ## Focus Areas 1. 
**Issue Adherence** diff --git a/codev/protocols/air/consult-types/pr-review.md b/codev/protocols/air/consult-types/pr-review.md index 915903934..8c0f6c552 100644 --- a/codev/protocols/air/consult-types/pr-review.md +++ b/codev/protocols/air/consult-types/pr-review.md @@ -3,6 +3,12 @@ ## Context You are performing a review of a pull request created under the AIR protocol. The builder implemented a small feature directly from a GitHub issue — there are no spec, plan, or review files. The review is embedded in the PR body. +## Baked Decisions + +If the issue body includes content under a "Baked Decisions" heading, the architect has marked those choices as fixed. Do not autonomously challenge them: do not propose alternative languages, frameworks, deployment shapes, or dependencies that contradict a baked decision. You may `COMMENT` with concerns about a baked decision (the architect decides whether to rescind it); reserve `REQUEST_CHANGES` for the case where the code **fails to honor** a stated baked decision — that is a real defect. + +If the baked decisions themselves contain contradictions, do not pick one — `REQUEST_CHANGES` and ask the architect to clarify before proceeding. + ## Focus Areas 1. **Completeness** diff --git a/codev/protocols/air/prompts/implement.md b/codev/protocols/air/prompts/implement.md index 52767f21a..301641962 100644 --- a/codev/protocols/air/prompts/implement.md +++ b/codev/protocols/air/prompts/implement.md @@ -6,6 +6,12 @@ You are executing the **IMPLEMENT** phase of the AIR protocol. Read the GitHub issue, implement the feature, and add tests. Keep it focused and under 300 LOC. +## Baked Decisions + +Check the issue body for a section named "Baked Decisions" (any heading level, case-insensitive). If present, treat each listed decision as fixed during implementation. Do not autonomously substitute alternate languages, frameworks, or dependencies. 
If you discover a serious problem with a baked decision, raise it via `afx send architect` rather than working around it. + +If two baked decisions contradict each other, do not pick one — pause, flag the contradiction via `afx send`, and wait for resolution before implementing. + ## Context - **Issue**: #{{issue.number}} — {{issue.title}} diff --git a/codev/protocols/air/protocol.md b/codev/protocols/air/protocol.md index ead4b5313..8552d3c3b 100644 --- a/codev/protocols/air/protocol.md +++ b/codev/protocols/air/protocol.md @@ -38,6 +38,10 @@ AIR is a minimal protocol for implementing small features (< 300 LOC) where the - Architectural changes → use **SPIR** - Complex features with multiple phases → use **SPIR** or **ASPIR** +## Baked Decisions (Optional) + +When filing an issue for AIR, you can pin architectural decisions you don't want the builder or CMAP reviewers to re-litigate. Include a `## Baked Decisions` section (any heading level is fine) anywhere in the issue body. Useful categories: language, framework, deployment shape, key dependencies, decisions deferred to a later spec. The builder will treat each listed item as fixed during implementation; CMAP reviewers will not propose alternatives unless the implementation itself fails to honor a stated decision. Leave the section out for issues where you want the builder to explore freely — absence is the no-op default. You can amend or rescind a baked decision at any time by updating the issue and respawning, or by sending the builder a direct instruction via `afx send`. + ## Protocol Phases ### I - Implement diff --git a/codev/protocols/aspir/builder-prompt.md b/codev/protocols/aspir/builder-prompt.md index 2715ed877..9a1142471 100644 --- a/codev/protocols/aspir/builder-prompt.md +++ b/codev/protocols/aspir/builder-prompt.md @@ -30,6 +30,12 @@ You are running in STRICT mode. This means: Follow the ASPIR protocol: `codev/protocols/aspir/protocol.md` Read and internalize the protocol before starting any work. 
+## Baked Decisions + +If the issue body contains a section named "Baked Decisions" (any heading level, case-insensitive), treat its contents as fixed architectural decisions baked in by the architect. Do not autonomously override them in your spec, plan, or implementation. If you discover a serious reason to question a baked decision, surface that concern to the architect via `afx send` rather than relitigating it inside the spec/plan/review. + +If the architect's baked-decisions section contains internal contradictions (e.g., two different language choices), do not pick one — pause, flag the contradiction to the architect via `afx send`, and wait for resolution before proceeding. + {{#if spec}} ## Spec Read the specification at: `{{spec.path}}` diff --git a/codev/protocols/aspir/consult-types/plan-review.md b/codev/protocols/aspir/consult-types/plan-review.md index 585085dec..485ff3183 100644 --- a/codev/protocols/aspir/consult-types/plan-review.md +++ b/codev/protocols/aspir/consult-types/plan-review.md @@ -3,6 +3,12 @@ ## Context You are reviewing an implementation plan during the Plan phase. The spec has been approved - now you must evaluate whether the plan adequately describes HOW to implement it. +## Baked Decisions + +If the issue body or the approved spec's Constraints section includes content under a "Baked Decisions" heading, the architect has marked those choices as fixed (this extends the existing "don't re-litigate spec decisions" rule with explicit baked-decision language). Do not autonomously challenge them: do not propose alternative languages, frameworks, deployment shapes, or dependencies that contradict a baked decision. You may `COMMENT` with concerns; reserve `REQUEST_CHANGES` for the case where the plan **fails to honor** a stated baked decision — that is a real defect. + +If the baked decisions themselves contain contradictions, do not pick one — `REQUEST_CHANGES` and ask the architect to clarify before proceeding. + ## Focus Areas 1. 
**Spec Coverage** diff --git a/codev/protocols/aspir/consult-types/spec-review.md b/codev/protocols/aspir/consult-types/spec-review.md index 7c9c1579b..b537e7cc6 100644 --- a/codev/protocols/aspir/consult-types/spec-review.md +++ b/codev/protocols/aspir/consult-types/spec-review.md @@ -3,6 +3,12 @@ ## Context You are reviewing a feature specification during the Specify phase. Your role is to ensure the spec is complete, correct, and feasible before it moves to human approval. +## Baked Decisions + +If the issue body or the spec's Constraints section includes content under a "Baked Decisions" heading, the architect has marked those choices as fixed. Do not autonomously challenge them: do not propose alternative languages, frameworks, deployment shapes, or dependencies that contradict a baked decision. You may `COMMENT` with concerns about a baked decision (the architect decides whether to rescind it); reserve `REQUEST_CHANGES` for the case where the spec **fails to honor** a stated baked decision — that is a real defect. + +If the baked decisions themselves contain contradictions (e.g., two different language choices), do not pick one — `REQUEST_CHANGES` and ask the architect to clarify before proceeding. + ## Focus Areas 1. **Completeness** diff --git a/codev/protocols/aspir/prompts/specify.md b/codev/protocols/aspir/prompts/specify.md index 6da1868f8..3716f7969 100644 --- a/codev/protocols/aspir/prompts/specify.md +++ b/codev/protocols/aspir/prompts/specify.md @@ -31,6 +31,12 @@ ls codev/specs/{{project_id}}-*.md **If no spec exists:** Proceed to Step 1 below. +### 0.5 Baked Decisions + +Before exploring solution approaches, check the issue body for a section named "Baked Decisions" (any heading level, case-insensitive). If present, copy its content verbatim into the spec's Constraints section and treat each item as fixed. Do not autonomously relitigate the architect's choices in your Solution Exploration. 
If you discover a serious problem with a baked decision, raise it via `afx send architect` rather than overriding it in the spec. + +If two baked decisions contradict each other (e.g., two different language choices), do not pick one — pause, flag the contradiction via `afx send`, and wait for resolution before drafting. + ### 1. Clarifying Questions (ONLY IF NO SPEC EXISTS) Before writing anything, ask clarifying questions to understand: diff --git a/codev/protocols/aspir/protocol.md b/codev/protocols/aspir/protocol.md index b79391f32..18a423a30 100644 --- a/codev/protocols/aspir/protocol.md +++ b/codev/protocols/aspir/protocol.md @@ -41,6 +41,10 @@ Use SPIR instead when: - The work is **high-risk** — security-sensitive, user-facing, or broadly impactful - You want to **review and adjust** the plan before implementation starts +## Baked Decisions (Optional) + +When filing an issue for ASPIR, you can pin architectural decisions you don't want the builder or CMAP reviewers to re-litigate. Include a `## Baked Decisions` section (any heading level is fine) anywhere in the issue body. Useful categories: language, framework, deployment shape, key dependencies, decisions deferred to a later spec. The builder will copy the section verbatim into the spec's Constraints and treat each item as fixed; CMAP reviewers will not propose alternatives unless the spec itself fails to honor a stated decision. Leave the section out for issues where you want the builder to explore freely — absence is the no-op default. You can amend or rescind a baked decision at any time by updating the issue and respawning, or by sending the builder a direct instruction via `afx send`. + ## Protocol Phases ASPIR follows the same four phases as SPIR. For full phase documentation, see the [SPIR protocol](../spir/protocol.md). 
diff --git a/codev/protocols/spir/builder-prompt.md b/codev/protocols/spir/builder-prompt.md index c905b4696..41e2b306f 100644 --- a/codev/protocols/spir/builder-prompt.md +++ b/codev/protocols/spir/builder-prompt.md @@ -30,6 +30,12 @@ You are running in STRICT mode. This means: Follow the SPIR protocol: `codev/protocols/spir/protocol.md` Read and internalize the protocol before starting any work. +## Baked Decisions + +If the issue body contains a section named "Baked Decisions" (any heading level, case-insensitive), treat its contents as fixed architectural decisions baked in by the architect. Do not autonomously override them in your spec, plan, or implementation. If you discover a serious reason to question a baked decision, surface that concern to the architect via `afx send` rather than relitigating it inside the spec/plan/review. + +If the architect's baked-decisions section contains internal contradictions (e.g., two different language choices), do not pick one — pause, flag the contradiction to the architect via `afx send`, and wait for resolution before proceeding. + {{#if spec}} ## Spec Read the specification at: `{{spec.path}}` diff --git a/codev/protocols/spir/consult-types/plan-review.md b/codev/protocols/spir/consult-types/plan-review.md index 585085dec..485ff3183 100644 --- a/codev/protocols/spir/consult-types/plan-review.md +++ b/codev/protocols/spir/consult-types/plan-review.md @@ -3,6 +3,12 @@ ## Context You are reviewing an implementation plan during the Plan phase. The spec has been approved - now you must evaluate whether the plan adequately describes HOW to implement it. +## Baked Decisions + +If the issue body or the approved spec's Constraints section includes content under a "Baked Decisions" heading, the architect has marked those choices as fixed (this extends the existing "don't re-litigate spec decisions" rule with explicit baked-decision language). 
Do not autonomously challenge them: do not propose alternative languages, frameworks, deployment shapes, or dependencies that contradict a baked decision. You may `COMMENT` with concerns; reserve `REQUEST_CHANGES` for the case where the plan **fails to honor** a stated baked decision — that is a real defect. + +If the baked decisions themselves contain contradictions, do not pick one — `REQUEST_CHANGES` and ask the architect to clarify before proceeding. + ## Focus Areas 1. **Spec Coverage** diff --git a/codev/protocols/spir/consult-types/spec-review.md b/codev/protocols/spir/consult-types/spec-review.md index 7c9c1579b..b537e7cc6 100644 --- a/codev/protocols/spir/consult-types/spec-review.md +++ b/codev/protocols/spir/consult-types/spec-review.md @@ -3,6 +3,12 @@ ## Context You are reviewing a feature specification during the Specify phase. Your role is to ensure the spec is complete, correct, and feasible before it moves to human approval. +## Baked Decisions + +If the issue body or the spec's Constraints section includes content under a "Baked Decisions" heading, the architect has marked those choices as fixed. Do not autonomously challenge them: do not propose alternative languages, frameworks, deployment shapes, or dependencies that contradict a baked decision. You may `COMMENT` with concerns about a baked decision (the architect decides whether to rescind it); reserve `REQUEST_CHANGES` for the case where the spec **fails to honor** a stated baked decision — that is a real defect. + +If the baked decisions themselves contain contradictions (e.g., two different language choices), do not pick one — `REQUEST_CHANGES` and ask the architect to clarify before proceeding. + ## Focus Areas 1. 
**Completeness** diff --git a/codev/protocols/spir/prompts/specify.md b/codev/protocols/spir/prompts/specify.md index 6da1868f8..3716f7969 100644 --- a/codev/protocols/spir/prompts/specify.md +++ b/codev/protocols/spir/prompts/specify.md @@ -31,6 +31,12 @@ ls codev/specs/{{project_id}}-*.md **If no spec exists:** Proceed to Step 1 below. +### 0.5 Baked Decisions + +Before exploring solution approaches, check the issue body for a section named "Baked Decisions" (any heading level, case-insensitive). If present, copy its content verbatim into the spec's Constraints section and treat each item as fixed. Do not autonomously relitigate the architect's choices in your Solution Exploration. If you discover a serious problem with a baked decision, raise it via `afx send architect` rather than overriding it in the spec. + +If two baked decisions contradict each other (e.g., two different language choices), do not pick one — pause, flag the contradiction via `afx send`, and wait for resolution before drafting. + ### 1. Clarifying Questions (ONLY IF NO SPEC EXISTS) Before writing anything, ask clarifying questions to understand: diff --git a/codev/protocols/spir/protocol.md b/codev/protocols/spir/protocol.md index 57e5c2e6d..3a4f8c11e 100644 --- a/codev/protocols/spir/protocol.md +++ b/codev/protocols/spir/protocol.md @@ -121,6 +121,10 @@ SPIR is a structured development protocol that emphasizes specification-driven d - Dependency updates - Emergency hotfixes (but do a lightweight retrospective after) +## Baked Decisions (Optional) + +When filing an issue for SPIR, you can pin architectural decisions you don't want the builder or CMAP reviewers to re-litigate. Include a `## Baked Decisions` section (any heading level is fine) anywhere in the issue body. Useful categories: language, framework, deployment shape, key dependencies, decisions deferred to a later spec. 
The builder will copy the section verbatim into the spec's Constraints and treat each item as fixed; CMAP reviewers will not propose alternatives unless the spec itself fails to honor a stated decision. Leave the section out for issues where you want the builder to explore freely — absence is the no-op default. You can amend or rescind a baked decision at any time by updating the issue and respawning, or by sending the builder a direct instruction via `afx send`. + ## Protocol Phases ### S - Specify (Collaborative Design Exploration) diff --git a/codev/resources/lessons-learned.md b/codev/resources/lessons-learned.md index 4eb7365a5..dbafd1ca8 100644 --- a/codev/resources/lessons-learned.md +++ b/codev/resources/lessons-learned.md @@ -197,6 +197,9 @@ Generalizable wisdom extracted from review documents, ordered by impact. Updated - [From 0121] Update spec and plan simultaneously rather than in separate passes. Multiple rounds of corrections were needed for stale references to old values (max_iterations=5, >50 bytes size checks) that had been changed in one document but not the other. - [From 0122] When a spec is filed for functionality that already exists (built incrementally across prior specs/bugfixes), the plan should note this upfront. Discovery during implementation wastes time -- the plan should focus on validation and enhancement. - [From 0124] Plan estimates based on file-level scanning can misattribute tests. Phase 3's tunnel estimate was wrong because the plan listed tests by file without verifying which describe blocks were in which file. Include a preliminary audit step where files are actually read before estimating removals. +- [From 0746] Default to LLM-driven recognition over regex parsing for features whose discipline is enforced by prompts. Only add a parser when the LLM provably cannot do the job (e.g., security parsing, structured machine contracts). Adding code surface for what is fundamentally a prompt-discipline question is over-engineering. 
(The original plan for 0746 had a regex parser + new `TemplateContext` field; architect feedback at the plan-approval gate dropped both, reducing 5 phases to 4 and 15 baselines to 12 with zero loss of capability.) +- [From 0746] Plan acceptance criteria involving mirrored trees (codev/ ↔ codev-skeleton/) should scope to "the new content is byte-identical between codev/ and skeleton" rather than "codev/ and skeleton diff clean". Pre-existing structural divergence between the two trees is the norm; whole-file parity criteria force phases to either conflate unrelated cleanup or generate repeated rebuttal traffic when reviewers (correctly, per the literal language) flag it as failing. +- [From 0746] Convert "manual smoke" deliverables to programmatic tests at design time. If `renderTemplate` (or equivalent) is accessible from the test runner, running the smoke automatically is strictly stronger than a one-time manual check and adds zero ongoing maintenance cost. Codex caught the missing smoke evidence; the fix should have shipped in Phase 1, not in a Phase 4 rebuttal. - [From 0124] Set removal/consolidation targets as ranges rather than point estimates. The spec targeted ~285 tests but achieved 127 -- the aspirational target was unrealistic after applying the "when in doubt, keep the test" guardrail. - [From 0126] Six-phase bottom-up plan (GitHub layer -> spawn CLI -> scaffold -> tower endpoint -> work view -> cleanup) with clear dependency ordering meant each phase built cleanly on the previous. Test-first approach for spawn CLI caught edge cases in zero-padded ID handling. - [From 0126] Update skeleton docs when changing user-facing behavior. Initially rebutted as out-of-scope, but shipping skeleton docs with stale references confuses users of new projects. 
diff --git a/codev/reviews/746-spir-architect-s-baked-archite.md b/codev/reviews/746-spir-architect-s-baked-archite.md new file mode 100644 index 000000000..a9fe7de2d --- /dev/null +++ b/codev/reviews/746-spir-architect-s-baked-archite.md @@ -0,0 +1,222 @@ +# Review: Baked Architectural Decisions in SPIR Issue Body (#746) + +## Summary + +Adds a structured channel for architects to pin architectural decisions in SPIR / ASPIR / AIR issue bodies, so builders and CMAP reviewers honor those decisions instead of re-litigating them. Pure prompt-and-documentation change — zero new code surface, zero runtime impact. + +Architects who file an issue with a `## Baked Decisions` section see those decisions: +- Surfaced explicitly to the builder by the builder-prompt +- Honored during spec drafting (SPIR/ASPIR) or implementation (AIR) via the drafting prompts +- Protected from relitigation by all six CMAP reviewer prompts +- Discoverable via a new sub-section in each protocol.md + +Total scope: 30 files edited (3 codev + 3 skeleton × 5 file types) + 1 new test file with 193 tests + 12 baseline snapshots. Net diff: +1,500 / -50 lines, almost entirely markdown. 
+ +## Spec Compliance + +Every Success Criterion from the spec (`codev/specs/746-spir-architect-s-baked-archite.md`) is satisfied: + +- [x] **SPIR builder-prompt** surfaces baked decisions — `## Baked Decisions` paragraph added between `## Protocol` and `{{#if spec}}` +- [x] **ASPIR builder-prompt** behaves identically — same paragraph, same placement +- [x] **AIR builder-prompt** surfaces baked decisions — same paragraph, placed between `## Protocol` and `{{#if issue}}` +- [x] **SPIR `prompts/specify.md`** instructs the builder — new `### 0.5 Baked Decisions` clause directs the builder to copy the section verbatim into Constraints +- [x] **ASPIR `prompts/specify.md`** has the same clause +- [x] **AIR `prompts/implement.md`** has the analogous clause — adapted for the no-spec workflow +- [x] **SPIR/ASPIR `consult-types/spec-review.md`** contain anti-relitigation instruction with COMMENT-vs-REQUEST_CHANGES distinction +- [x] **SPIR/ASPIR `consult-types/plan-review.md`** extend the existing "don't re-litigate" line with explicit baked-decision language +- [x] **AIR `consult-types/impl-review.md`** and **`pr-review.md`** have analogous instructions +- [x] **Documentation** — each `protocol.md` (SPIR/ASPIR/AIR) has a discoverability paragraph with category hints (language / framework / dependencies) and the amend/rescind escape hatch +- [x] **Skeleton mirror** — every edit in `codev/protocols/` is mirrored to `codev-skeleton/protocols/`; mirror parity asserted in tests +- [x] **Snapshot/no-regression** — replaced with pure-addition diff against pre-change baselines for all 12 prompt files (per architect feedback in iter-3 — see Deviations) +- [x] **No regression** — every touched static file's diff vs. 
its baseline is pure-addition (zero removed lines, zero modified lines) + +## Deviations from Plan + +### Iter-3 plan revision: dropped the parser entirely + +The original plan (iter-2 of plan-approval) included: +- An `extractBakedDecisions(issueBody)` parser in `spawn-roles.ts` +- A new `TemplateContext.baked_decisions?: string` field +- A `{{#if baked_decisions}}` block in each builder-prompt template +- Parser unit tests + snapshot tests for template rendering + +The architect rejected this approach at the plan-approval gate (2026-05-17 ~20:34 PDT), reasoning: +1. Builder-prompts and reviewer-prompts (which are static markdown) were getting instruction-only treatment regardless. Splitting them into two paradigms (templated vs. instruction-driven) added asymmetry without benefit. +2. LLM-driven recognition is more robust to variant section names (`## Constraints (fixed)`, `## Architectural Givens`) than a regex parser would be. +3. Prompt-driven discipline is Codev's core ethos — adding ~80 LOC of parser + edge-case handling was over-engineering. + +The plan was rewritten to a pure prompt-and-documentation change. Net effect: +- Phases reduced from 5 to 4 (parser phase + e2e fixture phase removed) +- Baselines reduced from 15 to 12 (no template-rendering snapshots needed) +- Zero new code surface +- Test infrastructure simplified to grep + pure-addition diff + +This deviation produced a strictly simpler, more maintainable result. + +### Other deviations + +None. All four phases landed as specified in the iter-3 plan. + +## Lessons Learned + +### What Went Well + +- **Architect feedback at the plan-approval gate prevented over-engineering.** Without the iter-3 rewrite, this PR would have shipped ~80 LOC of regex parser + new test infrastructure that didn't earn its keep. The rebuttal/iteration cycle paid off. 
+- **CMAP caught real test gaps.** Codex Phase 3 iter-1 spotted that the COMMENT/REQUEST_CHANGES check was too loosely scoped — it would pass even if the new paragraph lost those tokens because they already exist in the pre-existing Verdict Format section. The fix (extract the section first, then grep) is a generalizable pattern. +- **Programmatic smoke beats manual smoke.** Codex Phase 4 iter-1 caught that the "manual smoke" deliverable wasn't evidenced; converting it to a programmatic test (running `renderTemplate` against fixture issues in vitest) is strictly stronger and runs every test invocation. +- **Pure-addition diff against pre-change baselines is a lightweight no-regression mechanism.** No diff library needed — a 25-line line-walking helper does the job. Catches "someone accidentally deleted existing content while adding the new paragraph." + +### Challenges Encountered + +- **"Diff clean" as a parity criterion conflicts with pre-existing file divergence.** Codev/ and skeleton trees have intentional structural differences (Multi-PR / Verify sections present in skeleton, absent in codev/) that pre-date this work. Codex Phase 1 and Phase 4 both flagged file-level `diff -r` as failing — but reconciling that divergence would conflate two unrelated changes. Resolution: scope mirror-parity tests to the **section this phase changes** (extract `## Baked Decisions`, compare those byte-for-byte) rather than asserting whole-file parity. Documented in two rebuttals; both accepted. +- **Plan acceptance criteria need to be precise about scope.** The Phase 1, 3, and 4 plans all had "diff clean" or similar language that read as broader than intended. Future plans should write "[the new section] is byte-identical across codev/ and skeleton" rather than "codev/ and skeleton diff clean". + +### What Would Be Done Differently + +- Write the plan with the parser-vs-instruction tradeoff already considered. 
The iter-1 plan jumped to a parser because the spec used "baked decisions" language that felt structured; in retrospect, the LLM-driven approach was the obvious choice given Codev's prompt-driven posture. +- Specify mirror-parity scope explicitly in the plan ("section-level parity, not file-level diff clean") to avoid the recurring Codex flag. +- Consider committing programmatic smoke tests from Phase 1 rather than relying on a "manual smoke" deliverable in a later phase. The smoke value is realized once and then runs forever for free. + +### Methodology Improvements + +- **For prompt-driven features**: default to LLM-driven recognition. Only introduce a parser if the LLM provably cannot do the job (e.g., security-sensitive parsing, structured machine-readable contract). The architect's iter-3 feedback codifies this as a general principle. +- **For plan acceptance criteria involving mirrored trees**: scope to "the new content this phase introduces is byte-identical between codev/ and skeleton" — never use "diff clean" without qualification, since pre-existing divergence is the norm. +- **For "manual smoke" deliverables**: convert to programmatic tests up-front. There is rarely a good reason to keep something as manual when `renderTemplate` or equivalent is accessible from the test runner. + +## Architecture Updates + +No architecture updates needed. This is a prompt-and-documentation change with no new subsystems, no new data flows, no new modules, and no new files in `codev/resources/arch.md`'s domain. The existing `{{issue.body}}` template variable continues to carry the issue body verbatim — no change to the spawn pipeline. + +## Lessons Learned Updates + +Added three new lessons to `codev/resources/lessons-learned.md` under the appropriate sections: + +1. **Process / Plan Authoring**: *"Default to LLM-driven recognition over regex parsing for features whose discipline is enforced by prompts. 
Only add a parser when the LLM provably cannot do the job (e.g., security parsing, structured machine contracts). Adding code surface for what is fundamentally a prompt-discipline question is over-engineering."* + +2. **Process / Plan Authoring**: *"Plan acceptance criteria involving mirrored trees (codev/ ↔ codev-skeleton/) should scope to 'the new content is byte-identical' rather than 'diff clean'. Pre-existing structural divergence between the two trees is the norm; whole-file parity criteria force phases to either conflate unrelated cleanup or generate rebuttal traffic."* + +3. **Process / Testing**: *"Convert 'manual smoke' deliverables to programmatic tests at design time. If `renderTemplate` (or equivalent) is accessible from the test runner, running the smoke automatically is strictly stronger than a one-time manual check and adds zero ongoing maintenance cost."* + +## Consultation Feedback + +### Specify Phase (Round 1) — 2026-05-14 + +#### Gemini — REQUEST_CHANGES +- **Concern**: Unresolved scope (SPIR only vs all three protocols), unresolved template location. + - **Addressed**: Resolved Decisions #1 and #2 added — applies to SPIR + AIR + ASPIR; templates land in both Codev and skeleton. +- **Concern**: Heading-level robustness (`##` vs `###`) not addressed. + - **Addressed**: Resolved Decision #3 — matching is heading-level-agnostic + case-insensitive; test scenarios + risk row added. +- **Concern**: Plan-review consistency — existing "don't re-litigate" line too generic. + - **Addressed**: Resolved Decision #9 — explicit anti-relitigation language added to plan-review.md (and AIR equivalents) in Success Criteria. + +#### Codex — REQUEST_CHANGES +- **Concern**: Scope still unresolved in spec text. + - **Addressed**: Same as Gemini above — explicit decision #1. +- **Concern**: Acceptance criteria non-deterministic ("CMAP feedback does not relitigate it"). 
+ - **Addressed**: Rewrote every Success Criterion as a concrete grep / snapshot / file-existence check with explicit "Pass:" signals. +- **Concern**: Minimum contract for issue-body format undefined. + - **Addressed**: Resolved Decisions #3-#5 — heading text "Baked Decisions" (case-insensitive, any level); empty / missing / placeholder-only = no-op. +- **Concern**: Edge cases incomplete (intra-section contradictions, conflict with prose). + - **Addressed**: Resolved Decisions #6, #7, #8 — explicit rules added; Test Scenarios 6, 7 added. + +#### Claude — COMMENT +- **Concern**: `prompts/specify.md` (SPIR + ASPIR) missing from Dependencies — these drive spec drafting, distinct from builder-prompt. + - **Addressed**: Added to Dependencies; explicit Success Criteria for each. +- **Concern**: First "Critical" open question already answered by spec content. + - **Addressed**: Resolved in Resolved Decisions; removed from open questions. +- **Concern**: codev-skeleton/ as explicit success criterion. + - **Addressed**: Added. +- **Concern**: Clarify AIR has no `spec-review.md`. + - **Addressed**: Resolved Decision #10 — AIR touchpoints enumerated. + +### Specify Phase (Architect Feedback at spec-approval gate) — 2026-05-17 + +- **Feedback**: Drop the `.github/ISSUE_TEMPLATE/` scope entirely. Codev is CLI-driven; templates only fire for GitHub UI filing. + - **Addressed**: Resolved Decision #2 rewritten; documentation paragraph in each protocol.md becomes the discoverability surface (Decision #11). +- **Feedback**: Tighten the end-to-end transcript criterion to a concrete snapshot diff. + - **Addressed**: Reworded the criterion to "snapshot diff (with vs without Baked Decisions section) consists exclusively of the new block." +- **Feedback**: Add architect-override carveout framing to all prompt language. + - **Addressed**: Resolved Decision #12 added; every prompt clause uses "do not autonomously …" framing. 
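The matching rule settled in the Specify phase (a heading named "Baked Decisions", any heading level, case-insensitive, with empty or placeholder-only content treated as absent) can be illustrated with a small test-side helper. This is only a sketch under assumptions: the function name and signature are hypothetical, it is not the helper shipped in this PR, and it ignores edge cases such as `#` lines inside fenced code blocks.

```typescript
// Hypothetical helper for illustration; not taken from the PR.
// Returns the body of the first section whose heading text is "Baked Decisions"
// (any heading level, case-insensitive), or null when the section is missing,
// empty, or contains only HTML comment placeholders.
function extractBakedSection(markdown: string): string | null {
  const heading = /^(#{1,6})\s+(.*?)\s*#*\s*$/; // ATX heading, optional closing #s
  let depth = 0; // 0 means "section not found yet"
  const body: string[] = [];
  for (const line of markdown.split(/\r?\n/)) {
    const m = heading.exec(line);
    if (depth === 0) {
      if (m && m[2].trim().toLowerCase() === "baked decisions") {
        depth = m[1].length;
      }
      continue;
    }
    // The section ends at the next heading of equal or lesser depth.
    if (m && m[1].length <= depth) break;
    body.push(line);
  }
  // Strip HTML comments so a placeholder-only section counts as empty.
  const text = body.join("\n").replace(/<!--[\s\S]*?-->/g, "").trim();
  return depth > 0 && text.length > 0 ? text : null;
}
```

A helper of this shape belongs in tests, where it scopes assertions to the section under test. Recognition at spawn time remains LLM-driven per the plan-approval feedback.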
+ +### Plan Phase (Round 1) — 2026-05-17 + +#### Gemini — APPROVE +- No concerns raised. Noted minor observation about `spawn.ts` being the actual context-construction site (also flagged by Codex and Claude). + +#### Codex — REQUEST_CHANGES +- **Concern**: Phase 1 understated the code surface (`spawn.ts` missing). + - **Addressed**: Phase 1 "Files touched" expanded to include `spawn.ts` as a read-only verification touchpoint with fallback edit clause. (Moot after iter-3 architect rewrite dropped the parser entirely.) +- **Concern**: Phase 5 missing no-regression coverage for consult-type prompts. + - **Addressed**: Phase 5 deliverables expanded to cover 12 static files with pure-addition diff. (Folded into Phase 4 after iter-3 rewrite.) +- **Concern**: Contradiction handling (spec Decision #7) under-specified in plan. + - **Addressed**: Explicit contradiction clause text added to Phase 2 and Phase 3, plus grep tests for `contradict` + `pause` + `flag`. + +#### Claude — APPROVE +- Same `spawn.ts` observation as Gemini and Codex (non-blocking). +- Noted that `computeDiff()` in illustrative code isn't a standard vitest utility — addressed inline by noting the `diff` npm package or hand-rolled options. + +### Plan Phase (Architect Feedback at plan-approval gate) — 2026-05-17 + +- **Feedback**: Drop the parser entirely. Use prompt instructions. + - **Addressed**: Entire plan rewritten (iter-3). Parser, `TemplateContext` field, and `{{#if baked_decisions}}` block removed. 5 phases → 4 phases. 15 baselines → 12 baselines. See "Deviations from Plan" above. + +### Implement Phase 1 (Builder-prompts) — 2026-05-17 + +#### Gemini — APPROVE +- No concerns. + +#### Codex — REQUEST_CHANGES +- **Concern**: Codev/ and skeleton builder-prompts still differ beyond the new paragraph. + - **Rebutted**: Pre-existing divergence (skeleton has Multi-PR / Verify sections that codev/ doesn't) is out of this work's scope. Codex/Gemini reading of "diff clean" is over-strict. 
+- **Concern**: Tests only check pure-addition against codev/ baselines; never assert codev/ ↔ codev-skeleton/ parity. + - **Addressed**: Added a focused parity test that extracts the `## Baked Decisions` section from each builder-prompt and asserts codev/ + skeleton sections are byte-identical. Doesn't fail on pre-existing divergence; does catch drift in the paragraph this work owns. + +#### Claude — APPROVE +- Independently validated the pre-existing-divergence interpretation. Noted the pollution check could be extended beyond SPIR — non-blocking; pure-addition diff guards ASPIR/AIR implicitly. + +### Implement Phase 2 (Drafting prompts) — 2026-05-17 + +#### Gemini — APPROVE +- No concerns. + +#### Codex — APPROVE +- No concerns. + +#### Claude — APPROVE +- Noted the `### 0.5 Baked Decisions` numbering choice as a creative way to fit the existing flow. Non-blocking. + +### Implement Phase 3 (Reviewer prompts) — 2026-05-17 + +#### Gemini — APPROVE +- No concerns. + +#### Codex — REQUEST_CHANGES +- **Concern**: Phase 3 grep test for COMMENT/REQUEST_CHANGES is file-scoped, but those tokens already exist in the pre-existing `## Verdict Format` section. A regression that loses the distinction from the new paragraph would silently pass. + - **Addressed**: Lifted `extractBakedSection` helper to the top of the Phase 3 describe block; rewrote per-file blocks to assert against the extracted section, not the whole file. Hypothetical regression now fails loudly. + +#### Claude — APPROVE +- Noted the same test had 48 grep regression tests without flagging the scoping weakness — Codex caught the subtler case. Acknowledged in rebuttal. + +### Implement Phase 4 (Docs + final sweep) — 2026-05-17 + +#### Gemini — APPROVE +- No concerns. + +#### Codex — REQUEST_CHANGES +- **Concern**: Manual smoke not evidenced — only static regression checks, no artifact showing the spawn-and-render check was performed. 
+ - **Addressed**: Converted the manual smoke into a programmatic end-to-end test (`Spec 746 end-to-end smoke` describe block). For each of SPIR/ASPIR/AIR builder-prompts, renders against fixture issue bodies (with + without baked decisions) and asserts the rendered prompt contains both the instruction paragraph and the issue's baked-decisions content. Strictly stronger than one-time manual check. +- **Concern**: Skeleton-parity check is section-only, not broader codev/skeleton diff. + - **Rebutted**: Same grounds as Phase 1 — pre-existing divergence is out of scope; section-level parity is what Phase 4 owns and matches the actual obligation. + +#### Claude — APPROVE +- Noted one cosmetic leftover comment in the inventory list (stream-of-consciousness text). Cleaned up in the same commit as the smoke conversion. + +## Flaky Tests + +No flaky tests encountered or introduced. The full pre-existing test suite (2,632 tests across 130 files) plus this work's 193 new tests all pass deterministically. + +## Follow-up Items + +- **Deferred (Nice-to-Know in spec)**: Consider whether `afx spawn` should warn at spawn time if it detects a `## Baked Decisions` header in the issue body but the section is empty. Out of scope for this PR — pure prompt/docs change kept the surface tight. Could be a future bugfix or TICK if architects request it. +- **Deferred**: Section-name aliasing (e.g., "Architectural Givens" or "Constraints (fixed)") is intentionally out of scope. The spec requires the literal "Baked Decisions" name. If usage shows architects gravitating to other names, the prompts could be updated to recognize aliases. +- **Worth tracking**: How often do architects actually use the section? Future MAINTAIN run could audit recent issues to see uptake and inform a v2 (e.g., issue template prompt if discoverability proves insufficient). 
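The pure-addition check mentioned in this review (every baseline line must survive, in order, in the edited file) can be sketched as a short line-walking function. This is a minimal illustration under assumptions: `isPureAddition` is a hypothetical name, and the real helper in the test file may differ in detail.

```typescript
// Hypothetical sketch; not the helper shipped in the PR.
// True when every baseline line appears, in order, in the current text,
// meaning the change only inserted lines (none removed, none modified).
function isPureAddition(baseline: string, current: string): boolean {
  const before = baseline.split(/\r?\n/);
  const after = current.split(/\r?\n/);
  let i = 0; // index of the next baseline line still to be matched
  for (const line of after) {
    if (i < before.length && line === before[i]) i++;
  }
  return i === before.length;
}
```

Greedy matching is sufficient here because the question is only whether the baseline is an ordered subsequence of the current file. A modified line never matches its baseline counterpart, so the walk ends short and the check fails.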
diff --git a/codev/specs/746-spir-architect-s-baked-archite.md b/codev/specs/746-spir-architect-s-baked-archite.md new file mode 100644 index 000000000..f2c4ae4e9 --- /dev/null +++ b/codev/specs/746-spir-architect-s-baked-archite.md @@ -0,0 +1,375 @@ +# Specification: Baked Architectural Decisions in SPIR Issue Body + +--- +approved: 2026-05-17 +validated: [gemini, codex, claude] +--- + +## Metadata +- **ID**: spec-2026-05-14-baked-decisions +- **Status**: approved +- **Created**: 2026-05-14 +- **Last Updated**: 2026-05-17 (iter-4: spec text amended to match the iter-3 plan-approval direction toward unconditional rendering — see Amendments below) +- **GitHub Issue**: #746 + +## Clarifying Questions Asked + +Issue #746 is a well-scoped feature request from the Shannon architect, filed 2026-05-14 with a concrete failure case (Spec 1353 Persona harness) and two candidate solution shapes (Option A: optional issue-template section; Option B: pre-spec checklist). Because the issue already articulates problem, cost, and design options, no additional clarifying questions were posed to the user before drafting this spec. Questions surfaced during drafting are tracked in **Resolved Decisions** and **Open Questions** below. + +## Problem Statement + +When an architect files a SPIR (or AIR / ASPIR) issue and already has a **strong prior** on a major architectural decision — language, framework, deployment shape, protocol choice, dependency boundary — that prior is currently invisible to the builder and the CMAP reviewers (Codex / Gemini / Claude). The builder drafts the spec against an *assumed* default. CMAP reviews that spec on its merits. By the time the architect intervenes ("actually, use Python, not Node"), the spec has been through one or two consultation rounds against the wrong assumption, and the iter-2 reviewer feedback is obsolete the moment the assumption flips. 
+ +**Concrete failure**: Shannon Spec 1353 (Persona harness), 2026-05-14: +- iter-1: spec drafted assuming Node design (default) +- iter-2: drop daemon, per CMAP +- iter-3: architect intervenes — "use Python, match `shanutil`" (major reset) +- iter-4: CMAP polish + +Cost: ~45 min of churn rewriting the spec, plus Codex's iter-2 feedback became wrong the moment the language switched. + +The root cause is not bad CMAP feedback; it is a **missing input channel** for the architect's pre-spec convictions. The architect's strong priors are real data that the builder and reviewers need at iter-1, not at iter-3. + +## Current State + +Today, when an architect spawns a SPIR/AIR/ASPIR builder: +1. The builder receives the issue body verbatim in the builder-prompt template. +2. The builder reads the issue, drafts a spec, and runs CMAP. +3. CMAP reviews the spec on its technical merits — including questioning language, framework, and protocol choices the architect already considers settled. +4. If the architect was watching, they intervene mid-cycle to override the assumption, forcing a rewrite. +5. If the architect was not watching, the spec converges on the wrong shape and is rejected at the spec-approval gate, also forcing a rewrite. + +There is no structured slot in the issue body for **"these decisions are fixed, do not relitigate"**. Architects who want to communicate priors do so in prose — easy to miss, easy for reviewers to override in good faith, easy for builders to treat as one option among several. + +The `spec-review.md` and `plan-review.md` consult-type prompts (used by Codex / Gemini / Claude during CMAP) give reviewers a generic mandate to evaluate completeness, correctness, feasibility, and clarity. `plan-review.md` already says *"don't re-litigate spec decisions"*, which means baked decisions *will* be honored at plan-time *iff* they were faithfully written into the approved spec's Constraints section. 
The remaining gap is at spec-review (where there is no anti-relitigation instruction at all) and at the moment of initial spec drafting (where the specify prompt does not tell the builder to treat the section as fixed). + +## Desired State + +Architects have a **structured, optional channel** in the issue body to declare baked architectural decisions. When present: +1. The builder treats those decisions as **fixed inputs** to the spec — not options to explore. +2. CMAP reviewers (at spec-review **and** plan-review) are explicitly instructed to **not relitigate** the listed decisions; their job is to review the spec/plan *given* those constraints. +3. The spec's "Constraints" section incorporates the baked decisions verbatim, so they remain visible through the full spec/plan/implement lifecycle. +4. AIR builders (which skip the spec phase) honor the baked decisions directly via their builder-prompt and `impl-review.md` consult-type. + +When absent (the section is left blank or omitted), behavior is unchanged from today: the spec explores tradeoffs freely. + +The expected outcome on the Shannon 1353 failure mode: if the architect had listed "Language: Python (match `shanutil`)" as a baked decision in the issue body, iter-1 would have drafted in Python, iter-2 CMAP would have left the language alone, and the 45-min reset would not have happened. + +## Stakeholders + +- **Primary Users**: Architects filing SPIR / AIR / ASPIR issues. They are the ones with the strong priors and the ones who pay the cost of relitigation. +- **Secondary Users**: Builders (autonomous AI agents) and CMAP reviewers (Codex / Gemini / Claude). They consume the baked decisions and must honor them. +- **Technical Team**: Codev maintainers. They own the issue templates, builder-prompts, prompt files, and consult-type prompts that this spec touches. +- **Business Owners**: Codev project — Waleed Kadous. 
+ +## Resolved Decisions + +The following decisions were raised during drafting and CMAP / architect review and are now considered settled in this spec: + +1. **Scope: all three protocols.** SPIR, AIR, and ASPIR all suffer the same failure mode and must all honor baked decisions. ASPIR is identical to SPIR except for gates; it shares the same prompt assets. AIR skips the spec phase but its implement and PR review prompts still need to honor baked decisions surfaced through the issue body. + +2. **No GitHub issue template.** (Revised in iter-3 per architect feedback.) Codev is CLI-driven; `.github/ISSUE_TEMPLATE/` only fires for issues filed via the GitHub web UI. Architects who file via `gh issue create --body-file` or via the API would bypass the template entirely. Templates also add a maintenance surface (Codev mirror + `codev-skeleton/` mirror + downstream inheritance) that pays for itself only if the UI is the dominant filing path — which it is not. The correctness work — prompt-level honoring of the section by builders and CMAP reviewers — is what actually matters. **Discoverability** for architects is achieved instead via a documentation paragraph in each protocol's `protocol.md` (see Decision #11). Architects with strong priors include a `## Baked Decisions` section in the issue body by convention; the prompts honor it whether it arrived via UI or CLI. + +3. **Section heading format: heading-level-agnostic match on the name "Baked Decisions".** Prompts and instructions look for a section *named* "Baked Decisions" (case-insensitive), not for an exact `##` heading level. Real-world issue bodies render at varying heading levels (`##`, `###`); the match must tolerate that. + +4. **Section identity = literal heading string.** The contract is: a heading whose text is "Baked Decisions" (any leading `#`s, any case) opens the section; the section ends at the next heading of equal-or-lesser depth or end of issue. 
No nested machine schema — content is free-form markdown. + +5. **Empty section = no-op.** A section that is missing, present-but-empty, or contains only the placeholder text (the comment block from the template) is treated as absent. Behavior matches today's default — full exploration. + +6. **Conflict between baked decisions and other issue prose**: baked decisions win. If the issue body says "we should consider both Node and Python" in prose and the baked section says "Python", the baked section is authoritative. + +7. **Conflict within the baked decisions themselves** (e.g., two contradictory bullet points): builder must flag the contradiction to the architect (via `afx send architect`) and pause rather than guess. Reviewer prompts should also flag, not silently pick a winner. + +8. **Conflict between a baked decision and the drafted spec**: reviewer flags the contradiction as a `REQUEST_CHANGES` against the *spec* (it failed to honor the constraint), not as an attempt to relitigate the decision. + +9. **plan-review extension**: explicitly add an anti-relitigation instruction to `plan-review.md` mirroring the spec-review wording. The existing "don't re-litigate spec decisions" line is too generic; we want it explicit that baked decisions from the issue body are still off-limits even if the plan would benefit from changing them. + +10. **AIR coverage**: AIR has no `spec-review.md` (it skips the spec phase). For AIR, the touchpoints are its `builder-prompt.md`, `prompts/implement.md`, and `consult-types/impl-review.md` + `consult-types/pr-review.md`. The instruction in AIR's prompts is "honor baked decisions from the issue body." + +11. **Discoverability via documentation, not templates.** Each affected `protocol.md` (SPIR, ASPIR, AIR) gets a short paragraph: *"If you have strong priors on language / framework / deployment / dependencies, include a `## Baked Decisions` section in the issue body. 
The builder and CMAP reviewers will treat its contents as fixed and will not re-litigate them."* Same change mirrored to `codev-skeleton/protocols/*/protocol.md`. This is the entire discoverability surface — no template ceremony. + +12. **Architect-override carveout in all prompt language.** Prompt rules that constrain the builder/reviewer behavior around baked decisions must be framed as *"do not autonomously override a baked decision"*, not *"baked decisions are forbidden to question"*. The architect can always rescind or amend a baked decision in a follow-up message; the rule guards against silent autonomous drift, not against human revision. Every prompt addition this spec drives must include this carveout in spirit (and in the literal phrasing where reasonable). + +## Success Criteria + +Each criterion has a concrete pass/fail signal so a builder can verify it without ambiguity. **All criteria are prompt-and-documentation changes** — no issue templates, no CLI changes (see Resolved Decision #2). + +- [ ] **SPIR builder-prompt** carries a top-level `## Baked Decisions` instruction paragraph that teaches the builder the convention. (Amended in iter-4: per the iter-3 plan-approval direction, the paragraph is **unconditional** — present in every rendered builder-prompt regardless of whether the issue body contains a Baked Decisions section. The paragraph is a no-op when no section is present; it educates the builder when one is. See Amendments below.) Pass: rendering the template against any issue produces a top-level `## Baked Decisions` block; when the issue itself contains a Baked Decisions section, that content reaches the builder verbatim via `{{issue.body}}`. +- [ ] **ASPIR builder-prompt** behaves identically to SPIR's. Pass: same rendering test against ASPIR's template. +- [ ] **AIR builder-prompt** carries the same instruction paragraph. Pass: same rendering test against AIR's template. 
+- [ ] **SPIR `prompts/specify.md`** instructs the builder to read the baked-decisions section first and to write its content verbatim into the spec's Constraints section. Pass: grep the file for an explicit clause referencing "Baked Decisions" and Constraints. +- [ ] **ASPIR `prompts/specify.md`** has the same clause. Pass: grep. +- [ ] **AIR `prompts/implement.md`** has an analogous "honor baked decisions from the issue body" clause. Pass: grep. +- [ ] **SPIR `consult-types/spec-review.md`** contains a "do not autonomously override baked decisions" instruction (carveout phrasing per Decision #12). Pass: grep for explicit phrasing covering the case where the spec respects a baked decision (reviewer should not push back on the underlying choice; only flag if the spec fails to honor the decision). +- [ ] **ASPIR `consult-types/spec-review.md`** has the same instruction. Pass: grep. +- [ ] **SPIR `consult-types/plan-review.md`** extends its existing anti-relitigation language to explicitly cover baked decisions. Pass: grep for explicit "baked decisions" language. +- [ ] **ASPIR `consult-types/plan-review.md`** has the same explicit phrasing. Pass: grep. +- [ ] **AIR `consult-types/impl-review.md`** has an analogous instruction. Pass: grep. +- [ ] **AIR `consult-types/pr-review.md`** has an analogous instruction. Pass: grep. +- [ ] **Documentation** — `codev/protocols/spir/protocol.md`, `codev/protocols/aspir/protocol.md`, and `codev/protocols/air/protocol.md` each contain a paragraph instructing architects how to declare baked decisions in the issue body. Pass: grep for "Baked Decisions" in each protocol.md; manual read confirms the paragraph explains the convention, category hints (language / framework / deployment / dependencies), and the "no relitigation by default" behavior. +- [ ] **Skeleton mirror** — every file modified in `codev/protocols/` has the identical edit applied to its mirror in `codev-skeleton/protocols/`. 
Pass: `diff -r codev/protocols/ codev-skeleton/protocols/` for the touched files shows no substantive differences (other than path-string differences that already exist). +- [ ] **End-to-end smoke (with-vs-without rendering)** — for each of the three builder-prompts (SPIR, ASPIR, AIR), render the template twice against fixture issues: once with a `## Baked Decisions` section and once without. (Amended in iter-4: per the iter-3 unconditional-instruction design, both renders contain the instruction paragraph; the difference is only in the `{{issue.body}}` content, which carries the issue's own Baked Decisions section through verbatim when present.) Pass: both renders contain the top-level instruction `## Baked Decisions` block; the with-fixture render additionally contains the fixture's Baked Decisions content verbatim; the without-fixture render contains no fixture content. +- [ ] **No regression** — every static markdown file touched by this work (builder-prompts, drafting prompts, reviewer prompts, protocol.md) has a pre-change baseline captured; the post-change file is a pure-addition diff of the baseline (zero removed lines, zero modified lines). This is how no-regression maps to the architect-directed unconditional design: we no longer need a "no `## Baked Decisions` block when absent" assertion (that requirement only applied to the parser-based design); instead, we assert that nothing pre-existing was removed or mangled in any of the 30 touched files. + +## Constraints + +### Technical Constraints +- Issue body is the canonical input channel for AIR / BUGFIX / SPIR / ASPIR — anything we add must live in the rendered issue body (or in an equally durable channel that flows through `afx spawn`'s `--issue` path). +- Changes must be backward compatible: existing issues without the section must work unchanged. +- The mechanism must work regardless of how the issue was filed (GitHub UI, `gh issue create --body-file`, API). 
The section is plain markdown convention — discoverability comes from protocol documentation, not from GitHub templates (see Resolved Decision #2). +- Section name matching must be **heading-level-agnostic** (`##`, `###`, etc.) and case-insensitive on the text "Baked Decisions". Builder-prompt and consult-type prompt phrasing must not lock to a specific heading level. +- Builder-prompt and consult-type prompts are rendered Handlebars-style templates — additions must respect that toolchain. +- The protocol is meant to apply to SPIR, ASPIR, and AIR (not BUGFIX, which is too small for architectural priors). +- Prompt language must use the **architect-override carveout** framing (Resolved Decision #12): "do not autonomously override / relitigate baked decisions" rather than absolute prohibitions. The architect can always rescind. + +### Business Constraints +- This is a tier-2 priority per Shannon's note — design carefully rather than rush. +- Must not add friction for the common case (no baked decisions). Optional-by-default is non-negotiable. + +## Assumptions +- The issue body is the right surface for declaring baked decisions (vs. a separate file or a CLI flag). A documentation-only convention is sufficient because the Codev workflow is CLI-driven and architects file issues directly. +- Builders and CMAP reviewers will reliably honor an explicit instruction in their prompts to treat a section as fixed — i.e., we trust the prompt channel more than we trust prose conventions. +- Architects who don't have strong priors will simply omit the section; absence is the no-op default. +- The audience for "baked decisions" is **the spec drafter and CMAP reviewers** — not downstream consumers. We do not need a separate API or machine-readable schema. +- Documentation discoverability (a paragraph in each `protocol.md`) is sufficient — architects learn the convention by reading the protocol they are about to invoke. 
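The section-recognition contract described in the constraints above (name match at any heading level, case-insensitive, empty section treated as absent) can be sketched in TypeScript. This is a minimal illustration only: `extractBakedDecisions` is a hypothetical helper, not part of the Codev codebase, and the shipped design relies on prompt instructions rather than a parser. The sketch assumes ATX-style (`#`) headings and does not special-case `#` lines inside fenced code blocks.

```typescript
// Hypothetical helper for illustration; not part of the Codev codebase.
// Returns the body of the first "Baked Decisions" section, or null when the
// section is missing, empty, or contains only HTML comments (placeholders).
function extractBakedDecisions(issueBody: string): string | null {
  const lines = issueBody.split(/\r?\n/);
  // ATX heading: 1-6 leading '#', whitespace, text, optional closing '#'s.
  const heading = /^(#{1,6})\s+(.*?)\s*#*\s*$/;

  let start = -1;
  let depth = 0;
  for (let i = 0; i < lines.length; i++) {
    const m = heading.exec(lines[i]);
    if (m && m[2].trim().toLowerCase() === 'baked decisions') {
      start = i + 1;        // body begins on the line after the heading
      depth = m[1].length;  // number of '#'s that opened the section
      break;
    }
  }
  if (start === -1) return null;

  // The section ends at the next heading with the same number of '#'s or fewer.
  let end = lines.length;
  for (let i = start; i < lines.length; i++) {
    const m = heading.exec(lines[i]);
    if (m && m[1].length <= depth) {
      end = i;
      break;
    }
  }

  const body = lines.slice(start, end).join('\n').trim();
  // Treat a comment-only section as empty, but return the body unmodified.
  const withoutComments = body.replace(/<!--[\s\S]*?-->/g, '').trim();
  return withoutComments.length > 0 ? body : null;
}
```

A helper along these lines might be useful for building test fixtures or for the lint tooling raised under Open Questions; deeper sub-headings inside the section are kept as part of its body.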
+ +## Solution Approaches + +### Approach 1: Issue-Template + Reviewer Prompt Update (rejected in iter-3) +**Description**: Add an optional `## Baked Decisions` section to GitHub issue template(s) for SPIR / AIR / ASPIR plus the prompt edits. + +**Pros**: +- Discoverability — architects filing via the GitHub UI see the section as a prompt. +- Single source of truth (the issue body). + +**Cons** (decisive): +- GitHub issue templates only fire when filing via the web UI. Codev is CLI-driven (`gh issue create --body-file`, scripted issue filing, API integrations) and most issues are filed without ever touching the template. +- Maintenance surface: a template in `codev/.github/ISSUE_TEMPLATE/`, a mirror in `codev-skeleton/.github/ISSUE_TEMPLATE/`, downstream projects inheriting it on `codev init`. Each new mirror is a synchronization burden. +- Placeholder text in the rendered issue is noise when the architect has no baked decisions. + +**Estimated Complexity**: Medium (mostly mirror-management) +**Risk Level**: Low +**Decision**: Rejected by architect in iter-3. Discoverability is better served by documentation in each `protocol.md` — architects read those when invoking the protocol. + +### Approach 2: Pre-Spec Checklist Template (Option B from the issue) +**Description**: A separate one-pager template (e.g., `codev/templates/pre-spec.md`) that architects fill before filing the issue. The filled checklist is pasted verbatim into the issue body. + +**Pros**: +- More rigorous — checklist forces the architect to consider each category. +- Useful as a thinking tool even when most fields are "TBD." + +**Cons**: +- More ceremony — friction on every issue, not just the ones with baked decisions. +- Two-step workflow (fill template → paste into issue) is awkward. +- For issues with no baked decisions, the checklist is dead weight. + +**Estimated Complexity**: Medium +**Risk Level**: Medium (adoption risk — architects skip it under pressure) +**Decision**: Not chosen. 
+ +### Approach 3: Prompt-Level Honoring + Protocol Documentation (Selected) +**Description**: Pure prompt-and-documentation change. +- Builder-prompts (SPIR / ASPIR / AIR) surface a `## Baked Decisions` block in the rendered prompt when the issue body contains a section with that name (heading-level-agnostic, case-insensitive). +- `prompts/specify.md` (SPIR / ASPIR) instructs the builder to write the section verbatim into the spec's Constraints section. +- `prompts/implement.md` (AIR) instructs the builder to treat the section as fixed during implementation. +- `consult-types/spec-review.md`, `plan-review.md`, `impl-review.md`, `pr-review.md` instruct reviewers to honor baked decisions and not autonomously override them. +- Each `protocol.md` (SPIR, ASPIR, AIR) gets a short paragraph documenting the convention and category hints (language / framework / deployment / dependencies). +- Mirror everything into `codev-skeleton/`. + +**Pros**: +- Zero ceremony when not used — absence of the section is the no-op default. +- Single source of truth for the *mechanism* (prompts) and a single source of truth for *discoverability* (protocol docs). +- No GitHub-UI dependency — works for issues filed via CLI or API. +- Smallest maintenance surface that achieves the goal. +- Architect can amend or rescind a baked decision at any time (carveout framing per Decision #12). + +**Cons**: +- Discoverability depends on architects reading the protocol doc — they have to learn the convention, not be prompted by template scaffolding. +- No machine-enforced schema — architects can write fuzzy or contradictory entries; reviewer prompts handle this by instructing to flag-and-pause. + +**Estimated Complexity**: Low +**Risk Level**: Low + +**Recommendation**: Approach 3. 
+ +## Open Questions + +### Critical (Blocks Progress) + +*(None remaining: scope and template location resolved above under Resolved Decisions.)* + +### Important (Affects Design) + +- [ ] **Should `afx spawn` warn at spawn time** if it detects a "Baked Decisions" header in the issue but the section is empty? Lean: out of scope for this spec; keep the spec-side change pure prompt + documentation. + +### Nice-to-Know (Optimization) +- [ ] Should the spec template (`codev/protocols/spir/templates/spec.md`) explicitly cross-reference baked decisions in its Constraints section header? +- [ ] Is there value in tooling that lints the baked-decisions section for common categories (language / framework / deployment) before spawning a builder? +- [ ] Should `consult` output flag reviewer feedback that contradicts a baked decision, so it can be visibly down-weighted in CMAP synthesis? + +## Performance Requirements + +Not applicable: this is a documentation / prompt-template change. No runtime or service-level performance concerns. + +## Security Considerations + +- No new authentication or authorization surface. +- The baked-decisions section is plain markdown inside the issue body, with the same trust boundary as today's issue content. +- One mild concern: a baked decision that includes a path or dependency name will flow verbatim into the builder-prompt and the CMAP reviewer prompts. This is the same trust posture as the rest of the issue body, so no new exposure. + +## Test Scenarios + +### Functional Tests + +1. **Baked-decisions present (happy path)** + - Fixture issue body includes `## Baked Decisions` with "Language: Python, Framework: FastAPI." + - Render the SPIR builder-prompt. + - Assertion: rendered prompt contains a dedicated `## Baked Decisions` block at the top level (not just embedded inside `{{issue.body}}`). + - Render the spec-review consult-type prompt with a fixture spec that respects the constraint.
+ - Assertion: rendered prompt contains the anti-relitigation instruction text verbatim. + +2. **Baked-decisions absent (omitted) — snapshot diff** + - Render the SPIR builder-prompt twice against the same fixture issue: once with a `## Baked Decisions` section, once without. + - Assertion: the diff between the two outputs is non-empty and consists exclusively of the new `## Baked Decisions` block. No other lines change. + - Snapshot test: the "without" render is byte-identical to a baseline recorded against today's templates. + - Repeat for ASPIR and AIR. + +3. **Baked-decisions partial** + - Fixture issue body lists only language (Python) but no framework. + - Render builder-prompt. + - Assertion: language appears verbatim; no framework constraint is fabricated. + +4. **Heading-level variation** + - Three fixtures: `## Baked Decisions`, `### Baked Decisions`, `# Baked Decisions`. + - Render builder-prompt for each. + - Assertion: section is recognized in all three cases; rendered prompt surfaces the content correctly. + +5. **Case insensitivity** + - Fixture: `## baked decisions` (lowercase). + - Assertion: section recognized and content surfaced. + +6. **Contradictory entries within baked decisions** + - Fixture: section contains "Language: Python" AND "Language: Node.js". + - Render builder-prompt and consult-type prompts. + - Assertion: both prompts contain instructions telling the builder/reviewer to flag the contradiction and pause, not silently pick. + +7. **Conflict between baked decision and issue prose** + - Fixture: prose says "consider Node and Python", baked says "Python". + - Manual / transcript test: builder treats Python as fixed, prose as superseded. + +8. **Plan-review honors baked decisions** + - Fixture: spec with a Constraints section listing the baked decisions; plan that respects them. + - Render plan-review prompt. + - Assertion: prompt contains the anti-relitigation instruction language. + +9. 
**AIR impl-review honors baked decisions** + - Fixture: AIR issue with baked decisions; implementation respecting them. + - Render impl-review prompt. + - Assertion: anti-relitigation instruction present. + +### Non-Functional Tests + +- **No regression**: Existing SPIR / AIR / ASPIR projects without baked decisions complete as they do today. CMAP iteration counts on a representative set of recent issues do not increase. (Measurable by re-running CMAP on a previously-completed issue and checking the new feedback against the historical feedback.) + +## Dependencies + +- **External Services**: None. +- **Internal Systems** (every file in this list is a touchpoint that must be reviewed and most must be edited): + - `codev/protocols/spir/builder-prompt.md` + - `codev/protocols/aspir/builder-prompt.md` + - `codev/protocols/air/builder-prompt.md` + - `codev/protocols/spir/prompts/specify.md` + - `codev/protocols/aspir/prompts/specify.md` + - `codev/protocols/air/prompts/implement.md` + - `codev/protocols/spir/consult-types/spec-review.md` + - `codev/protocols/aspir/consult-types/spec-review.md` + - `codev/protocols/spir/consult-types/plan-review.md` + - `codev/protocols/aspir/consult-types/plan-review.md` + - `codev/protocols/air/consult-types/impl-review.md` + - `codev/protocols/air/consult-types/pr-review.md` + - `codev/protocols/spir/protocol.md` (documentation paragraph — primary discoverability surface) + - `codev/protocols/aspir/protocol.md` (documentation paragraph) + - `codev/protocols/air/protocol.md` (documentation paragraph) + - `codev-skeleton/` mirror copies of every file above +- **Explicitly NOT in scope**: `.github/ISSUE_TEMPLATE/` (rejected in iter-3 — see Resolved Decision #2). +- **Libraries/Frameworks**: None new. Existing Handlebars-style prompt rendering is sufficient. 
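The touchpoint list above, together with its skeleton mirrors, can be enumerated mechanically. The sketch below is illustrative only: `CODEV_TOUCHPOINTS`, `skeletonMirror`, and `ALL_TOUCHED` are hypothetical names, and it assumes each mirror path is the same path with the leading `codev/` replaced by `codev-skeleton/`.

```typescript
// Touchpoints copied from the Dependencies list; names here are hypothetical.
const CODEV_TOUCHPOINTS: readonly string[] = [
  'codev/protocols/spir/builder-prompt.md',
  'codev/protocols/aspir/builder-prompt.md',
  'codev/protocols/air/builder-prompt.md',
  'codev/protocols/spir/prompts/specify.md',
  'codev/protocols/aspir/prompts/specify.md',
  'codev/protocols/air/prompts/implement.md',
  'codev/protocols/spir/consult-types/spec-review.md',
  'codev/protocols/aspir/consult-types/spec-review.md',
  'codev/protocols/spir/consult-types/plan-review.md',
  'codev/protocols/aspir/consult-types/plan-review.md',
  'codev/protocols/air/consult-types/impl-review.md',
  'codev/protocols/air/consult-types/pr-review.md',
  'codev/protocols/spir/protocol.md',
  'codev/protocols/aspir/protocol.md',
  'codev/protocols/air/protocol.md',
];

// Assumption: the mirror lives at the same relative path under codev-skeleton/.
function skeletonMirror(codevPath: string): string {
  const prefix = 'codev/';
  if (!codevPath.startsWith(prefix)) {
    throw new Error(`not a codev/ path: ${codevPath}`);
  }
  return 'codev-skeleton/' + codevPath.slice(prefix.length);
}

// Canonical files followed by their mirrors.
const ALL_TOUCHED: string[] = [
  ...CODEV_TOUCHPOINTS,
  ...CODEV_TOUCHPOINTS.map(skeletonMirror),
];
```

Under these assumptions the combined list has 30 entries, which lines up with the "30 touched files" figure in the no-regression success criterion.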
+ +## References + +- Issue #746 (this spec's source) +- Shannon Spec 1353 (Persona harness) — the concrete failure case that motivated the issue +- `codev/protocols/spir/protocol.md` — SPIR protocol +- `codev/protocols/spir/consult-types/spec-review.md`, `plan-review.md` — CMAP reviewer prompts +- `codev/protocols/spir/builder-prompt.md` — Builder system prompt +- `codev-skeleton/protocols/...` — Mirror copies shipped to downstream projects + +## Risks and Mitigation + +| Risk | Probability | Impact | Mitigation Strategy | +|------|-------------|--------|--------------------| +| Architects don't discover the convention | Medium | Medium | Documentation paragraph in each `protocol.md` is the discoverability surface; protocol docs are the first thing an architect reads when invoking a protocol. Future MAINTAIN can audit usage and surface examples. | +| Architects forget to use the section, reverting to status quo | Medium | Low | Same docs paragraph reminds them; CMAP iteration cost is its own incentive — architects who feel the pain of relitigation will adopt. | +| Builders / CMAP reviewers ignore the prompt instruction | Low–Medium | High | Explicit dedicated section in the rendered prompt; reviewer prompt repeats the instruction verbatim; phrasing puts the constraint at the top of the relevant section. | +| Baked decisions are wrong or premature | Medium | Medium | Architects can amend the issue and respawn; document this escape hatch in the protocol. The spec-approval gate is still the human checkpoint. Carveout framing (Decision #12) makes clear the architect can rescind. | +| Conflict between baked decisions and CMAP best-practice advice | Medium | Low | Reviewer prompt tells reviewers to flag concerns about a baked decision as a `COMMENT`, not as `REQUEST_CHANGES` — the architect makes the final call. 
| +| Heading-level mismatch (`##` vs `###` vs `#`) silently breaks recognition | Medium | High | Prompts instruct readers to match the section by *name*, not by heading level; success criteria require explicit fixtures covering all three levels. | +| Contradictory baked decisions cause silent failure | Low | Medium | Builder and reviewer prompts both instruct to flag and pause rather than guess. | +| Prompt language overshoots into absolute prohibition | Low | Medium | Decision #12 mandates "do not autonomously override" framing; reviewer of the implementation PR should verify this carveout is present in every prompt addition. | + +## Expert Consultation + +**Iteration 1 — 2026-05-14**: Reviewed by Gemini, Codex, Claude. Verdicts: Gemini `REQUEST_CHANGES`, Codex `REQUEST_CHANGES`, Claude `COMMENT`. + +Key consolidated feedback addressed in iter-2: + +- **Resolved scope** to SPIR + AIR + ASPIR explicitly (was a critical open question in iter-1). +- **Added heading-level-agnostic matching** to constraints and test scenarios (Gemini — real-world issues render at varying levels). +- **Added `prompts/specify.md` (SPIR + ASPIR) and `prompts/implement.md` (AIR) to Dependencies** (Claude — these are the prompts that actually drive spec drafting, distinct from builder-prompt). +- **Added explicit plan-review.md and AIR impl/pr-review.md changes** to Success Criteria (Gemini — existing "don't re-litigate" line is too generic to close the loophole). +- **Made Success Criteria deterministic** — every criterion now has a concrete pass signal (Codex). +- **Defined section-recognition contract**: heading text "Baked Decisions" (case-insensitive, any level), empty = no-op, with explicit rules for contradictions and conflicts with prose. +- **Clarified AIR has no `spec-review.md`** — the AIR touchpoints are builder-prompt + implement.md + impl-review.md + pr-review.md. 
+ +**Architect Feedback — 2026-05-17** (post iter-2 spec-approval gate): + +- **Dropped `.github/ISSUE_TEMPLATE/` scope entirely.** Codev is CLI-driven; templates only fire for GitHub UI filing and add maintenance surface (codev/ + skeleton/ mirrors + downstream inheritance) for discoverability the CLI workflow doesn't need. Resolved Decision #2 rewritten; Success Criteria (issue template + skeleton template), Constraints, Test Scenario "Issue filed via CLI", and Risks rows trimmed. +- **Replaced with documentation.** Each `protocol.md` (SPIR / ASPIR / AIR) gets a discoverability paragraph. Resolved Decision #11 added; Success Criteria for documentation tightened to require category hints and the no-relitigation behavior to be explained. +- **Dropped the "one generic template vs per-protocol" open question** as moot. +- **Tightened the end-to-end transcript success criterion** to a concrete snapshot diff (with-section vs without-section render of each builder-prompt; the diff must consist exclusively of the new `## Baked Decisions` block). +- **Added Resolved Decision #12** (architect-override carveout) per the memory rule that prompt constraints on builders should be framed as "don't autonomously X" rather than "X is forbidden." All prompt edits this spec drives must use that framing; reviewer of the implementation PR should verify. + +## Approval +- [ ] Technical Lead Review +- [ ] Product Owner Review +- [ ] Stakeholder Sign-off +- [ ] Expert AI Consultation Complete + +## Notes + +- This spec deliberately stays at the WHAT level. The HOW — exact phrasing of the reviewer-prompt additions, exact documentation paragraph wording, the order in which files are edited — belongs in the plan. +- The Shannon failure case (Spec 1353) is the canonical example; the plan should include it as an end-to-end test scenario. +- Recommendation crystallized in **Approach 3 (prompt-level honoring + protocol documentation)** after architect feedback removed the issue-template scope. 
Low risk, low friction, smallest maintenance surface. +- The category hints (language / framework / deployment / dependencies / deferred decisions) live in the `protocol.md` documentation paragraph rather than in a template placeholder: same scaffolding, different surface. + +--- + +## Amendments + +### Amendment 1: Unconditional instruction paragraph (2026-05-17, iter-4) + +**Summary**: Drop the conditional-rendering requirement from the builder-prompt success criteria. The instruction paragraph is unconditional. + +**Problem addressed**: The original spec (iter-3) carried success criteria written against a parser-based design: *"When the section is absent or empty, the rendered prompt has no `## Baked Decisions` block (no empty stub)"*. When the architect's plan-approval feedback (2026-05-17 ~20:34 PDT) directed dropping the parser and replacing the `{{#if baked_decisions}}` block with *"a plain instruction paragraph (uniform across SPIR/ASPIR/AIR)"*, the plan was rewritten, but the spec text was not updated to match. Codex's PR-level CMAP review caught the resulting drift: the implementation (correctly per the architect's direction) puts the `## Baked Decisions` paragraph unconditionally in every builder-prompt render, but the spec text still required absence-of-block when the issue has no section. + +**Rationale for the architect-directed design**: An unconditional instruction paragraph teaches the convention to every builder, every time, regardless of whether the current issue uses it. When the issue has no Baked Decisions section, the paragraph is a no-op (the builder reads the instruction, looks at the issue body, finds no section, and proceeds normally). When the issue does have one, the paragraph tells the builder to honor it. This is more robust than conditional rendering: it doesn't depend on a parser detecting the section correctly, and it makes the convention discoverable in every builder session.
+ +**Spec changes**: +- **Success Criteria** — the "SPIR/ASPIR/AIR builder-prompt" criteria are reworded: instruction paragraph is unconditional and always present; the assertion is that the paragraph exists and that fixture content reaches the builder verbatim when present. +- **End-to-end smoke / No-regression criteria** — reworded to match: both with-fixture and without-fixture renders contain the instruction; the no-regression mechanism becomes "pure-addition diff against pre-change baselines" rather than "byte-identical when section absent". +- **Resolved Decisions #5** (empty section = no-op): still applies — but at the *builder-behavior* level, not at the *rendering* level. The builder sees the instruction; the absence of an issue-side Baked Decisions section means the instruction has nothing to act on. + +**Plan changes**: None. The iter-3 plan already reflects the architect-directed design; this amendment brings the spec text into alignment with the plan that was approved and implemented. + +**Implementation impact**: Zero. The committed implementation already matches the architect-directed unconditional design. The amendment is a documentation-side correction to remove the spec-vs-implementation drift Codex flagged. + + diff --git a/packages/codev/src/agent-farm/__tests__/baked-decisions.test.ts b/packages/codev/src/agent-farm/__tests__/baked-decisions.test.ts new file mode 100644 index 000000000..3f41e9a6b --- /dev/null +++ b/packages/codev/src/agent-farm/__tests__/baked-decisions.test.ts @@ -0,0 +1,808 @@ +/** + * Spec 746: Baked Architectural Decisions + * + * Verifies that the SPIR/ASPIR/AIR builder-prompts (and their codev-skeleton + * mirrors) include the "Baked Decisions" instruction paragraph after their + * `## Protocol` section, with carveout + contradiction-handling wording. + * + * Two test families: + * 1. Grep regression: each touched file contains the required literal strings. + * 2. 
Pure-addition diff: the post-change file is a strict line-superset of + * its captured baseline (zero removed lines, zero modified lines). + * + * Baselines for the 12 prompt files touched across Phases 1-3 are captured + * under __tests__/fixtures/baselines/ before any edits and asserted against + * here. Phases 2 and 3 extend this file with their own grep + diff tests. + */ + +import { describe, it, expect } from 'vitest'; +import * as fs from 'node:fs'; +import * as path from 'node:path'; +import { renderTemplate, type TemplateContext } from '../commands/spawn-roles.js'; + +// ============================================================================ +// Helpers +// ============================================================================ + +const repoRoot = path.resolve(__dirname, '../../../../..'); +const baselineDir = path.resolve(__dirname, 'fixtures/baselines'); + +function readRepoFile(relativePath: string): string { + return fs.readFileSync(path.resolve(repoRoot, relativePath), 'utf-8'); +} + +function readBaseline(baselineName: string): string { + return fs.readFileSync(path.resolve(baselineDir, baselineName), 'utf-8'); +} + +/** + * Assert that `current` is a pure-addition diff of `baseline` — every line of + * the baseline appears in `current` in the same relative order, with zero + * removed lines and zero modified lines. Additional lines in `current` are + * permitted (those are the additions). + * + * Algorithm: walk both files line by line, advancing the baseline pointer only + * when a match is found. If the current pointer reaches end-of-file before the + * baseline pointer does, a baseline line was removed or modified — fail. 
+ */ +function expectPureAdditionDiff(label: string, baseline: string, current: string): void { + const baseLines = baseline.split('\n'); + const currLines = current.split('\n'); + let bi = 0; + let ci = 0; + while (bi < baseLines.length && ci < currLines.length) { + if (baseLines[bi] === currLines[ci]) { + bi++; + } + ci++; + } + if (bi < baseLines.length) { + const missing = baseLines.slice(bi, bi + 5).join('\n'); + throw new Error( + `${label}: pure-addition diff violated — baseline line ${bi + 1} ` + + `("${baseLines[bi]}") not found in current file after exhausting it. ` + + `Next ${Math.min(5, baseLines.length - bi)} missing line(s):\n${missing}`, + ); + } +} + +// ============================================================================ +// Phase 1: Builder-prompt instruction (SPIR/ASPIR/AIR + skeleton) +// ============================================================================ + +interface BuilderPromptFile { + label: string; + relPath: string; + baselineName: string | null; // null for skeleton mirrors (codev/ is the canonical baseline) +} + +const PHASE_1_FILES: BuilderPromptFile[] = [ + { + label: 'codev SPIR builder-prompt', + relPath: 'codev/protocols/spir/builder-prompt.md', + baselineName: 'spir-builder-prompt.md.baseline', + }, + { + label: 'codev ASPIR builder-prompt', + relPath: 'codev/protocols/aspir/builder-prompt.md', + baselineName: 'aspir-builder-prompt.md.baseline', + }, + { + label: 'codev AIR builder-prompt', + relPath: 'codev/protocols/air/builder-prompt.md', + baselineName: 'air-builder-prompt.md.baseline', + }, + { + label: 'skeleton SPIR builder-prompt', + relPath: 'codev-skeleton/protocols/spir/builder-prompt.md', + baselineName: null, + }, + { + label: 'skeleton ASPIR builder-prompt', + relPath: 'codev-skeleton/protocols/aspir/builder-prompt.md', + baselineName: null, + }, + { + label: 'skeleton AIR builder-prompt', + relPath: 'codev-skeleton/protocols/air/builder-prompt.md', + baselineName: null, + }, +]; + +describe('Spec 
746 Phase 1: builder-prompt baked-decisions instruction', () => { + describe('grep regression: required strings present in each file', () => { + for (const file of PHASE_1_FILES) { + describe(file.label, () => { + const content = readRepoFile(file.relPath); + + it('contains the "Baked Decisions" heading', () => { + expect(content).toContain('## Baked Decisions'); + }); + + it('uses the carveout phrasing "do not autonomously"', () => { + expect(content.toLowerCase()).toContain('do not autonomously'); + }); + + it('addresses contradictions with "contradict" + "pause"', () => { + const lower = content.toLowerCase(); + expect(lower).toContain('contradict'); + expect(lower).toContain('pause'); + }); + + it('mentions the `afx send` escalation path', () => { + expect(content).toContain('afx send'); + }); + }); + } + }); + + describe('pure-addition diff: baseline lines are preserved in order', () => { + for (const file of PHASE_1_FILES) { + if (file.baselineName === null) continue; // skeleton mirrors don't have a baseline; codev/ is the source of truth + it(`${file.label}: post-edit file is a pure-addition diff of its baseline`, () => { + const baseline = readBaseline(file.baselineName!); + const current = readRepoFile(file.relPath); + expectPureAdditionDiff(file.label, baseline, current); + }); + } + }); + + it('codev SPIR builder-prompt baseline does NOT contain the new heading (pollution check)', () => { + // Catches the failure mode where the baseline was captured AFTER an edit. + const baseline = readBaseline('spir-builder-prompt.md.baseline'); + expect(baseline).not.toContain('## Baked Decisions'); + }); + + // Mirror-parity for the Baked Decisions paragraph specifically (Phase 1). + // + // The codev/ and codev-skeleton/ copies of each builder-prompt have + // pre-existing structural differences outside this work's scope (skeleton + // has Multi-PR Workflow / Verify Phase sections that codev/ doesn't, and + // a different PR-merged notification string). 
Those are PRE-EXISTING and + // not Phase 1's responsibility to reconcile. + // + // What IS Phase 1's responsibility: ensure the Baked Decisions paragraph + // itself is byte-identical across both copies, so future drift in this + // paragraph (e.g., someone edits codev/ but forgets skeleton) is caught. + describe('baked-decisions paragraph is byte-identical across codev/ and skeleton', () => { + const PROTOCOLS = ['spir', 'aspir', 'air'] as const; + const BAKED_HEADER = '## Baked Decisions'; + + // Extract the Baked Decisions paragraph from a file's full content. + // Returns the heading + body up to (but not including) the next heading + // or the end of file. Throws if the heading is not found. + function extractBakedSection(label: string, fullContent: string): string { + const headerIdx = fullContent.indexOf(BAKED_HEADER); + if (headerIdx === -1) { + throw new Error(`${label}: "${BAKED_HEADER}" heading not found`); + } + const rest = fullContent.slice(headerIdx); + // Find the next markdown heading line (starts with #, on its own line). + const lines = rest.split('\n'); + const endLine = lines.findIndex( + (line, i) => i > 0 && /^#{1,6}\s/.test(line), + ); + const sectionLines = endLine === -1 ? lines : lines.slice(0, endLine); + // Trim trailing blank lines so a stray newline doesn't cause false mismatches. 
+ while (sectionLines.length > 0 && sectionLines[sectionLines.length - 1].trim() === '') { + sectionLines.pop(); + } + return sectionLines.join('\n'); + } + + for (const protocol of PROTOCOLS) { + it(`${protocol}: codev/ and skeleton Baked Decisions paragraphs match`, () => { + const codevContent = readRepoFile(`codev/protocols/${protocol}/builder-prompt.md`); + const skeletonContent = readRepoFile(`codev-skeleton/protocols/${protocol}/builder-prompt.md`); + const codevSection = extractBakedSection(`codev ${protocol}`, codevContent); + const skeletonSection = extractBakedSection(`skeleton ${protocol}`, skeletonContent); + expect(skeletonSection).toEqual(codevSection); + }); + } + }); +}); + +// ============================================================================ +// Phase 2: Drafting prompts (SPIR/ASPIR specify.md + AIR implement.md + skeleton) +// ============================================================================ + +interface DraftingPromptFile { + label: string; + relPath: string; + baselineName: string | null; // null for skeleton mirrors +} + +const PHASE_2_FILES: DraftingPromptFile[] = [ + { + label: 'codev SPIR specify.md', + relPath: 'codev/protocols/spir/prompts/specify.md', + baselineName: 'spir-specify.md.baseline', + }, + { + label: 'codev ASPIR specify.md', + relPath: 'codev/protocols/aspir/prompts/specify.md', + baselineName: 'aspir-specify.md.baseline', + }, + { + label: 'codev AIR implement.md', + relPath: 'codev/protocols/air/prompts/implement.md', + baselineName: 'air-implement.md.baseline', + }, + { + label: 'skeleton SPIR specify.md', + relPath: 'codev-skeleton/protocols/spir/prompts/specify.md', + baselineName: null, + }, + { + label: 'skeleton ASPIR specify.md', + relPath: 'codev-skeleton/protocols/aspir/prompts/specify.md', + baselineName: null, + }, + { + label: 'skeleton AIR implement.md', + relPath: 'codev-skeleton/protocols/air/prompts/implement.md', + baselineName: null, + }, +]; + +describe('Spec 746 Phase 2: 
drafting-prompt baked-decisions clause', () => { + describe('grep regression: required strings present in each file', () => { + for (const file of PHASE_2_FILES) { + describe(file.label, () => { + const content = readRepoFile(file.relPath); + + it('contains the literal "Baked Decisions"', () => { + expect(content).toContain('Baked Decisions'); + }); + + it('uses the carveout phrasing "do not autonomously"', () => { + expect(content.toLowerCase()).toContain('do not autonomously'); + }); + + it('addresses contradictions with "contradict" + "pause" + "flag"', () => { + const lower = content.toLowerCase(); + expect(lower).toContain('contradict'); + expect(lower).toContain('pause'); + expect(lower).toContain('flag'); + }); + + it('mentions the `afx send` escalation path', () => { + expect(content).toContain('afx send'); + }); + }); + } + }); + + describe('pure-addition diff: baseline lines preserved in order', () => { + for (const file of PHASE_2_FILES) { + if (file.baselineName === null) continue; + it(`${file.label}: post-edit file is a pure-addition diff of its baseline`, () => { + const baseline = readBaseline(file.baselineName!); + const current = readRepoFile(file.relPath); + expectPureAdditionDiff(file.label, baseline, current); + }); + } + }); + + describe('baked-decisions clause is byte-identical across codev/ and skeleton', () => { + interface MirrorPair { + protocol: string; + codev: string; + skeleton: string; + } + const PAIRS: MirrorPair[] = [ + { + protocol: 'spir specify.md', + codev: 'codev/protocols/spir/prompts/specify.md', + skeleton: 'codev-skeleton/protocols/spir/prompts/specify.md', + }, + { + protocol: 'aspir specify.md', + codev: 'codev/protocols/aspir/prompts/specify.md', + skeleton: 'codev-skeleton/protocols/aspir/prompts/specify.md', + }, + { + protocol: 'air implement.md', + codev: 'codev/protocols/air/prompts/implement.md', + skeleton: 'codev-skeleton/protocols/air/prompts/implement.md', + }, + ]; + + // Extract the paragraph containing 
"Baked Decisions" — from the first line + // matching it up to the next markdown heading. Works whether the heading + // is `## Baked Decisions` (AIR), `### 0.5 Baked Decisions` (SPIR/ASPIR), + // or any other variant the architect might write. + function extractBakedClause(label: string, fullContent: string): string { + const lines = fullContent.split('\n'); + const startIdx = lines.findIndex(line => /Baked Decisions/i.test(line)); + if (startIdx === -1) { + throw new Error(`${label}: no line containing "Baked Decisions" found`); + } + // Find the next markdown heading after the start line. + const endIdx = lines.findIndex( + (line, i) => i > startIdx && /^#{1,6}\s/.test(line), + ); + const sectionLines = endIdx === -1 ? lines.slice(startIdx) : lines.slice(startIdx, endIdx); + while (sectionLines.length > 0 && sectionLines[sectionLines.length - 1].trim() === '') { + sectionLines.pop(); + } + return sectionLines.join('\n'); + } + + for (const pair of PAIRS) { + it(`${pair.protocol}: codev/ and skeleton clauses match`, () => { + const codevContent = readRepoFile(pair.codev); + const skeletonContent = readRepoFile(pair.skeleton); + const codevClause = extractBakedClause(`codev ${pair.protocol}`, codevContent); + const skeletonClause = extractBakedClause(`skeleton ${pair.protocol}`, skeletonContent); + expect(skeletonClause).toEqual(codevClause); + }); + } + }); + + it('codev SPIR specify.md baseline does NOT contain "Baked Decisions" (pollution check)', () => { + const baseline = readBaseline('spir-specify.md.baseline'); + expect(baseline).not.toContain('Baked Decisions'); + }); +}); + +// ============================================================================ +// Phase 3: Reviewer prompts (spec-review / plan-review / impl-review / pr-review + skeleton) +// ============================================================================ + +interface ReviewerPromptFile { + label: string; + relPath: string; + baselineName: string | null; +} + +const PHASE_3_FILES: 
ReviewerPromptFile[] = [ + { + label: 'codev SPIR spec-review', + relPath: 'codev/protocols/spir/consult-types/spec-review.md', + baselineName: 'spir-spec-review.md.baseline', + }, + { + label: 'codev ASPIR spec-review', + relPath: 'codev/protocols/aspir/consult-types/spec-review.md', + baselineName: 'aspir-spec-review.md.baseline', + }, + { + label: 'codev SPIR plan-review', + relPath: 'codev/protocols/spir/consult-types/plan-review.md', + baselineName: 'spir-plan-review.md.baseline', + }, + { + label: 'codev ASPIR plan-review', + relPath: 'codev/protocols/aspir/consult-types/plan-review.md', + baselineName: 'aspir-plan-review.md.baseline', + }, + { + label: 'codev AIR impl-review', + relPath: 'codev/protocols/air/consult-types/impl-review.md', + baselineName: 'air-impl-review.md.baseline', + }, + { + label: 'codev AIR pr-review', + relPath: 'codev/protocols/air/consult-types/pr-review.md', + baselineName: 'air-pr-review.md.baseline', + }, + { + label: 'skeleton SPIR spec-review', + relPath: 'codev-skeleton/protocols/spir/consult-types/spec-review.md', + baselineName: null, + }, + { + label: 'skeleton ASPIR spec-review', + relPath: 'codev-skeleton/protocols/aspir/consult-types/spec-review.md', + baselineName: null, + }, + { + label: 'skeleton SPIR plan-review', + relPath: 'codev-skeleton/protocols/spir/consult-types/plan-review.md', + baselineName: null, + }, + { + label: 'skeleton ASPIR plan-review', + relPath: 'codev-skeleton/protocols/aspir/consult-types/plan-review.md', + baselineName: null, + }, + { + label: 'skeleton AIR impl-review', + relPath: 'codev-skeleton/protocols/air/consult-types/impl-review.md', + baselineName: null, + }, + { + label: 'skeleton AIR pr-review', + relPath: 'codev-skeleton/protocols/air/consult-types/pr-review.md', + baselineName: null, + }, +]; + +describe('Spec 746 Phase 3: reviewer-prompt baked-decisions clause', () => { + // Extract the `## Baked Decisions` section from a reviewer-prompt file so + // that the grep assertions below 
scope to the new paragraph specifically. + // + // This matters because the pre-existing `## Verdict Format` section in all + // 6 reviewer prompts already contains the literal strings `COMMENT` and + // `REQUEST_CHANGES`. A file-level `toContain` check would pass even if the + // new Baked Decisions paragraph lost those tokens, defeating the regression. + // + // Fix per Codex Phase 3 iter-1 feedback: extract the section first, assert + // against the section only. + function extractBakedSection(label: string, fullContent: string): string { + const headerIdx = fullContent.indexOf('## Baked Decisions'); + if (headerIdx === -1) { + throw new Error(`${label}: "## Baked Decisions" heading not found`); + } + const rest = fullContent.slice(headerIdx); + const lines = rest.split('\n'); + const endLine = lines.findIndex( + (line, i) => i > 0 && /^#{1,6}\s/.test(line), + ); + const sectionLines = endLine === -1 ? lines : lines.slice(0, endLine); + while (sectionLines.length > 0 && sectionLines[sectionLines.length - 1].trim() === '') { + sectionLines.pop(); + } + return sectionLines.join('\n'); + } + + describe('grep regression: required content present in the extracted Baked Decisions section', () => { + for (const file of PHASE_3_FILES) { + describe(file.label, () => { + const content = readRepoFile(file.relPath); + const section = extractBakedSection(file.label, content); + + it('contains the literal "Baked Decisions" heading (file-level)', () => { + expect(content).toContain('## Baked Decisions'); + }); + + it('section uses the carveout phrasing "do not autonomously"', () => { + expect(section.toLowerCase()).toContain('do not autonomously'); + }); + + it('section distinguishes COMMENT from REQUEST_CHANGES (not just the file)', () => { + // Both tokens must appear *inside* the Baked Decisions paragraph — + // not just in the pre-existing Verdict Format section elsewhere. 
+ expect(section).toContain('COMMENT'); + expect(section).toContain('REQUEST_CHANGES'); + }); + + it('section addresses contradictions with "contradict" + "clarify"', () => { + const lower = section.toLowerCase(); + expect(lower).toContain('contradict'); + expect(lower).toContain('clarify'); + }); + }); + } + }); + + describe('pure-addition diff: baseline lines preserved in order', () => { + for (const file of PHASE_3_FILES) { + if (file.baselineName === null) continue; + it(`${file.label}: post-edit file is a pure-addition diff of its baseline`, () => { + const baseline = readBaseline(file.baselineName!); + const current = readRepoFile(file.relPath); + expectPureAdditionDiff(file.label, baseline, current); + }); + } + }); + + describe('baked-decisions clause is byte-identical across codev/ and skeleton', () => { + interface MirrorPair { + protocol: string; + codev: string; + skeleton: string; + } + const PAIRS: MirrorPair[] = [ + { + protocol: 'spir spec-review', + codev: 'codev/protocols/spir/consult-types/spec-review.md', + skeleton: 'codev-skeleton/protocols/spir/consult-types/spec-review.md', + }, + { + protocol: 'aspir spec-review', + codev: 'codev/protocols/aspir/consult-types/spec-review.md', + skeleton: 'codev-skeleton/protocols/aspir/consult-types/spec-review.md', + }, + { + protocol: 'spir plan-review', + codev: 'codev/protocols/spir/consult-types/plan-review.md', + skeleton: 'codev-skeleton/protocols/spir/consult-types/plan-review.md', + }, + { + protocol: 'aspir plan-review', + codev: 'codev/protocols/aspir/consult-types/plan-review.md', + skeleton: 'codev-skeleton/protocols/aspir/consult-types/plan-review.md', + }, + { + protocol: 'air impl-review', + codev: 'codev/protocols/air/consult-types/impl-review.md', + skeleton: 'codev-skeleton/protocols/air/consult-types/impl-review.md', + }, + { + protocol: 'air pr-review', + codev: 'codev/protocols/air/consult-types/pr-review.md', + skeleton: 'codev-skeleton/protocols/air/consult-types/pr-review.md', + }, 
+ ]; + + for (const pair of PAIRS) { + it(`${pair.protocol}: codev/ and skeleton sections match`, () => { + const codevContent = readRepoFile(pair.codev); + const skeletonContent = readRepoFile(pair.skeleton); + // Reuse the same extractBakedSection helper defined at the top of this describe. + const codevSection = extractBakedSection(`codev ${pair.protocol}`, codevContent); + const skeletonSection = extractBakedSection(`skeleton ${pair.protocol}`, skeletonContent); + expect(skeletonSection).toEqual(codevSection); + }); + } + }); + + it('codev SPIR spec-review baseline does NOT contain "Baked Decisions" (pollution check)', () => { + const baseline = readBaseline('spir-spec-review.md.baseline'); + expect(baseline).not.toContain('Baked Decisions'); + }); +}); + +// ============================================================================ +// Phase 4: Protocol documentation paragraphs + final regression sweep +// ============================================================================ + +interface ProtocolDocFile { + label: string; + relPath: string; +} + +const PHASE_4_FILES: ProtocolDocFile[] = [ + { label: 'codev SPIR protocol.md', relPath: 'codev/protocols/spir/protocol.md' }, + { label: 'codev ASPIR protocol.md', relPath: 'codev/protocols/aspir/protocol.md' }, + { label: 'codev AIR protocol.md', relPath: 'codev/protocols/air/protocol.md' }, + { label: 'skeleton SPIR protocol.md', relPath: 'codev-skeleton/protocols/spir/protocol.md' }, + { label: 'skeleton ASPIR protocol.md', relPath: 'codev-skeleton/protocols/aspir/protocol.md' }, + { label: 'skeleton AIR protocol.md', relPath: 'codev-skeleton/protocols/air/protocol.md' }, +]; + +describe('Spec 746 Phase 4: protocol documentation discoverability paragraph', () => { + describe('grep regression: required content present in each protocol.md', () => { + for (const file of PHASE_4_FILES) { + describe(file.label, () => { + const content = readRepoFile(file.relPath); + + it('contains the "Baked Decisions" 
keyword', () => { + expect(content).toContain('Baked Decisions'); + }); + + it('mentions category hints (language + framework + dependencies)', () => { + const lower = content.toLowerCase(); + expect(lower).toContain('language'); + expect(lower).toContain('framework'); + expect(lower).toContain('dependencies'); + }); + + it('documents the amend/rescind escape hatch', () => { + const lower = content.toLowerCase(); + // Either "amend" or "rescind" + a way to do it ("respawn" or "afx send") + expect(lower).toMatch(/amend|rescind/); + expect(lower).toMatch(/respawn|afx send/); + }); + + it('describes the absence default explicitly', () => { + expect(content.toLowerCase()).toContain('no-op default'); + }); + }); + } + }); + + describe('discoverability paragraph is byte-identical across codev/ and skeleton', () => { + interface DocPair { + protocol: string; + codev: string; + skeleton: string; + } + const PAIRS: DocPair[] = [ + { + protocol: 'spir protocol.md', + codev: 'codev/protocols/spir/protocol.md', + skeleton: 'codev-skeleton/protocols/spir/protocol.md', + }, + { + protocol: 'aspir protocol.md', + codev: 'codev/protocols/aspir/protocol.md', + skeleton: 'codev-skeleton/protocols/aspir/protocol.md', + }, + { + protocol: 'air protocol.md', + codev: 'codev/protocols/air/protocol.md', + skeleton: 'codev-skeleton/protocols/air/protocol.md', + }, + ]; + + function extractBakedDocsSection(label: string, fullContent: string): string { + const headerIdx = fullContent.indexOf('## Baked Decisions'); + if (headerIdx === -1) { + throw new Error(`${label}: "## Baked Decisions" heading not found`); + } + const rest = fullContent.slice(headerIdx); + const lines = rest.split('\n'); + const endLine = lines.findIndex( + (line, i) => i > 0 && /^#{1,6}\s/.test(line), + ); + const sectionLines = endLine === -1 ? 
lines : lines.slice(0, endLine); + while (sectionLines.length > 0 && sectionLines[sectionLines.length - 1].trim() === '') { + sectionLines.pop(); + } + return sectionLines.join('\n'); + } + + for (const pair of PAIRS) { + it(`${pair.protocol}: codev/ and skeleton sections match`, () => { + const codevContent = readRepoFile(pair.codev); + const skeletonContent = readRepoFile(pair.skeleton); + const codevSection = extractBakedDocsSection(`codev ${pair.protocol}`, codevContent); + const skeletonSection = extractBakedDocsSection(`skeleton ${pair.protocol}`, skeletonContent); + expect(skeletonSection).toEqual(codevSection); + }); + } + }); +}); + +// ============================================================================ +// Phase 4 final sweep: every touched file has the required content, +// and codev/ ↔ skeleton parity holds for the Baked Decisions sections of all 30 files. +// ============================================================================ + +describe('Spec 746 Phase 4 final sweep: end-to-end regression check', () => { + // All 30 files touched across Phases 1-4: + // - Phase 1: 3 codev + 3 skeleton builder-prompts (6) + // - Phase 2: 3 codev + 3 skeleton drafting prompts (6) + // - Phase 3: 6 codev + 6 skeleton reviewer prompts (12) + // - Phase 4: 3 codev + 3 skeleton protocol.md (6) + const ALL_TOUCHED_FILES = [ + // Phase 1 + 'codev/protocols/spir/builder-prompt.md', + 'codev/protocols/aspir/builder-prompt.md', + 'codev/protocols/air/builder-prompt.md', + 'codev-skeleton/protocols/spir/builder-prompt.md', + 'codev-skeleton/protocols/aspir/builder-prompt.md', + 'codev-skeleton/protocols/air/builder-prompt.md', + // Phase 2 + 'codev/protocols/spir/prompts/specify.md', + 'codev/protocols/aspir/prompts/specify.md', + 'codev/protocols/air/prompts/implement.md', + 'codev-skeleton/protocols/spir/prompts/specify.md', + 'codev-skeleton/protocols/aspir/prompts/specify.md', + 'codev-skeleton/protocols/air/prompts/implement.md', + // Phase 3 +
'codev/protocols/spir/consult-types/spec-review.md', + 'codev/protocols/aspir/consult-types/spec-review.md', + 'codev/protocols/spir/consult-types/plan-review.md', + 'codev/protocols/aspir/consult-types/plan-review.md', + 'codev/protocols/air/consult-types/impl-review.md', + 'codev/protocols/air/consult-types/pr-review.md', + 'codev-skeleton/protocols/spir/consult-types/spec-review.md', + 'codev-skeleton/protocols/aspir/consult-types/spec-review.md', + 'codev-skeleton/protocols/spir/consult-types/plan-review.md', + 'codev-skeleton/protocols/aspir/consult-types/plan-review.md', + 'codev-skeleton/protocols/air/consult-types/impl-review.md', + 'codev-skeleton/protocols/air/consult-types/pr-review.md', + // Phase 4 + 'codev/protocols/spir/protocol.md', + 'codev/protocols/aspir/protocol.md', + 'codev/protocols/air/protocol.md', + 'codev-skeleton/protocols/spir/protocol.md', + 'codev-skeleton/protocols/aspir/protocol.md', + 'codev-skeleton/protocols/air/protocol.md', + ]; + + it('30 files were touched across Phases 1-4 (sanity check on the inventory)', () => { + expect(ALL_TOUCHED_FILES.length).toBe(30); + }); + + describe('cross-phase: every touched file contains "Baked Decisions"', () => { + for (const relPath of ALL_TOUCHED_FILES) { + it(relPath, () => { + const content = readRepoFile(relPath); + expect(content).toContain('Baked Decisions'); + }); + } + }); +}); + +// ============================================================================ +// Phase 4 end-to-end smoke: render each builder-prompt against a fixture +// issue whose body contains a `## Baked Decisions` section. Verify the +// rendered prompt contains BOTH (1) the Phase 1 instruction paragraph and +// (2) the issue's baked-decisions content verbatim (via {{issue.body}}). +// +// This converts the plan's "manual smoke" deliverable into an automated +// regression test — strictly better than a one-time check at PR time. +// Codex Phase 4 iter-1 flagged the missing smoke evidence; this closes it. 
+// ============================================================================ + +describe('Spec 746 end-to-end smoke: builder-prompt rendering with baked-decisions issue', () => { + const FIXTURE_ISSUE_BODY = [ + '## Background', + '', + 'We want a persona harness.', + '', + '## Baked Decisions', + '', + '- Language: Python (match shanutil)', + '- Framework: minimal stdlib', + '', + '## Done When', + '', + 'It works.', + ].join('\n'); + + function makeContext(protocolName: string): TemplateContext { + return { + protocol_name: protocolName.toUpperCase(), + mode: 'strict', + mode_soft: false, + mode_strict: true, + project_id: '999', + input_description: 'a test feature', + issue: { + number: 999, + title: 'Test issue with baked decisions', + body: FIXTURE_ISSUE_BODY, + }, + }; + } + + for (const protocol of ['spir', 'aspir', 'air']) { + describe(`${protocol} builder-prompt`, () => { + const templatePath = path.resolve(repoRoot, `codev/protocols/${protocol}/builder-prompt.md`); + const template = fs.readFileSync(templatePath, 'utf-8'); + const ctx = makeContext(protocol); + const rendered = renderTemplate(template, ctx); + + it('rendered prompt contains the Phase 1 instruction paragraph', () => { + expect(rendered).toContain('## Baked Decisions'); + expect(rendered.toLowerCase()).toContain('do not autonomously override'); + }); + + it('rendered prompt contains the issue body verbatim, including its baked-decisions section', () => { + expect(rendered).toContain('## Background'); + // The issue's own "## Baked Decisions" heading + content reaches the builder. 
+ expect(rendered).toContain('Language: Python (match shanutil)'); + expect(rendered).toContain('Framework: minimal stdlib'); + }); + + it('rendered prompt does NOT contain "{{" handlebars residue (template fully rendered)', () => { + expect(rendered).not.toContain('{{'); + expect(rendered).not.toContain('}}'); + }); + }); + } + + describe('absence default: rendering with an issue that has no baked-decisions section', () => { + const PLAIN_ISSUE_BODY = '## Background\n\nA boring feature.\n\n## Done When\n\nIt ships.'; + + for (const protocol of ['spir', 'aspir', 'air']) { + it(`${protocol} builder-prompt: instruction paragraph is still present (it's unconditional)`, () => { + const template = fs.readFileSync( + path.resolve(repoRoot, `codev/protocols/${protocol}/builder-prompt.md`), + 'utf-8', + ); + const ctx: TemplateContext = { + protocol_name: protocol.toUpperCase(), + mode: 'strict', + mode_soft: false, + mode_strict: true, + project_id: '999', + input_description: 'a test feature', + issue: { number: 999, title: 'Plain issue', body: PLAIN_ISSUE_BODY }, + }; + const rendered = renderTemplate(template, ctx); + // The instruction paragraph fires on EVERY render. Builders see it even + // when no baked-decisions section is present — it's a no-op for them. + // This is intentional: the prompt teaches the convention. + expect(rendered).toContain('## Baked Decisions'); + expect(rendered).not.toContain('Language: Python'); // no fixture content present + }); + } + }); +}); diff --git a/packages/codev/src/agent-farm/__tests__/fixtures/baselines/air-builder-prompt.md.baseline b/packages/codev/src/agent-farm/__tests__/fixtures/baselines/air-builder-prompt.md.baseline new file mode 100644 index 000000000..8a44aede1 --- /dev/null +++ b/packages/codev/src/agent-farm/__tests__/fixtures/baselines/air-builder-prompt.md.baseline @@ -0,0 +1,69 @@ +# {{protocol_name}} Builder ({{mode}} mode) + +You are implementing {{input_description}}. 
+ +{{#if mode_soft}} +## Mode: SOFT +You are running in SOFT mode. This means: +- You follow the AIR protocol yourself (no porch orchestration) +- The architect monitors your work and verifies you're adhering to the protocol +- Consultation is optional — use your judgement based on complexity +- You have flexibility in execution, but must stay compliant with the protocol +{{/if}} + +{{#if mode_strict}} +## Mode: STRICT +You are running in STRICT mode. This means: +- Porch orchestrates your work +- Run: `porch next` to get your next tasks +- Follow porch signals and gate approvals + +### ABSOLUTE RESTRICTIONS (STRICT MODE) +- **NEVER edit `status.yaml` directly** — only porch commands may modify project state +- **NEVER call `porch approve` without explicit human approval** — only run it after the architect says to +{{/if}} + +## Protocol +Follow the AIR protocol: `codev/protocols/air/protocol.md` + +{{#if issue}} +## Issue #{{issue.number}} +**Title**: {{issue.title}} + +**Description**: +{{issue.body}} + +## Your Mission +1. Read the issue requirements carefully +2. Implement the feature (< 300 LOC) +3. Write tests for the feature +4. Create PR with review in the PR body (NOT as a separate file) +5. Notify architect via `afx send architect "PR #N ready for review (implements #{{issue.number}})"` + +**IMPORTANT**: AIR produces NO spec, plan, or review files. The review goes in the PR body. + +If the feature is too complex (> 300 LOC or architectural changes), notify the Architect via: +```bash +afx send architect "Issue #{{issue.number}} is more complex than expected. [Reason]. Recommend escalating to ASPIR." +``` + +## Notifications +Always use `afx send architect "..."` to notify the architect at key moments: +- **PR ready**: `afx send architect "PR #N ready for review (implements #{{issue.number}})"` +- **PR merged**: `afx send architect "PR #N merged for issue #{{issue.number}}. 
Ready for cleanup."` +- **Blocked**: `afx send architect "Blocked on issue #{{issue.number}}: [reason]"` +{{/if}} + +## Handling Flaky Tests + +If you encounter **pre-existing flaky tests** (intermittent failures unrelated to your changes): +1. **DO NOT** edit `status.yaml` to bypass checks +2. **DO NOT** skip porch checks or use any workaround to avoid the failure +3. **DO** mark the test as skipped with a clear annotation (e.g., `it.skip('...') // FLAKY: skipped pending investigation`) +4. **DO** document each skipped flaky test in the PR body under a "Flaky Tests" section +5. Commit the skip and continue with your work + +## Getting Started +1. Read the AIR protocol +2. Review the issue details +3. Implement the feature diff --git a/packages/codev/src/agent-farm/__tests__/fixtures/baselines/air-impl-review.md.baseline b/packages/codev/src/agent-farm/__tests__/fixtures/baselines/air-impl-review.md.baseline new file mode 100644 index 000000000..aacdefbdb --- /dev/null +++ b/packages/codev/src/agent-farm/__tests__/fixtures/baselines/air-impl-review.md.baseline @@ -0,0 +1,58 @@ +# Implementation Review Prompt + +## Context +You are reviewing implementation work for a small feature built under the AIR protocol. The builder implemented directly from a GitHub issue — there is no spec or plan document. Your job is to verify the implementation matches the issue requirements and follows good practices. + +## CRITICAL: Verify Before Flagging + +Before requesting changes for missing configuration, incorrect patterns, or framework issues: +1. **Check `package.json`** for actual dependency versions — framework conventions change between major versions +2. **Read the actual config files** (or confirm their deliberate absence) before flagging missing configs +3. **Do not assume** your training data reflects the version in use — verify against project files + +## Focus Areas + +1. **Issue Adherence** + - Does the implementation fulfill the issue requirements? 
+ - Are the described acceptance criteria met? + +2. **Code Quality** + - Is the code readable and maintainable? + - Are there obvious bugs or issues? + - Are error cases handled appropriately? + +3. **Test Coverage** + - Are the tests adequate? + - Do tests cover the main paths AND edge cases? + +4. **Scope** + - Is the change under 300 LOC? If not, should this be escalated to ASPIR? + - Does the implementation stay focused on the issue, or does it include unrelated changes? + +## Verdict Format + +After your review, provide your verdict in exactly this format: + +``` +--- +VERDICT: [APPROVE | REQUEST_CHANGES | COMMENT] +SUMMARY: [One-line summary of your assessment] +CONFIDENCE: [HIGH | MEDIUM | LOW] +--- +KEY_ISSUES: +- [Issue 1 or "None"] +- [Issue 2] +... +``` + +**Verdict meanings:** +- `APPROVE`: Implementation looks good, ready for PR +- `REQUEST_CHANGES`: Issues that must be fixed +- `COMMENT`: Minor suggestions, can proceed but note feedback + +## Notes + +- AIR has no spec or plan — review against the GitHub issue +- Focus on "does this feature work correctly" not "is this architecturally perfect" +- If referencing line numbers, use `file:line` format +- The builder needs actionable feedback diff --git a/packages/codev/src/agent-farm/__tests__/fixtures/baselines/air-implement.md.baseline b/packages/codev/src/agent-farm/__tests__/fixtures/baselines/air-implement.md.baseline new file mode 100644 index 000000000..52767f21a --- /dev/null +++ b/packages/codev/src/agent-farm/__tests__/fixtures/baselines/air-implement.md.baseline @@ -0,0 +1,89 @@ +# IMPLEMENT Phase Prompt + +You are executing the **IMPLEMENT** phase of the AIR protocol. + +## Your Goal + +Read the GitHub issue, implement the feature, and add tests. Keep it focused and under 300 LOC. + +## Context + +- **Issue**: #{{issue.number}} — {{issue.title}} +- **Current State**: {{current_state}} + +## Process + +### 1. Read the Issue + +Read the full issue description. 
Identify: +- What is the desired behavior? +- What are the acceptance criteria? +- Are there examples or mockups? +- What files/modules are likely affected? + +### 2. Implement the Feature + +Apply a focused implementation: +- Implement what the issue describes — no more, no less +- Do NOT refactor surrounding code +- Do NOT add features beyond what's described in the issue +- Do NOT fix unrelated bugs you happen to notice (file separate issues) + +**Code Quality**: +- Self-documenting code (clear names, obvious structure) +- No commented-out code or debug prints +- Follow existing project conventions + +### 3. Add Tests + +Write tests that: +- Cover the main happy path +- Cover key edge cases +- Are deterministic (not flaky) + +Place tests following project conventions (`__tests__/`, `*.test.ts`, etc.). + +### 4. Verify the Build + +Run build and tests: + +```bash +npm run build # Must pass +npm test # Must pass +``` + +Fix any failures before proceeding. If build/test commands don't exist, check `package.json`. + +### 5. Commit + +Stage and commit your changes: +- Use explicit file paths (never `git add -A` or `git add .`) +- Commit message: `[Air #{{issue.number}}] feat: ` + +## Signals + +When implementation and tests are complete and passing: + +``` +PHASE_COMPLETE +``` + +If the feature is too complex for AIR (> 300 LOC or architectural): + +``` +TOO_COMPLEX +``` + +If you're blocked (missing context, unclear requirements, etc.): + +``` +BLOCKED:reason goes here +``` + +## Important Notes + +1. **Stay focused** — Implement what the issue describes, nothing else +2. **Tests are expected** — Add tests unless the change is purely declarative (e.g., config only) +3. **Build AND tests must pass** — Don't signal complete until both pass +4. **Stay under 300 LOC** — If the feature grows beyond this, signal `TOO_COMPLEX` +5. 
**No spec/plan artifacts** — AIR does not create files in `codev/specs/` or `codev/plans/` diff --git a/packages/codev/src/agent-farm/__tests__/fixtures/baselines/air-pr-review.md.baseline b/packages/codev/src/agent-farm/__tests__/fixtures/baselines/air-pr-review.md.baseline new file mode 100644 index 000000000..915903934 --- /dev/null +++ b/packages/codev/src/agent-farm/__tests__/fixtures/baselines/air-pr-review.md.baseline @@ -0,0 +1,58 @@ +# PR Ready Review Prompt + +## Context +You are performing a review of a pull request created under the AIR protocol. The builder implemented a small feature directly from a GitHub issue — there are no spec, plan, or review files. The review is embedded in the PR body. + +## Focus Areas + +1. **Completeness** + - Are the issue requirements implemented? + - Is the PR body review section filled out (summary, key decisions, test plan)? + - Are commits properly formatted? + +2. **Test Status** + - Do all tests pass? + - Is test coverage adequate for the changes? + - Are there any skipped or flaky tests? + +3. **Code Cleanliness** + - Is there any debug code left in? + - Are there any TODO comments that should be resolved? + - Is the code properly formatted? + +4. **Scope** + - Is the change under 300 LOC? + - Does the implementation stay focused on the issue? + - Are there unrelated changes bundled in? + +5. **PR Quality** + - Does the PR link to the issue? + - Is the PR body review section informative? + - Is the branch up to date with main? + +## Verdict Format + +After your review, provide your verdict in exactly this format: + +``` +--- +VERDICT: [APPROVE | REQUEST_CHANGES | COMMENT] +SUMMARY: [One-line summary of your assessment] +CONFIDENCE: [HIGH | MEDIUM | LOW] +--- +KEY_ISSUES: +- [Issue 1 or "None"] +- [Issue 2] +... 
+``` + +**Verdict meanings:** +- `APPROVE`: Ready for architect review +- `REQUEST_CHANGES`: Issues to fix before review +- `COMMENT`: Minor items, can proceed but note feedback + +## Notes + +- AIR has no spec, plan, or review files — review the PR body and code diff +- Focus on "is this ready for someone else to review" not "is this perfect" +- Any issues found here are cheaper to fix than during integration review diff --git a/packages/codev/src/agent-farm/__tests__/fixtures/baselines/aspir-builder-prompt.md.baseline b/packages/codev/src/agent-farm/__tests__/fixtures/baselines/aspir-builder-prompt.md.baseline new file mode 100644 index 000000000..2715ed877 --- /dev/null +++ b/packages/codev/src/agent-farm/__tests__/fixtures/baselines/aspir-builder-prompt.md.baseline @@ -0,0 +1,75 @@ +# {{protocol_name}} Builder ({{mode}} mode) + +You are implementing {{input_description}}. + +{{#if mode_soft}} +## Mode: SOFT +You are running in SOFT mode. This means: +- You follow the protocol document yourself (no porch orchestration) +- The architect monitors your work and verifies you're adhering to the protocol +- Run consultations manually when the protocol calls for them +- You have flexibility in execution, but must stay compliant with the protocol +{{/if}} + +{{#if mode_strict}} +## Mode: STRICT +You are running in STRICT mode. 
This means: +- Porch orchestrates your work +- Run: `porch next` to get your next tasks +- Follow porch signals and gate approvals +- Do not deviate from the porch-driven workflow + +### ABSOLUTE RESTRICTIONS (STRICT MODE) +- **NEVER edit `status.yaml` directly** — only porch commands may modify project state +- **NEVER call `porch approve` without explicit human approval** — only run it after the architect says to +- **NEVER skip the 3-way review** — always follow porch next → porch done cycle +- **NEVER advance plan phases manually** — porch handles phase transitions after unanimous review approval +{{/if}} + +## Protocol +Follow the ASPIR protocol: `codev/protocols/aspir/protocol.md` +Read and internalize the protocol before starting any work. + +{{#if spec}} +## Spec +Read the specification at: `{{spec.path}}` +{{/if}} + +{{#if plan}} +## Plan +Follow the implementation plan at: `{{plan.path}}` +{{/if}} + +{{#if issue}} +## Issue #{{issue.number}} +**Title**: {{issue.title}} + +**Description**: +{{issue.body}} +{{/if}} + +{{#if task}} +## Task +{{task_text}} +{{/if}} + +## Notifications +Always use `afx send architect "..."` to notify the architect at key moments: +- **Gate reached**: `afx send architect "Project {{project_id}}: ready for approval"` +- **PR ready**: `afx send architect "PR #N ready for review (project {{project_id}})"` +- **PR merged**: `afx send architect "Project {{project_id}} complete. PR merged. Ready for cleanup."` +- **Blocked**: `afx send architect "Blocked on project {{project_id}}: [reason]"` + +## Handling Flaky Tests + +If you encounter **pre-existing flaky tests** (intermittent failures unrelated to your changes): +1. **DO NOT** edit `status.yaml` to bypass checks +2. **DO NOT** skip porch checks or use any workaround to avoid the failure +3. **DO** mark the test as skipped with a clear annotation (e.g., `it.skip('...') // FLAKY: skipped pending investigation`) +4. 
**DO** document each skipped flaky test in your review under a `## Flaky Tests` section +5. Commit the skip and continue with your work + +## Getting Started +1. Read the protocol document thoroughly +2. Review the spec and plan (if available) +3. Begin implementation following the protocol phases diff --git a/packages/codev/src/agent-farm/__tests__/fixtures/baselines/aspir-plan-review.md.baseline b/packages/codev/src/agent-farm/__tests__/fixtures/baselines/aspir-plan-review.md.baseline new file mode 100644 index 000000000..585085dec --- /dev/null +++ b/packages/codev/src/agent-farm/__tests__/fixtures/baselines/aspir-plan-review.md.baseline @@ -0,0 +1,59 @@ +# Plan Review Prompt + +## Context +You are reviewing an implementation plan during the Plan phase. The spec has been approved - now you must evaluate whether the plan adequately describes HOW to implement it. + +## Focus Areas + +1. **Spec Coverage** + - Does the plan address all requirements in the spec? + - Are there spec requirements not covered by any phase? + - Are there phases that go beyond the spec scope? + +2. **Phase Breakdown** + - Are phases appropriately sized (not too large or too small)? + - Is the sequence logical (dependencies respected)? + - Can each phase be completed and committed independently? + +3. **Technical Approach** + - Is the implementation approach sound? + - Are the right files/modules being modified? + - Are there obvious better approaches being missed? + +4. **Testability** + - Does each phase have clear test criteria? + - Will the Defend step (writing tests) be feasible? + - Are edge cases from the spec addressable? + +5. **Risk Assessment** + - Are there potential blockers not addressed? + - Are dependencies on other systems identified? + - Is the plan realistic given constraints? 
+ +## Verdict Format + +After your review, provide your verdict in exactly this format: + +``` +--- +VERDICT: [APPROVE | REQUEST_CHANGES | COMMENT] +SUMMARY: [One-line summary of your assessment] +CONFIDENCE: [HIGH | MEDIUM | LOW] +--- +KEY_ISSUES: +- [Issue 1 or "None"] +- [Issue 2] +... +``` + +**Verdict meanings:** +- `APPROVE`: Plan is ready for human review +- `REQUEST_CHANGES`: Significant issues with approach or coverage +- `COMMENT`: Minor suggestions, plan is workable but could improve + +## Notes + +- The spec has already been approved - don't re-litigate spec decisions +- Focus on the quality of the plan as a guide for builders +- Consider: Would a builder be able to follow this plan successfully? +- If referencing existing code, verify file paths seem accurate diff --git a/packages/codev/src/agent-farm/__tests__/fixtures/baselines/aspir-spec-review.md.baseline b/packages/codev/src/agent-farm/__tests__/fixtures/baselines/aspir-spec-review.md.baseline new file mode 100644 index 000000000..7c9c1579b --- /dev/null +++ b/packages/codev/src/agent-farm/__tests__/fixtures/baselines/aspir-spec-review.md.baseline @@ -0,0 +1,55 @@ +# Specification Review Prompt + +## Context +You are reviewing a feature specification during the Specify phase. Your role is to ensure the spec is complete, correct, and feasible before it moves to human approval. + +## Focus Areas + +1. **Completeness** + - Are all requirements clearly stated? + - Are success criteria defined? + - Are edge cases considered? + - Is scope well-bounded (not too broad or vague)? + +2. **Correctness** + - Do requirements make sense technically? + - Are there contradictions? + - Is the problem statement accurate? + +3. **Feasibility** + - Can this be implemented with available tools/constraints? + - Are there obvious technical blockers? + - Is the scope realistic for a single spec? + +4. **Clarity** + - Would a builder understand what to build? + - Are acceptance criteria testable? 
+ - Is terminology consistent? + +## Verdict Format + +After your review, provide your verdict in exactly this format: + +``` +--- +VERDICT: [APPROVE | REQUEST_CHANGES | COMMENT] +SUMMARY: [One-line summary of your assessment] +CONFIDENCE: [HIGH | MEDIUM | LOW] +--- +KEY_ISSUES: +- [Issue 1 or "None"] +- [Issue 2] +... +``` + +**Verdict meanings:** +- `APPROVE`: Spec is ready for human review +- `REQUEST_CHANGES`: Significant issues must be fixed before proceeding +- `COMMENT`: Minor suggestions, can proceed but consider feedback + +## Notes + +- You are NOT reviewing code - you are reviewing the specification document +- Focus on WHAT is being built, not HOW it will be implemented (that's for plan review) +- Be constructive - identify issues AND suggest solutions +- If the spec references other specs, note if context seems missing diff --git a/packages/codev/src/agent-farm/__tests__/fixtures/baselines/aspir-specify.md.baseline b/packages/codev/src/agent-farm/__tests__/fixtures/baselines/aspir-specify.md.baseline new file mode 100644 index 000000000..6da1868f8 --- /dev/null +++ b/packages/codev/src/agent-farm/__tests__/fixtures/baselines/aspir-specify.md.baseline @@ -0,0 +1,139 @@ +# SPECIFY Phase Prompt + +You are executing the **SPECIFY** phase of the SPIR protocol. + +## Your Goal + +Create a comprehensive specification document that thoroughly explores the problem space and proposed solution. + +## Context + +- **Project ID**: {{project_id}} +- **Project Title**: {{title}} +- **Current State**: {{current_state}} +- **Spec File**: `codev/specs/{{artifact_name}}.md` + +## Process + +### 0. Check for Existing Spec (ALWAYS DO THIS FIRST) + +**Before asking ANY questions**, check if a spec already exists: + +```bash +ls codev/specs/{{project_id}}-*.md +``` + +**If a spec file exists:** +1. READ IT COMPLETELY - the answers to your questions are already there +2. The spec author has already made the key decisions +3. 
DO NOT ask clarifying questions - proceed directly to consultation +4. Your job is to REVIEW and IMPROVE the existing spec, not rewrite it from scratch + +**If no spec exists:** Proceed to Step 1 below. + +### 1. Clarifying Questions (ONLY IF NO SPEC EXISTS) + +Before writing anything, ask clarifying questions to understand: +- What problem is being solved? +- Who are the stakeholders? +- What are the constraints? +- What's in scope vs out of scope? +- What does success look like? + +If this is your first iteration AND no spec exists, ask these questions now and wait for answers. + +**CRITICAL**: Do NOT ask questions if a spec already exists. The spec IS the answer. + +**On subsequent iterations**: If questions were already answered, acknowledge the answers and proceed to the next step. + +### 2. Problem Analysis + +Once you have answers, document: +- The problem being solved (clearly articulated) +- Current state vs desired state +- Stakeholders and their needs +- Assumptions and constraints + +### 3. Solution Exploration + +Generate multiple solution approaches. For each: +- Technical design overview +- Trade-offs (pros/cons) +- Complexity assessment +- Risk assessment + +### 4. Open Questions + +List uncertainties categorized as: +- **Critical** - blocks progress +- **Important** - affects design +- **Nice-to-know** - optimization + +### 5. Success Criteria + +Define measurable acceptance criteria: +- Functional requirements (MUST, SHOULD, COULD) +- Non-functional requirements (performance, security) +- Test scenarios + +### 6. Finalize + +After completing the spec draft, signal completion. Porch will run 3-way consultation (Gemini, Codex, Claude) automatically via the verify step. If reviewers request changes, you'll be respawned with their feedback. + +## Output + +Create or update the specification file at `codev/specs/{{artifact_name}}.md`. 
+ +**IMPORTANT**: Keep spec/plan/review filenames in sync: +- Spec: `codev/specs/{{artifact_name}}.md` +- Plan: `codev/plans/{{artifact_name}}.md` +- Review: `codev/reviews/{{artifact_name}}.md` + +## Signals + +Emit appropriate signals based on your progress: + +- When waiting for clarifying question answers, **include your questions in the signal**: + ``` + + Please answer these questions: + 1. What should the primary use case be - internal tooling or customer-facing? + 2. What are the key constraints we should consider? + 3. Who are the main stakeholders? + + ``` + + The content inside the signal tag is displayed prominently to the user. + +- After completing the initial spec draft: + ``` + SPEC_DRAFTED + ``` + + +## Commit Cadence + +Make commits at these milestones: +1. `[Spec {{project_id}}] Initial specification draft` +2. `[Spec {{project_id}}] Specification with multi-agent review` +3. `[Spec {{project_id}}] Specification with user feedback` +4. `[Spec {{project_id}}] Final approved specification` + +**CRITICAL**: Never use `git add .` or `git add -A`. Always stage specific files: +```bash +git add codev/specs/{{artifact_name}}.md +``` + +## Important Notes + +1. **Be thorough** - A good spec prevents implementation problems +3. **Be specific** - Vague specs lead to wrong implementations +4. 
**Include examples** - Concrete examples clarify intent + +## What NOT to Do + +- Don't run `consult` commands yourself (porch handles consultations) +- Don't include implementation details (that's for the Plan phase) +- Don't estimate time (AI makes time estimates meaningless) +- Don't start coding (you're in Specify, not Implement) +- Don't use `git add .` or `git add -A` (security risk) diff --git a/packages/codev/src/agent-farm/__tests__/fixtures/baselines/spir-builder-prompt.md.baseline b/packages/codev/src/agent-farm/__tests__/fixtures/baselines/spir-builder-prompt.md.baseline new file mode 100644 index 000000000..c905b4696 --- /dev/null +++ b/packages/codev/src/agent-farm/__tests__/fixtures/baselines/spir-builder-prompt.md.baseline @@ -0,0 +1,75 @@ +# {{protocol_name}} Builder ({{mode}} mode) + +You are implementing {{input_description}}. + +{{#if mode_soft}} +## Mode: SOFT +You are running in SOFT mode. This means: +- You follow the protocol document yourself (no porch orchestration) +- The architect monitors your work and verifies you're adhering to the protocol +- Run consultations manually when the protocol calls for them +- You have flexibility in execution, but must stay compliant with the protocol +{{/if}} + +{{#if mode_strict}} +## Mode: STRICT +You are running in STRICT mode. 
This means: +- Porch orchestrates your work +- Run: `porch next` to get your next tasks +- Follow porch signals and gate approvals +- Do not deviate from the porch-driven workflow + +### ABSOLUTE RESTRICTIONS (STRICT MODE) +- **NEVER edit `status.yaml` directly** — only porch commands may modify project state +- **NEVER call `porch approve` without explicit human approval** — only run it after the architect says to +- **NEVER skip the 3-way review** — always follow porch next → porch done cycle +- **NEVER advance plan phases manually** — porch handles phase transitions after unanimous review approval +{{/if}} + +## Protocol +Follow the SPIR protocol: `codev/protocols/spir/protocol.md` +Read and internalize the protocol before starting any work. + +{{#if spec}} +## Spec +Read the specification at: `{{spec.path}}` +{{/if}} + +{{#if plan}} +## Plan +Follow the implementation plan at: `{{plan.path}}` +{{/if}} + +{{#if issue}} +## Issue #{{issue.number}} +**Title**: {{issue.title}} + +**Description**: +{{issue.body}} +{{/if}} + +{{#if task}} +## Task +{{task_text}} +{{/if}} + +## Notifications +Always use `afx send architect "..."` to notify the architect at key moments: +- **Gate reached**: `afx send architect "Project {{project_id}}: ready for approval"` +- **PR ready**: `afx send architect "PR #N ready for review (project {{project_id}})"` +- **PR merged**: `afx send architect "Project {{project_id}} complete. PR merged. Ready for cleanup."` +- **Blocked**: `afx send architect "Blocked on project {{project_id}}: [reason]"` + +## Handling Flaky Tests + +If you encounter **pre-existing flaky tests** (intermittent failures unrelated to your changes): +1. **DO NOT** edit `status.yaml` to bypass checks +2. **DO NOT** skip porch checks or use any workaround to avoid the failure +3. **DO** mark the test as skipped with a clear annotation (e.g., `it.skip('...') // FLAKY: skipped pending investigation`) +4. 
**DO** document each skipped flaky test in your review under a `## Flaky Tests` section +5. Commit the skip and continue with your work + +## Getting Started +1. Read the protocol document thoroughly +2. Review the spec and plan (if available) +3. Begin implementation following the protocol phases diff --git a/packages/codev/src/agent-farm/__tests__/fixtures/baselines/spir-plan-review.md.baseline b/packages/codev/src/agent-farm/__tests__/fixtures/baselines/spir-plan-review.md.baseline new file mode 100644 index 000000000..585085dec --- /dev/null +++ b/packages/codev/src/agent-farm/__tests__/fixtures/baselines/spir-plan-review.md.baseline @@ -0,0 +1,59 @@ +# Plan Review Prompt + +## Context +You are reviewing an implementation plan during the Plan phase. The spec has been approved - now you must evaluate whether the plan adequately describes HOW to implement it. + +## Focus Areas + +1. **Spec Coverage** + - Does the plan address all requirements in the spec? + - Are there spec requirements not covered by any phase? + - Are there phases that go beyond the spec scope? + +2. **Phase Breakdown** + - Are phases appropriately sized (not too large or too small)? + - Is the sequence logical (dependencies respected)? + - Can each phase be completed and committed independently? + +3. **Technical Approach** + - Is the implementation approach sound? + - Are the right files/modules being modified? + - Are there obvious better approaches being missed? + +4. **Testability** + - Does each phase have clear test criteria? + - Will the Defend step (writing tests) be feasible? + - Are edge cases from the spec addressable? + +5. **Risk Assessment** + - Are there potential blockers not addressed? + - Are dependencies on other systems identified? + - Is the plan realistic given constraints? 
+ +## Verdict Format + +After your review, provide your verdict in exactly this format: + +``` +--- +VERDICT: [APPROVE | REQUEST_CHANGES | COMMENT] +SUMMARY: [One-line summary of your assessment] +CONFIDENCE: [HIGH | MEDIUM | LOW] +--- +KEY_ISSUES: +- [Issue 1 or "None"] +- [Issue 2] +... +``` + +**Verdict meanings:** +- `APPROVE`: Plan is ready for human review +- `REQUEST_CHANGES`: Significant issues with approach or coverage +- `COMMENT`: Minor suggestions, plan is workable but could improve + +## Notes + +- The spec has already been approved - don't re-litigate spec decisions +- Focus on the quality of the plan as a guide for builders +- Consider: Would a builder be able to follow this plan successfully? +- If referencing existing code, verify file paths seem accurate diff --git a/packages/codev/src/agent-farm/__tests__/fixtures/baselines/spir-spec-review.md.baseline b/packages/codev/src/agent-farm/__tests__/fixtures/baselines/spir-spec-review.md.baseline new file mode 100644 index 000000000..7c9c1579b --- /dev/null +++ b/packages/codev/src/agent-farm/__tests__/fixtures/baselines/spir-spec-review.md.baseline @@ -0,0 +1,55 @@ +# Specification Review Prompt + +## Context +You are reviewing a feature specification during the Specify phase. Your role is to ensure the spec is complete, correct, and feasible before it moves to human approval. + +## Focus Areas + +1. **Completeness** + - Are all requirements clearly stated? + - Are success criteria defined? + - Are edge cases considered? + - Is scope well-bounded (not too broad or vague)? + +2. **Correctness** + - Do requirements make sense technically? + - Are there contradictions? + - Is the problem statement accurate? + +3. **Feasibility** + - Can this be implemented with available tools/constraints? + - Are there obvious technical blockers? + - Is the scope realistic for a single spec? + +4. **Clarity** + - Would a builder understand what to build? + - Are acceptance criteria testable? 
+ - Is terminology consistent? + +## Verdict Format + +After your review, provide your verdict in exactly this format: + +``` +--- +VERDICT: [APPROVE | REQUEST_CHANGES | COMMENT] +SUMMARY: [One-line summary of your assessment] +CONFIDENCE: [HIGH | MEDIUM | LOW] +--- +KEY_ISSUES: +- [Issue 1 or "None"] +- [Issue 2] +... +``` + +**Verdict meanings:** +- `APPROVE`: Spec is ready for human review +- `REQUEST_CHANGES`: Significant issues must be fixed before proceeding +- `COMMENT`: Minor suggestions, can proceed but consider feedback + +## Notes + +- You are NOT reviewing code - you are reviewing the specification document +- Focus on WHAT is being built, not HOW it will be implemented (that's for plan review) +- Be constructive - identify issues AND suggest solutions +- If the spec references other specs, note if context seems missing diff --git a/packages/codev/src/agent-farm/__tests__/fixtures/baselines/spir-specify.md.baseline b/packages/codev/src/agent-farm/__tests__/fixtures/baselines/spir-specify.md.baseline new file mode 100644 index 000000000..6da1868f8 --- /dev/null +++ b/packages/codev/src/agent-farm/__tests__/fixtures/baselines/spir-specify.md.baseline @@ -0,0 +1,139 @@ +# SPECIFY Phase Prompt + +You are executing the **SPECIFY** phase of the SPIR protocol. + +## Your Goal + +Create a comprehensive specification document that thoroughly explores the problem space and proposed solution. + +## Context + +- **Project ID**: {{project_id}} +- **Project Title**: {{title}} +- **Current State**: {{current_state}} +- **Spec File**: `codev/specs/{{artifact_name}}.md` + +## Process + +### 0. Check for Existing Spec (ALWAYS DO THIS FIRST) + +**Before asking ANY questions**, check if a spec already exists: + +```bash +ls codev/specs/{{project_id}}-*.md +``` + +**If a spec file exists:** +1. READ IT COMPLETELY - the answers to your questions are already there +2. The spec author has already made the key decisions +3. 
DO NOT ask clarifying questions - proceed directly to consultation +4. Your job is to REVIEW and IMPROVE the existing spec, not rewrite it from scratch + +**If no spec exists:** Proceed to Step 1 below. + +### 1. Clarifying Questions (ONLY IF NO SPEC EXISTS) + +Before writing anything, ask clarifying questions to understand: +- What problem is being solved? +- Who are the stakeholders? +- What are the constraints? +- What's in scope vs out of scope? +- What does success look like? + +If this is your first iteration AND no spec exists, ask these questions now and wait for answers. + +**CRITICAL**: Do NOT ask questions if a spec already exists. The spec IS the answer. + +**On subsequent iterations**: If questions were already answered, acknowledge the answers and proceed to the next step. + +### 2. Problem Analysis + +Once you have answers, document: +- The problem being solved (clearly articulated) +- Current state vs desired state +- Stakeholders and their needs +- Assumptions and constraints + +### 3. Solution Exploration + +Generate multiple solution approaches. For each: +- Technical design overview +- Trade-offs (pros/cons) +- Complexity assessment +- Risk assessment + +### 4. Open Questions + +List uncertainties categorized as: +- **Critical** - blocks progress +- **Important** - affects design +- **Nice-to-know** - optimization + +### 5. Success Criteria + +Define measurable acceptance criteria: +- Functional requirements (MUST, SHOULD, COULD) +- Non-functional requirements (performance, security) +- Test scenarios + +### 6. Finalize + +After completing the spec draft, signal completion. Porch will run 3-way consultation (Gemini, Codex, Claude) automatically via the verify step. If reviewers request changes, you'll be respawned with their feedback. + +## Output + +Create or update the specification file at `codev/specs/{{artifact_name}}.md`. 
+ +**IMPORTANT**: Keep spec/plan/review filenames in sync: +- Spec: `codev/specs/{{artifact_name}}.md` +- Plan: `codev/plans/{{artifact_name}}.md` +- Review: `codev/reviews/{{artifact_name}}.md` + +## Signals + +Emit appropriate signals based on your progress: + +- When waiting for clarifying question answers, **include your questions in the signal**: + ``` + + Please answer these questions: + 1. What should the primary use case be - internal tooling or customer-facing? + 2. What are the key constraints we should consider? + 3. Who are the main stakeholders? + + ``` + + The content inside the signal tag is displayed prominently to the user. + +- After completing the initial spec draft: + ``` + SPEC_DRAFTED + ``` + + +## Commit Cadence + +Make commits at these milestones: +1. `[Spec {{project_id}}] Initial specification draft` +2. `[Spec {{project_id}}] Specification with multi-agent review` +3. `[Spec {{project_id}}] Specification with user feedback` +4. `[Spec {{project_id}}] Final approved specification` + +**CRITICAL**: Never use `git add .` or `git add -A`. Always stage specific files: +```bash +git add codev/specs/{{artifact_name}}.md +``` + +## Important Notes + +1. **Be thorough** - A good spec prevents implementation problems +3. **Be specific** - Vague specs lead to wrong implementations +4. **Include examples** - Concrete examples clarify intent + +## What NOT to Do + +- Don't run `consult` commands yourself (porch handles consultations) +- Don't include implementation details (that's for the Plan phase) +- Don't estimate time (AI makes time estimates meaningless) +- Don't start coding (you're in Specify, not Implement) +- Don't use `git add .` or `git add -A` (security risk)
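Reviewer note: every review baseline in this diff requires the same verdict block (`VERDICT:`, `SUMMARY:`, `CONFIDENCE:`, then a `KEY_ISSUES:` bullet list). The sketch below shows one way a consumer could read that block. It is only an illustration under stated assumptions: `parseReview`, `ParsedReview`, and `field` are hypothetical names that do not appear in this diff, and nothing here claims to be how porch or `consult` actually parse reviewer output.

```typescript
// Hypothetical helper, not part of the codebase shown in this diff.
// It reads the verdict block format that the review prompts ask for.
type Verdict = "APPROVE" | "REQUEST_CHANGES" | "COMMENT";
type Confidence = "HIGH" | "MEDIUM" | "LOW";

interface ParsedReview {
  verdict: Verdict;
  summary: string;
  confidence: Confidence;
  keyIssues: string[];
}

const VERDICTS: string[] = ["APPROVE", "REQUEST_CHANGES", "COMMENT"];
const CONFIDENCES: string[] = ["HIGH", "MEDIUM", "LOW"];

// Returns the value of the first line shaped like "KEY: value", or null.
function field(lines: string[], key: string): string | null {
  const prefix = key + ":";
  for (const line of lines) {
    if (line.indexOf(prefix) === 0) {
      return line.slice(prefix.length).trim();
    }
  }
  return null;
}

// Returns null when a required field is missing or holds an unknown value.
function parseReview(text: string): ParsedReview | null {
  const lines = text.split(/\r?\n/).map((l) => l.trim());
  const verdict = field(lines, "VERDICT");
  const summary = field(lines, "SUMMARY");
  const confidence = field(lines, "CONFIDENCE");
  if (verdict === null || summary === null || confidence === null) {
    return null;
  }
  if (VERDICTS.indexOf(verdict) === -1 || CONFIDENCES.indexOf(confidence) === -1) {
    return null;
  }

  // Collect consecutive "- item" bullets after KEY_ISSUES:.
  // A bullet that is exactly "None" is treated as "no issues" (an assumption).
  const keyIssues: string[] = [];
  const start = lines.indexOf("KEY_ISSUES:");
  if (start !== -1) {
    for (const line of lines.slice(start + 1)) {
      if (line.indexOf("- ") !== 0) {
        break;
      }
      const issue = line.slice(2).trim();
      if (issue !== "None") {
        keyIssues.push(issue);
      }
    }
  }

  return {
    verdict: verdict as Verdict,
    summary: summary,
    confidence: confidence as Confidence,
    keyIssues: keyIssues,
  };
}
```

Returning `null` for an unrecognized verdict, rather than defaulting to `COMMENT`, keeps a malformed review from being mistaken for a non-blocking one. A real consumer might prefer to raise an error instead.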