Merged
10 changes: 10 additions & 0 deletions .agents/skills/grill-me/SKILL.md
@@ -0,0 +1,10 @@
---
name: grill-me
description: Interview the user relentlessly about a plan or design until reaching shared understanding, resolving each branch of the decision tree. Use when user wants to stress-test a plan, get grilled on their design, or mentions "grill me".
---

Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree, resolving dependencies between decisions one-by-one. For each question, provide your recommended answer.

Ask the questions one at a time.

If a question can be answered by exploring the codebase, explore the codebase instead.
47 changes: 47 additions & 0 deletions .agents/skills/grill-with-docs/ADR-FORMAT.md
@@ -0,0 +1,47 @@
# ADR Format

ADRs live in `docs/adr/` and use sequential numbering: `0001-slug.md`, `0002-slug.md`, etc.

Create the `docs/adr/` directory lazily — only when the first ADR is needed.

## Template

```md
# {Short title of the decision}

{1-3 sentences: what's the context, what did we decide, and why.}
```

That's it. An ADR can be a single paragraph. The value is in recording *that* a decision was made and *why* — not in filling out sections.

## Optional sections

Only include these when they add genuine value. Most ADRs won't need them.

- **Status** frontmatter (`proposed | accepted | deprecated | superseded by ADR-NNNN`) — useful when decisions are revisited
- **Considered Options** — only when the rejected alternatives are worth remembering
- **Consequences** — only when non-obvious downstream effects need to be called out

## Numbering

Scan `docs/adr/` for the highest existing number and increment by one.
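
A minimal sketch of that scan in Python, assuming filenames follow the `NNNN-slug.md` pattern described above. The function name `next_adr_path` and the way the slug is passed in are illustrative assumptions, not part of the skill:

```python
import re
from pathlib import Path

# Matches names such as "0001-event-sourced-orders.md".
ADR_NAME = re.compile(r"^(\d{4})-.+\.md$")


def next_adr_path(adr_dir: Path, slug: str) -> Path:
    """Return the path for the next ADR, creating the directory lazily."""
    adr_dir.mkdir(parents=True, exist_ok=True)
    numbers = [
        int(m.group(1))
        for p in adr_dir.iterdir()
        if (m := ADR_NAME.match(p.name))
    ]
    # An empty directory yields 0001; otherwise highest existing number + 1.
    next_number = max(numbers, default=0) + 1
    return adr_dir / f"{next_number:04d}-{slug}.md"
```

Files that do not match the pattern (a README, for example) are ignored, so they cannot disturb the sequence.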

## When to offer an ADR

All three of these must be true:

1. **Hard to reverse** — the cost of changing your mind later is meaningful
2. **Surprising without context** — a future reader will look at the code and wonder "why on earth did they do it this way?"
3. **The result of a real trade-off** — there were genuine alternatives and you picked one for specific reasons

If a decision is easy to reverse, skip it — you'll just reverse it. If it's not surprising, nobody will wonder why. If there was no real alternative, there's nothing to record beyond "we did the obvious thing."

### What qualifies

- **Architectural shape.** "We're using a monorepo." "The write model is event-sourced, the read model is projected into Postgres."
- **Integration patterns between contexts.** "Ordering and Billing communicate via domain events, not synchronous HTTP."
- **Technology choices that carry lock-in.** Database, message bus, auth provider, deployment target. Not every library — just the ones that would take a quarter to swap out.
- **Boundary and scope decisions.** "Customer data is owned by the Customer context; other contexts reference it by ID only." The explicit no-s are as valuable as the yes-s.
- **Deliberate deviations from the obvious path.** "We're using manual SQL instead of an ORM because X." Anything where a reasonable reader would assume the opposite. These stop the next engineer from "fixing" something that was deliberate.
- **Constraints not visible in the code.** "We can't use AWS because of compliance requirements." "Response times must be under 200ms because of the partner API contract."
- **Rejected alternatives when the rejection is non-obvious.** If you considered GraphQL and picked REST for subtle reasons, record it — otherwise someone will suggest GraphQL again in six months.
77 changes: 77 additions & 0 deletions .agents/skills/grill-with-docs/CONTEXT-FORMAT.md
@@ -0,0 +1,77 @@
# CONTEXT.md Format

## Structure

```md
# {Context Name}

{One or two sentence description of what this context is and why it exists.}

## Language

**Order**:
{A concise description of the term}
_Avoid_: Purchase, transaction

**Invoice**:
A request for payment sent to a customer after delivery.
_Avoid_: Bill, payment request

**Customer**:
A person or organization that places orders.
_Avoid_: Client, buyer, account

## Relationships

- An **Order** produces one or more **Invoices**
- An **Invoice** belongs to exactly one **Customer**

## Example dialogue

> **Dev:** "When a **Customer** places an **Order**, do we create the **Invoice** immediately?"
> **Domain expert:** "No — an **Invoice** is only generated once a **Fulfillment** is confirmed."

## Flagged ambiguities

- "account" was used to mean both **Customer** and **User** — resolved: these are distinct concepts.
```

## Rules

- **Be opinionated.** When multiple words exist for the same concept, pick the best one and list the others as aliases to avoid.
- **Flag conflicts explicitly.** If a term is used ambiguously, call it out in "Flagged ambiguities" with a clear resolution.
- **Keep definitions tight.** One sentence max. Define what it IS, not what it does.
- **Show relationships.** Use bold term names and express cardinality where obvious.
- **Only include terms specific to this project's context.** General programming concepts (timeouts, error types, utility patterns) don't belong even if the project uses them extensively. Before adding a term, ask: is this a concept unique to this context, or a general programming concept? Only the former belongs.
- **Group terms under subheadings** when natural clusters emerge. If all terms belong to a single cohesive area, a flat list is fine.
- **Write an example dialogue.** A conversation between a dev and a domain expert that demonstrates how the terms interact naturally and clarifies boundaries between related concepts.
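
To illustrate how the `_Avoid_` aliases could be put to work, here is a rough Python sketch that reads Language entries written in the structure above and reports avoided aliases found in a sentence. The parsing rules and the names `parse_glossary` and `find_avoided_aliases` are assumptions made for illustration; the skill itself does not prescribe any tooling.

```python
import re

TERM_LINE = re.compile(r"^\*\*(.+?)\*\*:\s*$")
AVOID_LINE = re.compile(r"^_Avoid_:\s*(.+)$")


def parse_glossary(markdown: str) -> dict[str, list[str]]:
    """Map each canonical term to its list of aliases to avoid."""
    glossary: dict[str, list[str]] = {}
    current = None
    for raw in markdown.splitlines():
        line = raw.strip()
        if m := TERM_LINE.match(line):
            current = m.group(1)
            glossary[current] = []
        elif (m := AVOID_LINE.match(line)) and current is not None:
            glossary[current] = [a.strip() for a in m.group(1).split(",") if a.strip()]
        elif line.startswith("#"):
            current = None  # a new heading ends the current entry
    return glossary


def find_avoided_aliases(text: str, glossary: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (alias, canonical term) pairs for aliases that appear in text."""
    hits = []
    for term, aliases in glossary.items():
        for alias in aliases:
            # Whole-word, case-insensitive match.
            if re.search(rf"\b{re.escape(alias)}\b", text, re.IGNORECASE):
                hits.append((alias, term))
    return hits
```

A hit such as `("Client", "Customer")` is a prompt to ask whether the speaker means the canonical term, not proof of an error.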

## Single vs multi-context repos

**Single context (most repos):** One `CONTEXT.md` at the repo root.

**Multiple contexts:** A `CONTEXT-MAP.md` at the repo root lists the contexts, where they live, and how they relate to each other:

```md
# Context Map

## Contexts

- [Ordering](./src/ordering/CONTEXT.md) — receives and tracks customer orders
- [Billing](./src/billing/CONTEXT.md) — generates invoices and processes payments
- [Fulfillment](./src/fulfillment/CONTEXT.md) — manages warehouse picking and shipping

## Relationships

- **Ordering → Fulfillment**: Ordering emits `OrderPlaced` events; Fulfillment consumes them to start picking
- **Fulfillment → Billing**: Fulfillment emits `ShipmentDispatched` events; Billing consumes them to generate invoices
- **Ordering ↔ Billing**: Shared types for `CustomerId` and `Money`
```

The skill infers which structure applies:

- If `CONTEXT-MAP.md` exists, read it to find contexts
- If only a root `CONTEXT.md` exists, single context
- If neither exists, create a root `CONTEXT.md` lazily when the first term is resolved

When multiple contexts exist, infer which one the current topic relates to. If unclear, ask.
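
The inference rules above can be sketched in Python as follows. This is an illustrative sketch only: `detect_context_layout` and `list_contexts` are hypothetical names, and the link pattern assumes the context map uses the bullet format shown in the example.

```python
import re
from pathlib import Path

# Matches bullets such as "- [Ordering](./src/ordering/CONTEXT.md) ...".
CONTEXT_LINK = re.compile(r"^- \[(?P<name>[^\]]+)\]\((?P<path>[^)]+)\)")


def detect_context_layout(repo_root: Path) -> str:
    """Classify the repo as 'multi', 'single', or 'none'."""
    if (repo_root / "CONTEXT-MAP.md").is_file():
        return "multi"
    if (repo_root / "CONTEXT.md").is_file():
        return "single"
    # Nothing yet: CONTEXT.md is created lazily on the first resolved term.
    return "none"


def list_contexts(context_map: str) -> dict[str, str]:
    """Extract context name -> CONTEXT.md path from a context map's text."""
    return {
        m["name"]: m["path"]
        for line in context_map.splitlines()
        if (m := CONTEXT_LINK.match(line.strip()))
    }
```

The map is checked first, so a repo containing both files is treated as multi-context.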
88 changes: 88 additions & 0 deletions .agents/skills/grill-with-docs/SKILL.md
@@ -0,0 +1,88 @@
---
name: grill-with-docs
description: Grilling session that challenges your plan against the existing domain model, sharpens terminology, and updates documentation (CONTEXT.md, ADRs) inline as decisions crystallise. Use when user wants to stress-test a plan against their project's language and documented decisions.
---

<what-to-do>

Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree, resolving dependencies between decisions one-by-one. For each question, provide your recommended answer.

Ask the questions one at a time, waiting for feedback on each question before continuing.

If a question can be answered by exploring the codebase, explore the codebase instead.

</what-to-do>

<supporting-info>

## Domain awareness

During codebase exploration, also look for existing documentation:

### File structure

Most repos have a single context:

```
/
├── CONTEXT.md
├── docs/
│   └── adr/
│       ├── 0001-event-sourced-orders.md
│       └── 0002-postgres-for-write-model.md
└── src/
```

If a `CONTEXT-MAP.md` exists at the root, the repo has multiple contexts. The map points to where each one lives:

```
/
├── CONTEXT-MAP.md
├── docs/
│   └── adr/                          ← system-wide decisions
├── src/
│   ├── ordering/
│   │   ├── CONTEXT.md
│   │   └── docs/adr/                 ← context-specific decisions
│   └── billing/
│       ├── CONTEXT.md
│       └── docs/adr/
```

Create files lazily — only when you have something to write. If no `CONTEXT.md` exists, create one when the first term is resolved. If no `docs/adr/` exists, create it when the first ADR is needed.

## During the session

### Challenge against the glossary

When the user uses a term that conflicts with the existing language in `CONTEXT.md`, call it out immediately. "Your glossary defines 'cancellation' as X, but you seem to mean Y — which is it?"

### Sharpen fuzzy language

When the user uses vague or overloaded terms, propose a precise canonical term. "You're saying 'account' — do you mean the Customer or the User? Those are different things."

### Discuss concrete scenarios

When domain relationships are being discussed, stress-test them with specific scenarios. Invent scenarios that probe edge cases and force the user to be precise about the boundaries between concepts.

### Cross-reference with code

When the user states how something works, check whether the code agrees. If you find a contradiction, surface it: "Your code cancels entire Orders, but you just said partial cancellation is possible — which is right?"

### Update CONTEXT.md inline

When a term is resolved, update `CONTEXT.md` right there. Don't batch these up — capture them as they happen. Use the format in [CONTEXT-FORMAT.md](./CONTEXT-FORMAT.md).

Don't couple `CONTEXT.md` to implementation details. Only include terms that are meaningful to domain experts.

### Offer ADRs sparingly

Only offer to create an ADR when all three are true:

1. **Hard to reverse** — the cost of changing your mind later is meaningful
2. **Surprising without context** — a future reader will wonder "why did they do it this way?"
3. **The result of a real trade-off** — there were genuine alternatives and you picked one for specific reasons

If any of the three is missing, skip the ADR. Use the format in [ADR-FORMAT.md](./ADR-FORMAT.md).

</supporting-info>
100 changes: 100 additions & 0 deletions .codex/ledger/2026-05-06-MEMORY-agent-category-path.md
@@ -0,0 +1,100 @@
Goal (incl. success criteria):

- Implement `.compozy/tasks/agent-categories/_techspec.md` end to end for AGENT.md agent categories.
- Success requires: TechSpec reformatted to the official template with stronger tests, backend/contract/docs implemented, web/UI implemented through Compozy + Claude Opus, peer review loop completed, QA report generated by Opus, QA execution completed locally, PR opened as `feat: agent categories`, and CodeRabbit review watch started.

Constraints/Assumptions:

- Artifacts/docs/code in English; conversation in Brazilian Portuguese (pt-BR).
- User explicitly requested Compozy/Claude Opus before implementation to improve test scenarios and reformat `_techspec.md`.
- User explicitly requested Compozy/Claude Opus for web/UI implementation.
- `category_path` is display-only metadata; no runtime behavior changes unless TechSpec review changes this.
- Use `packages/ui/src/components/reui/tree.tsx` and `packages/ui/src/components/command.tsx` for web UI.
- Pre-existing dirty worktree includes site landing files, packages/ui/web package files, and untracked UI components; do not revert user changes.
- Must use `rtk` for shell commands. No destructive git commands.

Key decisions:

- Canonical field from accepted direction: `category_path: ["Marketing", "Sales"]`.
- No aliases, slash-string format, multi-tagging, backend tree endpoint, DB migration, or `config.toml` key unless TechSpec hardening surfaces a root-cause need.
- Agents with missing/empty `category_path` remain root-level UI items.
- Add `@headless-tree/react` via package manager if needed, matching existing `@headless-tree/core`.
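
As a rough illustration of these decisions, the Python sketch below groups agents into a folder tree by `category_path` and leaves agents with a missing or empty path at the root. The `Agent` and `Folder` shapes and `build_category_tree` are hypothetical, written only for this note; the real implementation lives in the Go backend and the web UI and may differ. Segments are compared as exact strings here, which is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    category_path: list[str] = field(default_factory=list)


@dataclass
class Folder:
    agents: list[str] = field(default_factory=list)
    children: dict[str, "Folder"] = field(default_factory=dict)


def build_category_tree(agents: list[Agent]) -> Folder:
    """Place each agent under the folder named by its category_path."""
    root = Folder()
    for agent in agents:
        node = root
        # A missing or empty path leaves the agent at the root level.
        for segment in agent.category_path or []:
            node = node.children.setdefault(segment, Folder())
        node.agents.append(agent.name)
    return root
```

Because the path is display-only metadata, the tree is derived on the client side from the flat agent list in this sketch, in line with the decision not to add a backend tree endpoint.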

State:

- CODEX_LOOP active; implementation is following the hardened `.compozy/tasks/agent-categories/_techspec.md`.
- Opus implementation peer-review loop is complete. Local QA execution is in progress against an isolated lab.

Done:

- Goal registered with `functions.create_goal`.
- Read RTK, current ledger, and explicit skills: `compozy`, `no-workarounds`, `cy-impl-peer-review`, `qa-report`, `qa-execution`, `codex-loop`.
- Asked Claude Code Opus via `compozy exec` to reformat/harden `.compozy/tasks/agent-categories/_techspec.md`.
- Opus rewrote the TechSpec in the full template and expanded Go/Web/CLI/API/bundle/native/QA test scenarios.
- Implemented backend/config/API/CLI/docs-source propagation for `AgentDef.CategoryPath`.
- Ran `make codegen`, `make codegen-check`, and targeted Go tests for config/workspace/api/core/cli/e2e helpers.
- Ran `make cli-docs`; reverted generator-only CLI reference formatting noise because no CLI help source changed.
- Delegated web/UI implementation to Claude Code Opus via Compozy.
- Removed out-of-scope Opus edits to root/web/site instruction files and the untracked `impeccable` skill directory.
- Reviewed Opus UI implementation and fixed the loading-to-loaded tree expansion lifecycle plus duplicated folder chevrons.
- Web checks passed: focused agent UI tests, `make web-lint`, `make web-typecheck`, and `make web-test`.
- Fixed `make verify` Go lint failures (`funlen` in `internal/config/agent_edit.go`, unused `cloneMCPServer` in `internal/extension/manager.go`).
- Full `make verify` passed after the lint fixes.
- Opus peer-review round 1 artifact directory moved to `.tmp/agent-categories-peer-reviews/20260506T180501Z-agent-categories/` because `.peer-reviews/*.json` JSONL artifacts are picked up by `oxfmt`.
- Locally fixed Go nits from peer review: direct category validation, normalized edit round-trip, `agentCategoryLabel` coverage/comment.
- Opus UI remediation addressed case-sensitive category folders, tree aria typing/leaf expansion, sidebar/selector stories, sidebar integration test, route-change expansion test, and Playwright coverage.
- Post-remediation checks passed: `make web-lint`, `make web-typecheck`, `make web-test` (243 files / 1809 tests), `go test ./internal/config ./internal/cli -count=1`, and full `make verify`.
- Opus peer-review round 2 returned `SHIP` with no blockers. Locally fixed the actionable risks/nits: settings trigger test ID, Tree package story/test, exact `@headless-tree` pins, Go clone/normalize comments, category folder invariant, network dialog full-selection callback, route mock fidelity, and workspace human output assertion.
- Verification after round 2 fixes passed: targeted Vitest (4 files / 34 tests), `make bun-typecheck`, `make bun-test` (372 files / 2397 tests), `go test ./internal/config ./internal/cli -count=1`, and full `make verify`.
- Opus peer-review round 3 returned `SHIP` with 0 blockers / 0 risks / 3 nits. Fixed all 3 nits: TOON shape comment, removed extension clone wrapper, and made Tree optional-feature test cover no selection feature.
- Opus peer-review round 4 returned `SHIP` with 0 blockers / 0 risks / 0 nits. Full `make verify` passed before the round.
- Opus QA report artifacts were generated under `.compozy/tasks/agent-categories/qa/`.
- Isolated QA lab was created under `.tmp/qa-labs/agh-agent-categories-20260506-193527-733386-lab/` with `AGH_HOME=/var/folders/7x/xg204hnd04b81fczcxvjlhzr0000gn/T/aghqa-6d7ced711656/runtime` and daemon port `56440`.
- QA execution found and fixed a real web route bug: agent detail fetches now include the active workspace, matching the sidebar's workspace-scoped agent list.
- QA evidence collected so far covers CLI/API/UDS/native agent listing/detail, invalid segment diagnostics, daemon restart persistence, provider-backed session prompt, and browser sidebar/selector behavior.
- Focused web checks after the route fix passed: `cd web && bun run test:raw src/systems/agent/adapters/agent-api.test.ts src/hooks/routes/use-agent-detail-page.test.tsx`, `make web-lint`, `make web-typecheck`, and `cd web && bun run test:e2e:daemon-served:raw agent-categories.spec.ts`.
- `make test-e2e-runtime` initially failed to compile `internal/daemon/daemon_network_collaboration_integration_test.go` because a `networkAuditExpectation` literal still passed string values to `*string` fields. The test now uses `auditFieldValue` for `Surface` and `ThreadID`; the runtime e2e gate is rerunning.
- The rerun surfaced a deterministic `TestDaemonE2EACPmockPermissionDisconnectProjectsRuntimeFailure` timeout. Root cause was the acpmock fault fixture sequencing a synchronous permission request before `driver_control.disconnect`, so the disconnect step was unreachable while permission was pending. Fixed the fixture by scheduling an async delayed disconnect before the permission step.
- Runtime e2e verification now passes: `make test-e2e-runtime` (daemon, httpapi, udsapi, testutil/e2e lanes).
- Browser-side e2e verification now passes: `make test-e2e-web` (21 Playwright tests). Updated stale e2e assertions to match command-picker trigger semantics and the current network channel header.
- Final pre-commit `make verify` passed and was captured at `.tmp/qa-labs/agh-agent-categories-20260506-193527-733386-lab/qa-artifacts/qa/evidence/final-make-verify.log`.
- QA execution artifacts were normalized for the strict auditor, auditor passed with exit code 0, and final artifacts were copied into `.compozy/tasks/agent-categories/qa/`.
- The isolated QA daemon was stopped. The provider-backed session stop command raced after daemon stop and failed to connect to the socket, but daemon shutdown reported `Active Sessions: 0` and the foreground daemon session exited cleanly.
- Commit `25e6fd61` (`feat: agent categories`) was created and pushed to `origin/agent-categories`.
- Post-commit `make verify` passed.
- PR #113 was opened: `https://github.com/compozy/agh/pull/113`.
- Requested CodeRabbit watch command was run for PR #113. It started background run `reviews-agent-categories-266f3f-round-000-20260506-202459-000000000`; GitHub shows CodeRabbit status pending and "review in progress".

Now:

- Final response and goal closure.

Next:

- None.

Open questions (UNCONFIRMED if needed):

- None.

Working set (files/ids/commands):

- TechSpec: `.compozy/tasks/agent-categories/_techspec.md`
- Template: `.agents/skills/cy-create-techspec/references/techspec-template.md`
- Prior plan: `.codex/plans/2026-05-06-agent-category-path.md`
- Ledger: `.codex/ledger/2026-05-06-MEMORY-agent-category-path.md`
- Opus hardening prompt: `.compozy/tasks/agent-categories/opus-techspec-hardening-prompt.md`
- Opus UI prompt: `.compozy/tasks/agent-categories/opus-ui-implementation-prompt.md`
- Targeted Go verification: `go test ./internal/config ./internal/workspace ./internal/api/core ./internal/cli ./internal/testutil/e2e -count=1`
- Web verification: `make web-lint`; `make web-typecheck`; `make web-test`
- Full verification: `make verify`
- Peer review round 1: `.tmp/agent-categories-peer-reviews/20260506T180501Z-agent-categories/impl-review-result-round1.json`
- Peer review final: `.tmp/agent-categories-peer-reviews/20260506T180501Z-agent-categories/impl-review-final-round4.pretty.json`
- QA manifest: `.tmp/qa-labs/agh-agent-categories-20260506-193527-733386-lab/qa-artifacts/qa/bootstrap-manifest.json`
- QA daemon session id: `42578`
- Provider-backed QA session id: `sess-78095017870b2ac0`
- QA audit: `.tmp/qa-labs/agh-agent-categories-20260506-193527-733386-lab/qa-artifacts/qa/qa-audit-report.json`
- Committed QA artifact root: `.compozy/tasks/agent-categories/qa/`
- Commit: `25e6fd61`
- PR: `#113`
- CodeRabbit watch run: `reviews-agent-categories-266f3f-round-000-20260506-202459-000000000`