compozy · pedronauck · May 6, 2026
diff --git a/.agents/skills/grill-me/SKILL.md b/.agents/skills/grill-me/SKILL.md
@@ -0,0 +1,10 @@
+---
+name: grill-me
+description: Interview the user relentlessly about a plan or design until reaching shared understanding, resolving each branch of the decision tree. Use when user wants to stress-test a plan, get grilled on their design, or mentions "grill me".
+---
+
+Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree, resolving dependencies between decisions one-by-one. For each question, provide your recommended answer.
+
+Ask the questions one at a time.
+
+If a question can be answered by exploring the codebase, explore the codebase instead.
diff --git a/.agents/skills/grill-with-docs/ADR-FORMAT.md b/.agents/skills/grill-with-docs/ADR-FORMAT.md
@@ -0,0 +1,47 @@
+# ADR Format
+
+ADRs live in `docs/adr/` and use sequential numbering: `0001-slug.md`, `0002-slug.md`, etc.
+
+Create the `docs/adr/` directory lazily — only when the first ADR is needed.
+
+## Template
+
+```md
+# {Short title of the decision}
+
+{1-3 sentences: what's the context, what did we decide, and why.}
+```
+
+That's it. An ADR can be a single paragraph. The value is in recording *that* a decision was made and *why* — not in filling out sections.
+
+## Optional sections
+
+Only include these when they add genuine value. Most ADRs won't need them.
+
+- **Status** frontmatter (`proposed | accepted | deprecated | superseded by ADR-NNNN`) — useful when decisions are revisited
+- **Considered Options** — only when the rejected alternatives are worth remembering
+- **Consequences** — only when non-obvious downstream effects need to be called out
+
+## Numbering
+
+Scan `docs/adr/` for the highest existing number and increment by one.
+
+## When to offer an ADR
+
+All three of these must be true:
+
+1. **Hard to reverse** — the cost of changing your mind later is meaningful
+2. **Surprising without context** — a future reader will look at the code and wonder "why on earth did they do it this way?"
+3. **The result of a real trade-off** — there were genuine alternatives and you picked one for specific reasons
+
+If a decision is easy to reverse, skip it — you'll just reverse it. If it's not surprising, nobody will wonder why. If there was no real alternative, there's nothing to record beyond "we did the obvious thing."
+
+### What qualifies
+
+- **Architectural shape.** "We're using a monorepo." "The write model is event-sourced, the read model is projected into Postgres."
+- **Integration patterns between contexts.** "Ordering and Billing communicate via domain events, not synchronous HTTP."
+- **Technology choices that carry lock-in.** Database, message bus, auth provider, deployment target. Not every library — just the ones that would take a quarter to swap out.
+- **Boundary and scope decisions.** "Customer data is owned by the Customer context; other contexts reference it by ID only." The explicit no-s are as valuable as the yes-s.
+- **Deliberate deviations from the obvious path.** "We're using manual SQL instead of an ORM because X." Anything where a reasonable reader would assume the opposite. These stop the next engineer from "fixing" something that was deliberate.
+- **Constraints not visible in the code.** "We can't use AWS because of compliance requirements." "Response times must be under 200ms because of the partner API contract."
+- **Rejected alternatives when the rejection is non-obvious.** If you considered GraphQL and picked REST for subtle reasons, record it — otherwise someone will suggest GraphQL again in six months.
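
Review note: the numbering rule in `ADR-FORMAT.md` ("scan `docs/adr/` for the highest existing number and increment by one") can be sketched as below. This is a minimal illustration, not part of the patch. The function name `next_adr_name` and the choice to pass file names in as a list are assumptions made for the example; an empty list stands in for a `docs/adr/` directory that has not been created yet.

```python
import re

# Matches names like "0001-event-sourced-orders.md" (four digits, dash, slug).
ADR_NAME = re.compile(r"^(\d{4})-.+\.md$")


def next_adr_name(existing_names, slug):
    """Return the file name for the next ADR (hypothetical helper).

    existing_names: file names found in docs/adr/ (empty if the
    directory does not exist yet, since it is created lazily).
    """
    highest = 0
    for name in existing_names:
        match = ADR_NAME.match(name)
        if match:  # ignore files that do not follow the NNNN-slug.md pattern
            highest = max(highest, int(match.group(1)))
    return f"{highest + 1:04d}-{slug}.md"
```

Taking the maximum rather than counting files means gaps in the sequence or unrelated files in the directory do not cause a number to be reused.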
diff --git a/.agents/skills/grill-with-docs/CONTEXT-FORMAT.md b/.agents/skills/grill-with-docs/CONTEXT-FORMAT.md
@@ -0,0 +1,77 @@
+# CONTEXT.md Format
+
+## Structure
+
+```md
+# {Context Name}
+
+{One or two sentence description of what this context is and why it exists.}
+
+## Language
+
+**Order**:
+{A concise description of the term}
+_Avoid_: Purchase, transaction
+
+**Invoice**:
+A request for payment sent to a customer after delivery.
+_Avoid_: Bill, payment request
+
+**Customer**:
+A person or organization that places orders.
+_Avoid_: Client, buyer, account
+
+## Relationships
+
+- An **Order** produces one or more **Invoices**
+- An **Invoice** belongs to exactly one **Customer**
+
+## Example dialogue
+
+> **Dev:** "When a **Customer** places an **Order**, do we create the **Invoice** immediately?"
+> **Domain expert:** "No — an **Invoice** is only generated once a **Fulfillment** is confirmed."
+
+## Flagged ambiguities
+
+- "account" was used to mean both **Customer** and **User** — resolved: these are distinct concepts.
+```
+
+## Rules
+
+- **Be opinionated.** When multiple words exist for the same concept, pick the best one and list the others as aliases to avoid.
+- **Flag conflicts explicitly.** If a term is used ambiguously, call it out in "Flagged ambiguities" with a clear resolution.
+- **Keep definitions tight.** One sentence max. Define what it IS, not what it does.
+- **Show relationships.** Use bold term names and express cardinality where obvious.
+- **Only include terms specific to this project's context.** General programming concepts (timeouts, error types, utility patterns) don't belong even if the project uses them extensively. Before adding a term, ask: is this a concept unique to this context, or a general programming concept? Only the former belongs.
+- **Group terms under subheadings** when natural clusters emerge. If all terms belong to a single cohesive area, a flat list is fine.
+- **Write an example dialogue.** A conversation between a dev and a domain expert that demonstrates how the terms interact naturally and clarifies boundaries between related concepts.
+
+## Single vs multi-context repos
+
+**Single context (most repos):** One `CONTEXT.md` at the repo root.
+
+**Multiple contexts:** A `CONTEXT-MAP.md` at the repo root lists the contexts, where they live, and how they relate to each other:
+
+```md
+# Context Map
+
+## Contexts
+
+- [Ordering](./src/ordering/CONTEXT.md) — receives and tracks customer orders
+- [Billing](./src/billing/CONTEXT.md) — generates invoices and processes payments
+- [Fulfillment](./src/fulfillment/CONTEXT.md) — manages warehouse picking and shipping
+
+## Relationships
+
+- **Ordering → Fulfillment**: Ordering emits `OrderPlaced` events; Fulfillment consumes them to start picking
+- **Fulfillment → Billing**: Fulfillment emits `ShipmentDispatched` events; Billing consumes them to generate invoices
+- **Ordering ↔ Billing**: Shared types for `CustomerId` and `Money`
+```
+
+The skill infers which structure applies:
+
+- If `CONTEXT-MAP.md` exists, read it to find contexts
+- If only a root `CONTEXT.md` exists, treat the repo as a single context
+- If neither exists, create a root `CONTEXT.md` lazily when the first term is resolved
+
+When multiple contexts exist, infer which one the current topic relates to. If unclear, ask.
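
Review note: the three-way inference at the end of `CONTEXT-FORMAT.md` and the link list in the `CONTEXT-MAP.md` example could be handled roughly as follows. This is a sketch under assumptions: `ContextLayout`, `infer_layout`, and `parse_context_map` are hypothetical names, and the parser assumes each context entry is a bullet of the form `- [Name](path)` as in the example map.

```python
import re
from enum import Enum


class ContextLayout(Enum):
    MULTI = "multi"    # CONTEXT-MAP.md present at the repo root
    SINGLE = "single"  # only a root CONTEXT.md present
    NONE = "none"      # neither file exists yet; create CONTEXT.md lazily


def infer_layout(root_files):
    """Decide which structure applies from the file names at the repo root."""
    names = set(root_files)
    if "CONTEXT-MAP.md" in names:
        return ContextLayout.MULTI
    if "CONTEXT.md" in names:
        return ContextLayout.SINGLE
    return ContextLayout.NONE


# A bullet that starts with a Markdown link: "- [Ordering](./src/ordering/CONTEXT.md) ..."
MAP_ENTRY = re.compile(r"^- \[([^\]]+)\]\(([^)]+)\)")


def parse_context_map(text):
    """Map each context name to the path of its CONTEXT.md."""
    contexts = {}
    for line in text.splitlines():
        match = MAP_ENTRY.match(line.strip())
        if match:
            contexts[match.group(1)] = match.group(2)
    return contexts
```

Relationship bullets such as `- **Ordering → Fulfillment**: ...` begin with bold text rather than a link, so the parser skips them.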
diff --git a/.agents/skills/grill-with-docs/SKILL.md b/.agents/skills/grill-with-docs/SKILL.md
@@ -0,0 +1,88 @@
+---
+name: grill-with-docs
+description: Grilling session that challenges your plan against the existing domain model, sharpens terminology, and updates documentation (CONTEXT.md, ADRs) inline as decisions crystallise. Use when user wants to stress-test a plan against their project's language and documented decisions.
+---
+
+<what-to-do>
+
+Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree, resolving dependencies between decisions one-by-one. For each question, provide your recommended answer.
+
+Ask the questions one at a time, waiting for feedback on each question before continuing.
+
+If a question can be answered by exploring the codebase, explore the codebase instead.
+
+</what-to-do>
+
+<supporting-info>
+
+## Domain awareness
+
+During codebase exploration, also look for existing documentation:
+
+### File structure
+
+Most repos have a single context:
+
+```
+/
+├── CONTEXT.md
+├── docs/
+│   └── adr/
+│       ├── 0001-event-sourced-orders.md
+│       └── 0002-postgres-for-write-model.md
+└── src/
+```
+
+If a `CONTEXT-MAP.md` exists at the root, the repo has multiple contexts. The map points to where each one lives:
+
+```
+/
+├── CONTEXT-MAP.md
+├── docs/
+│   └── adr/                          ← system-wide decisions
+├── src/
+│   ├── ordering/
+│   │   ├── CONTEXT.md
+│   │   └── docs/adr/                 ← context-specific decisions
+│   └── billing/
+│       ├── CONTEXT.md
+│       └── docs/adr/
+```
+
+Create files lazily — only when you have something to write. If no `CONTEXT.md` exists, create one when the first term is resolved. If no `docs/adr/` exists, create it when the first ADR is needed.
+
+## During the session
+
+### Challenge against the glossary
+
+When the user uses a term that conflicts with the existing language in `CONTEXT.md`, call it out immediately. "Your glossary defines 'cancellation' as X, but you seem to mean Y — which is it?"
+
+### Sharpen fuzzy language
+
+When the user uses vague or overloaded terms, propose a precise canonical term. "You're saying 'account' — do you mean the Customer or the User? Those are different things."
+
+### Discuss concrete scenarios
+
+When domain relationships are being discussed, stress-test them with specific scenarios. Invent scenarios that probe edge cases and force the user to be precise about the boundaries between concepts.
+
+### Cross-reference with code
+
+When the user states how something works, check whether the code agrees. If you find a contradiction, surface it: "Your code cancels entire Orders, but you just said partial cancellation is possible — which is right?"
+
+### Update CONTEXT.md inline
+
+When a term is resolved, update `CONTEXT.md` right there. Don't batch these up — capture them as they happen. Use the format in [CONTEXT-FORMAT.md](./CONTEXT-FORMAT.md).
+
+Don't couple `CONTEXT.md` to implementation details. Only include terms that are meaningful to domain experts.
+
+### Offer ADRs sparingly
+
+Only offer to create an ADR when all three are true:
+
+1. **Hard to reverse** — the cost of changing your mind later is meaningful
+2. **Surprising without context** — a future reader will wonder "why did they do it this way?"
+3. **The result of a real trade-off** — there were genuine alternatives and you picked one for specific reasons
+
+If any of the three is missing, skip the ADR. Use the format in [ADR-FORMAT.md](./ADR-FORMAT.md).
+
+</supporting-info>
diff --git a/.codex/ledger/2026-05-06-MEMORY-agent-category-path.md b/.codex/ledger/2026-05-06-MEMORY-agent-category-path.md
@@ -0,0 +1,100 @@
+Goal (incl. success criteria):
+
+- Implement `.compozy/tasks/agent-categories/_techspec.md` end to end for AGENT.md agent categories.
+- Success requires: TechSpec reformatted to the official template with stronger tests, backend/contract/docs implemented, web/UI implemented through Compozy + Claude Opus, peer review loop completed, QA report generated by Opus, QA execution completed locally, PR opened as `feat: agent categories`, and CodeRabbit review watch started.
+
+Constraints/Assumptions:
+
+- Artifacts/docs/code in English; conversation in Brazilian Portuguese (pt-BR).
+- User explicitly requested Compozy/Claude Opus before implementation to improve test scenarios and reformat `_techspec.md`.
+- User explicitly requested Compozy/Claude Opus for web/UI implementation.
+- `category_path` is display-only metadata; no runtime behavior changes unless TechSpec review changes this.
+- Use `packages/ui/src/components/reui/tree.tsx` and `packages/ui/src/components/command.tsx` for web UI.
+- Pre-existing dirty worktree includes site landing files, packages/ui/web package files, and untracked UI components; do not revert user changes.
+- Must use `rtk` for shell commands. No destructive git commands.
+
+Key decisions:
+
+- Canonical field from accepted direction: `category_path: ["Marketing", "Sales"]`.
+- No aliases, slash-string format, multi-tagging, backend tree endpoint, DB migration, or `config.toml` key unless TechSpec hardening surfaces a root-cause need.
+- Agents with missing/empty `category_path` remain root-level UI items.
+- Add `@headless-tree/react` via package manager if needed, matching existing `@headless-tree/core`.
+
+State:
+
+- CODEX_LOOP active; implementation is following the hardened `.compozy/tasks/agent-categories/_techspec.md`.
+- Opus implementation peer-review loop is complete. Local QA execution is in progress against an isolated lab.
+
+Done:
+
+- Goal registered with `functions.create_goal`.
+- Read RTK, current ledger, and explicit skills: `compozy`, `no-workarounds`, `cy-impl-peer-review`, `qa-report`, `qa-execution`, `codex-loop`.
+- Asked Claude Code Opus via `compozy exec` to reformat/harden `.compozy/tasks/agent-categories/_techspec.md`.
+- Opus rewrote the TechSpec in the full template and expanded Go/Web/CLI/API/bundle/native/QA test scenarios.
+- Implemented backend/config/API/CLI/docs-source propagation for `AgentDef.CategoryPath`.
+- Ran `make codegen`, `make codegen-check`, and targeted Go tests for config/workspace/api/core/cli/e2e helpers.
+- Ran `make cli-docs`; reverted generator-only CLI reference formatting noise because no CLI help source changed.
+- Delegated web/UI implementation to Claude Code Opus via Compozy.
+- Removed out-of-scope Opus edits to root/web/site instruction files and the untracked `impeccable` skill directory.
+- Reviewed Opus UI implementation and fixed the loading-to-loaded tree expansion lifecycle plus duplicated folder chevrons.
+- Web checks passed: focused agent UI tests, `make web-lint`, `make web-typecheck`, and `make web-test`.
+- Fixed `make verify` Go lint failures (`funlen` in `internal/config/agent_edit.go`, unused `cloneMCPServer` in `internal/extension/manager.go`).
+- Full `make verify` passed after the lint fixes.
+- Opus peer-review round 1 artifact directory moved to `.tmp/agent-categories-peer-reviews/20260506T180501Z-agent-categories/` because `.peer-reviews/*.json` JSONL artifacts are picked up by `oxfmt`.
+- Locally fixed Go nits from peer review: direct category validation, normalized edit round-trip, `agentCategoryLabel` coverage/comment.
+- Opus UI remediation addressed case-sensitive category folders, tree aria typing/leaf expansion, sidebar/selector stories, sidebar integration test, route-change expansion test, and Playwright coverage.
+- Post-remediation checks passed: `make web-lint`, `make web-typecheck`, `make web-test` (243 files / 1809 tests), `go test ./internal/config ./internal/cli -count=1`, and full `make verify`.
+- Opus peer-review round 2 returned `SHIP` with no blockers. Locally fixed the actionable risks/nits: settings trigger test ID, Tree package story/test, exact `@headless-tree` pins, Go clone/normalize comments, category folder invariant, network dialog full-selection callback, route mock fidelity, and workspace human output assertion.
+- Verification after round 2 fixes passed: targeted Vitest (4 files / 34 tests), `make bun-typecheck`, `make bun-test` (372 files / 2397 tests), `go test ./internal/config ./internal/cli -count=1`, and full `make verify`.
+- Opus peer-review round 3 returned `SHIP` with 0 blockers / 0 risks / 3 nits. Fixed all 3 nits: TOON shape comment, removed extension clone wrapper, and made Tree optional-feature test cover no selection feature.
+- Opus peer-review round 4 returned `SHIP` with 0 blockers / 0 risks / 0 nits. Full `make verify` passed before the round.
+- Opus QA report artifacts were generated under `.compozy/tasks/agent-categories/qa/`.
+- Isolated QA lab was created under `.tmp/qa-labs/agh-agent-categories-20260506-193527-733386-lab/` with `AGH_HOME=/var/folders/7x/xg204hnd04b81fczcxvjlhzr0000gn/T/aghqa-6d7ced711656/runtime` and daemon port `56440`.
+- QA execution found and fixed a real web route bug: agent detail fetches now include the active workspace, matching the sidebar's workspace-scoped agent list.
+- QA evidence collected so far covers CLI/API/UDS/native agent listing/detail, invalid segment diagnostics, daemon restart persistence, provider-backed session prompt, and browser sidebar/selector behavior.
+- Focused web checks after the route fix passed: `cd web && bun run test:raw src/systems/agent/adapters/agent-api.test.ts src/hooks/routes/use-agent-detail-page.test.tsx`, `make web-lint`, `make web-typecheck`, and `cd web && bun run test:e2e:daemon-served:raw agent-categories.spec.ts`.
+- `make test-e2e-runtime` initially failed to compile `internal/daemon/daemon_network_collaboration_integration_test.go` because a `networkAuditExpectation` literal still passed string values to `*string` fields. The test now uses `auditFieldValue` for `Surface` and `ThreadID`; the runtime e2e gate is rerunning.
+- The rerun surfaced a deterministic `TestDaemonE2EACPmockPermissionDisconnectProjectsRuntimeFailure` timeout. Root cause was the acpmock fault fixture sequencing a synchronous permission request before `driver_control.disconnect`, so the disconnect step was unreachable while permission was pending. Fixed the fixture by scheduling an async delayed disconnect before the permission step.
+- Runtime e2e verification now passes: `make test-e2e-runtime` (daemon, httpapi, udsapi, testutil/e2e lanes).
+- Browser-side e2e verification now passes: `make test-e2e-web` (21 Playwright tests). Updated stale e2e assertions to match command-picker trigger semantics and the current network channel header.
+- Final pre-commit `make verify` passed and was captured at `.tmp/qa-labs/agh-agent-categories-20260506-193527-733386-lab/qa-artifacts/qa/evidence/final-make-verify.log`.
+- QA execution artifacts were normalized for the strict auditor, auditor passed with exit code 0, and final artifacts were copied into `.compozy/tasks/agent-categories/qa/`.
+- The isolated QA daemon was stopped. The provider-backed session stop command raced after daemon stop and failed to connect to the socket, but daemon shutdown reported `Active Sessions: 0` and the foreground daemon session exited cleanly.
+- Commit `25e6fd61` (`feat: agent categories`) was created and pushed to `origin/agent-categories`.
+- Post-commit `make verify` passed.
+- PR #113 was opened: `https://github.com/compozy/agh/pull/113`.
+- Requested CodeRabbit watch command was run for PR #113. It started background run `reviews-agent-categories-266f3f-round-000-20260506-202459-000000000`; GitHub shows CodeRabbit status pending and "review in progress".
+
+Now:
+
+- Final response and goal closure.
+
+Next:
+
+- None.
+
+Open questions (UNCONFIRMED if needed):
+
+- None.
+
+Working set (files/ids/commands):
+
+- TechSpec: `.compozy/tasks/agent-categories/_techspec.md`
+- Template: `.agents/skills/cy-create-techspec/references/techspec-template.md`
+- Prior plan: `.codex/plans/2026-05-06-agent-category-path.md`
+- Ledger: `.codex/ledger/2026-05-06-MEMORY-agent-category-path.md`
+- Opus hardening prompt: `.compozy/tasks/agent-categories/opus-techspec-hardening-prompt.md`
+- Opus UI prompt: `.compozy/tasks/agent-categories/opus-ui-implementation-prompt.md`
+- Targeted Go verification: `go test ./internal/config ./internal/workspace ./internal/api/core ./internal/cli ./internal/testutil/e2e -count=1`
+- Web verification: `make web-lint`; `make web-typecheck`; `make web-test`
+- Full verification: `make verify`
+- Peer review round 1: `.tmp/agent-categories-peer-reviews/20260506T180501Z-agent-categories/impl-review-result-round1.json`
+- Peer review final: `.tmp/agent-categories-peer-reviews/20260506T180501Z-agent-categories/impl-review-final-round4.pretty.json`
+- QA manifest: `.tmp/qa-labs/agh-agent-categories-20260506-193527-733386-lab/qa-artifacts/qa/bootstrap-manifest.json`
+- QA daemon session id: `42578`
+- Provider-backed QA session id: `sess-78095017870b2ac0`
+- QA audit: `.tmp/qa-labs/agh-agent-categories-20260506-193527-733386-lab/qa-artifacts/qa/qa-audit-report.json`
+- Committed QA artifact root: `.compozy/tasks/agent-categories/qa/`
+- Commit: `25e6fd61`
+- PR: `#113`
+- CodeRabbit watch run: `reviews-agent-categories-266f3f-round-000-20260506-202459-000000000`
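
Review note: the ledger describes `category_path` as display-only metadata (for example `["Marketing", "Sales"]`), with agents that have a missing or empty path staying at the root of the UI tree. The grouping can be illustrated with the sketch below. It is not the implementation from this PR, which lives in the Go and web code and is not shown in this diff. The function name, the dictionary shape, and the exact-match comparison of segments are all assumptions made for illustration.

```python
def build_category_tree(agents):
    """Group agents into nested folders by category_path (hypothetical sketch).

    agents: list of dicts with a "name" key and an optional
    "category_path" list of segment strings.
    """
    root = {"agents": [], "folders": {}}
    for agent in agents:
        node = root
        # A missing or empty path leaves the agent at the root level.
        for segment in agent.get("category_path") or []:
            # Assumption: segments are compared exactly as written.
            node = node["folders"].setdefault(
                segment, {"agents": [], "folders": {}}
            )
        node["agents"].append(agent["name"])
    return root
```

Because the tree is derived from the list of agents on the client side, no backend tree endpoint or migration is needed for it, which is consistent with the key decisions recorded in the ledger.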