diff --git a/.agents/skills/grill-me/SKILL.md b/.agents/skills/grill-me/SKILL.md new file mode 100644 index 000000000..bd04394c6 --- /dev/null +++ b/.agents/skills/grill-me/SKILL.md @@ -0,0 +1,10 @@ +--- +name: grill-me +description: Interview the user relentlessly about a plan or design until reaching shared understanding, resolving each branch of the decision tree. Use when user wants to stress-test a plan, get grilled on their design, or mentions "grill me". +--- + +Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree, resolving dependencies between decisions one-by-one. For each question, provide your recommended answer. + +Ask the questions one at a time. + +If a question can be answered by exploring the codebase, explore the codebase instead. diff --git a/.agents/skills/grill-with-docs/ADR-FORMAT.md b/.agents/skills/grill-with-docs/ADR-FORMAT.md new file mode 100644 index 000000000..da7e78ec1 --- /dev/null +++ b/.agents/skills/grill-with-docs/ADR-FORMAT.md @@ -0,0 +1,47 @@ +# ADR Format + +ADRs live in `docs/adr/` and use sequential numbering: `0001-slug.md`, `0002-slug.md`, etc. + +Create the `docs/adr/` directory lazily — only when the first ADR is needed. + +## Template + +```md +# {Short title of the decision} + +{1-3 sentences: what's the context, what did we decide, and why.} +``` + +That's it. An ADR can be a single paragraph. The value is in recording *that* a decision was made and *why* — not in filling out sections. + +## Optional sections + +Only include these when they add genuine value. Most ADRs won't need them. 
+ +- **Status** frontmatter (`proposed | accepted | deprecated | superseded by ADR-NNNN`) — useful when decisions are revisited +- **Considered Options** — only when the rejected alternatives are worth remembering +- **Consequences** — only when non-obvious downstream effects need to be called out + +## Numbering + +Scan `docs/adr/` for the highest existing number and increment by one. + +## When to offer an ADR + +All three of these must be true: + +1. **Hard to reverse** — the cost of changing your mind later is meaningful +2. **Surprising without context** — a future reader will look at the code and wonder "why on earth did they do it this way?" +3. **The result of a real trade-off** — there were genuine alternatives and you picked one for specific reasons + +If a decision is easy to reverse, skip it — you'll just reverse it. If it's not surprising, nobody will wonder why. If there was no real alternative, there's nothing to record beyond "we did the obvious thing." + +### What qualifies + +- **Architectural shape.** "We're using a monorepo." "The write model is event-sourced, the read model is projected into Postgres." +- **Integration patterns between contexts.** "Ordering and Billing communicate via domain events, not synchronous HTTP." +- **Technology choices that carry lock-in.** Database, message bus, auth provider, deployment target. Not every library — just the ones that would take a quarter to swap out. +- **Boundary and scope decisions.** "Customer data is owned by the Customer context; other contexts reference it by ID only." The explicit no-s are as valuable as the yes-s. +- **Deliberate deviations from the obvious path.** "We're using manual SQL instead of an ORM because X." Anything where a reasonable reader would assume the opposite. These stop the next engineer from "fixing" something that was deliberate. +- **Constraints not visible in the code.** "We can't use AWS because of compliance requirements." 
"Response times must be under 200ms because of the partner API contract." +- **Rejected alternatives when the rejection is non-obvious.** If you considered GraphQL and picked REST for subtle reasons, record it — otherwise someone will suggest GraphQL again in six months. diff --git a/.agents/skills/grill-with-docs/CONTEXT-FORMAT.md b/.agents/skills/grill-with-docs/CONTEXT-FORMAT.md new file mode 100644 index 000000000..ddfa247ca --- /dev/null +++ b/.agents/skills/grill-with-docs/CONTEXT-FORMAT.md @@ -0,0 +1,77 @@ +# CONTEXT.md Format + +## Structure + +```md +# {Context Name} + +{One or two sentence description of what this context is and why it exists.} + +## Language + +**Order**: +{A concise description of the term} +_Avoid_: Purchase, transaction + +**Invoice**: +A request for payment sent to a customer after delivery. +_Avoid_: Bill, payment request + +**Customer**: +A person or organization that places orders. +_Avoid_: Client, buyer, account + +## Relationships + +- An **Order** produces one or more **Invoices** +- An **Invoice** belongs to exactly one **Customer** + +## Example dialogue + +> **Dev:** "When a **Customer** places an **Order**, do we create the **Invoice** immediately?" +> **Domain expert:** "No — an **Invoice** is only generated once a **Fulfillment** is confirmed." + +## Flagged ambiguities + +- "account" was used to mean both **Customer** and **User** — resolved: these are distinct concepts. +``` + +## Rules + +- **Be opinionated.** When multiple words exist for the same concept, pick the best one and list the others as aliases to avoid. +- **Flag conflicts explicitly.** If a term is used ambiguously, call it out in "Flagged ambiguities" with a clear resolution. +- **Keep definitions tight.** One sentence max. Define what it IS, not what it does. +- **Show relationships.** Use bold term names and express cardinality where obvious. 
+- **Only include terms specific to this project's context.** General programming concepts (timeouts, error types, utility patterns) don't belong even if the project uses them extensively. Before adding a term, ask: is this a concept unique to this context, or a general programming concept? Only the former belongs. +- **Group terms under subheadings** when natural clusters emerge. If all terms belong to a single cohesive area, a flat list is fine. +- **Write an example dialogue.** A conversation between a dev and a domain expert that demonstrates how the terms interact naturally and clarifies boundaries between related concepts. + +## Single vs multi-context repos + +**Single context (most repos):** One `CONTEXT.md` at the repo root. + +**Multiple contexts:** A `CONTEXT-MAP.md` at the repo root lists the contexts, where they live, and how they relate to each other: + +```md +# Context Map + +## Contexts + +- [Ordering](./src/ordering/CONTEXT.md) — receives and tracks customer orders +- [Billing](./src/billing/CONTEXT.md) — generates invoices and processes payments +- [Fulfillment](./src/fulfillment/CONTEXT.md) — manages warehouse picking and shipping + +## Relationships + +- **Ordering → Fulfillment**: Ordering emits `OrderPlaced` events; Fulfillment consumes them to start picking +- **Fulfillment → Billing**: Fulfillment emits `ShipmentDispatched` events; Billing consumes them to generate invoices +- **Ordering ↔ Billing**: Shared types for `CustomerId` and `Money` +``` + +The skill infers which structure applies: + +- If `CONTEXT-MAP.md` exists, read it to find contexts +- If only a root `CONTEXT.md` exists, single context +- If neither exists, create a root `CONTEXT.md` lazily when the first term is resolved + +When multiple contexts exist, infer which one the current topic relates to. If unclear, ask. 
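The three inference rules above can be sketched as a small decision function. This is a minimal TypeScript sketch and not part of the skill itself: `inferContextLayout`, the `ContextLayout` type, and the injected `exists` callback are hypothetical names chosen for illustration, and paths are assumed to be relative to the repo root.

```typescript
// Hypothetical result type: which documentation layout the repo uses.
type ContextLayout =
  | { kind: "multi"; mapPath: string }
  | { kind: "single"; contextPath: string }
  | { kind: "none"; createLazilyAt: string };

// `exists` is injected so the sketch does not depend on any file-system API.
function inferContextLayout(exists: (path: string) => boolean): ContextLayout {
  // Rule 1: a context map always wins and points to the individual contexts.
  if (exists("CONTEXT-MAP.md")) {
    return { kind: "multi", mapPath: "CONTEXT-MAP.md" };
  }
  // Rule 2: only a root CONTEXT.md means a single-context repo.
  if (exists("CONTEXT.md")) {
    return { kind: "single", contextPath: "CONTEXT.md" };
  }
  // Rule 3: nothing exists yet; the root file is created when the first term is resolved.
  return { kind: "none", createLazilyAt: "CONTEXT.md" };
}
```

The `none` result deliberately creates nothing; it only records where the file would go, which keeps the lazy-creation rule intact.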
diff --git a/.agents/skills/grill-with-docs/SKILL.md b/.agents/skills/grill-with-docs/SKILL.md new file mode 100644 index 000000000..6dad6ad7a --- /dev/null +++ b/.agents/skills/grill-with-docs/SKILL.md @@ -0,0 +1,88 @@ +--- +name: grill-with-docs +description: Grilling session that challenges your plan against the existing domain model, sharpens terminology, and updates documentation (CONTEXT.md, ADRs) inline as decisions crystallise. Use when user wants to stress-test a plan against their project's language and documented decisions. +--- + + + +Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree, resolving dependencies between decisions one-by-one. For each question, provide your recommended answer. + +Ask the questions one at a time, waiting for feedback on each question before continuing. + +If a question can be answered by exploring the codebase, explore the codebase instead. + + + + + +## Domain awareness + +During codebase exploration, also look for existing documentation: + +### File structure + +Most repos have a single context: + +``` +/ +├── CONTEXT.md +├── docs/ +│ └── adr/ +│ ├── 0001-event-sourced-orders.md +│ └── 0002-postgres-for-write-model.md +└── src/ +``` + +If a `CONTEXT-MAP.md` exists at the root, the repo has multiple contexts. The map points to where each one lives: + +``` +/ +├── CONTEXT-MAP.md +├── docs/ +│ └── adr/ ← system-wide decisions +├── src/ +│ ├── ordering/ +│ │ ├── CONTEXT.md +│ │ └── docs/adr/ ← context-specific decisions +│ └── billing/ +│ ├── CONTEXT.md +│ └── docs/adr/ +``` + +Create files lazily — only when you have something to write. If no `CONTEXT.md` exists, create one when the first term is resolved. If no `docs/adr/` exists, create it when the first ADR is needed. 
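The lazy-creation rule pairs with the sequential numbering that `ADR-FORMAT.md` describes (scan `docs/adr/` for the highest existing number and increment by one). A minimal TypeScript sketch of that step follows. `nextAdrFilename` is a hypothetical helper, and it assumes the caller has already listed the directory, with an empty list standing for a `docs/adr/` that does not exist yet.

```typescript
// Hypothetical helper: computes the next ADR filename from the names already in docs/adr/.
function nextAdrFilename(existing: string[], slug: string): string {
  let highest = 0;
  for (const name of existing) {
    // Only files shaped like "0001-slug.md" count toward the numbering.
    if (/^\d{4}-.+\.md$/.test(name)) {
      highest = Math.max(highest, parseInt(name.slice(0, 4), 10));
    }
  }
  const next = String(highest + 1);
  // Left-pad to four digits without relying on newer string helpers.
  const padded = "0000".slice(next.length) + next;
  return padded + "-" + slug + ".md";
}
```

With the two ADRs shown in the tree above, `nextAdrFilename(["0001-event-sourced-orders.md", "0002-postgres-for-write-model.md"], "agent-categories")` yields `0003-agent-categories.md`; with an empty list it yields a `0001-` file, which is the point at which the directory would be created.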
+ +## During the session + +### Challenge against the glossary + +When the user uses a term that conflicts with the existing language in `CONTEXT.md`, call it out immediately. "Your glossary defines 'cancellation' as X, but you seem to mean Y — which is it?" + +### Sharpen fuzzy language + +When the user uses vague or overloaded terms, propose a precise canonical term. "You're saying 'account' — do you mean the Customer or the User? Those are different things." + +### Discuss concrete scenarios + +When domain relationships are being discussed, stress-test them with specific scenarios. Invent scenarios that probe edge cases and force the user to be precise about the boundaries between concepts. + +### Cross-reference with code + +When the user states how something works, check whether the code agrees. If you find a contradiction, surface it: "Your code cancels entire Orders, but you just said partial cancellation is possible — which is right?" + +### Update CONTEXT.md inline + +When a term is resolved, update `CONTEXT.md` right there. Don't batch these up — capture them as they happen. Use the format in [CONTEXT-FORMAT.md](./CONTEXT-FORMAT.md). + +Don't couple `CONTEXT.md` to implementation details. Only include terms that are meaningful to domain experts. + +### Offer ADRs sparingly + +Only offer to create an ADR when all three are true: + +1. **Hard to reverse** — the cost of changing your mind later is meaningful +2. **Surprising without context** — a future reader will wonder "why did they do it this way?" +3. **The result of a real trade-off** — there were genuine alternatives and you picked one for specific reasons + +If any of the three is missing, skip the ADR. Use the format in [ADR-FORMAT.md](./ADR-FORMAT.md). 
+ + diff --git a/.codex/ledger/2026-05-06-MEMORY-agent-category-path.md b/.codex/ledger/2026-05-06-MEMORY-agent-category-path.md new file mode 100644 index 000000000..b7ae5a166 --- /dev/null +++ b/.codex/ledger/2026-05-06-MEMORY-agent-category-path.md @@ -0,0 +1,100 @@ +Goal (incl. success criteria): + +- Implement `.compozy/tasks/agent-categories/_techspec.md` end to end for AGENT.md agent categories. +- Success requires: TechSpec reformatted to the official template with stronger tests, backend/contract/docs implemented, web/UI implemented through Compozy + Claude Opus, peer review loop completed, QA report generated by Opus, QA execution completed locally, PR opened as `feat: agent categories`, and CodeRabbit review watch started. + +Constraints/Assumptions: + +- Artifacts/docs/code in English; conversation in BR-PT. +- User explicitly requested Compozy/Claude Opus before implementation to improve test scenarios and reformat `_techspec.md`. +- User explicitly requested Compozy/Claude Opus for web/UI implementation. +- `category_path` is display-only metadata; no runtime behavior changes unless TechSpec review changes this. +- Use `packages/ui/src/components/reui/tree.tsx` and `packages/ui/src/components/command.tsx` for web UI. +- Pre-existing dirty worktree includes site landing files, packages/ui/web package files, and untracked UI components; do not revert user changes. +- Must use `rtk` for shell commands. No destructive git commands. + +Key decisions: + +- Canonical field from accepted direction: `category_path: ["Marketing", "Sales"]`. +- No aliases, slash-string format, multi-tagging, backend tree endpoint, DB migration, or `config.toml` key unless TechSpec hardening surfaces a root-cause need. +- Agents with missing/empty `category_path` remain root-level UI items. +- Add `@headless-tree/react` via package manager if needed, matching existing `@headless-tree/core`. 
+ +State: + +- CODEX_LOOP active; implementation is following the hardened `.compozy/tasks/agent-categories/_techspec.md`. +- Opus implementation peer-review loop is complete. Local QA execution is in progress against an isolated lab. + +Done: + +- Goal registered with `functions.create_goal`. +- Read RTK, current ledger, and explicit skills: `compozy`, `no-workarounds`, `cy-impl-peer-review`, `qa-report`, `qa-execution`, `codex-loop`. +- Asked Claude Code Opus via `compozy exec` to reformat/harden `.compozy/tasks/agent-categories/_techspec.md`. +- Opus rewrote the TechSpec in the full template and expanded Go/Web/CLI/API/bundle/native/QA test scenarios. +- Implemented backend/config/API/CLI/docs-source propagation for `AgentDef.CategoryPath`. +- Ran `make codegen`, `make codegen-check`, and targeted Go tests for config/workspace/api/core/cli/e2e helpers. +- Ran `make cli-docs`; reverted generator-only CLI reference formatting noise because no CLI help source changed. +- Delegated web/UI implementation to Claude Code Opus via Compozy. +- Removed out-of-scope Opus edits to root/web/site instruction files and the untracked `impeccable` skill directory. +- Reviewed Opus UI implementation and fixed the loading-to-loaded tree expansion lifecycle plus duplicated folder chevrons. +- Web checks passed: focused agent UI tests, `make web-lint`, `make web-typecheck`, and `make web-test`. +- Fixed `make verify` Go lint failures (`funlen` in `internal/config/agent_edit.go`, unused `cloneMCPServer` in `internal/extension/manager.go`). +- Full `make verify` passed after the lint fixes. +- Opus peer-review round 1 artifact directory moved to `.tmp/agent-categories-peer-reviews/20260506T180501Z-agent-categories/` because `.peer-reviews/*.json` JSONL artifacts are picked up by `oxfmt`. +- Locally fixed Go nits from peer review: direct category validation, normalized edit round-trip, `agentCategoryLabel` coverage/comment. 
+- Opus UI remediation addressed case-sensitive category folders, tree aria typing/leaf expansion, sidebar/selector stories, sidebar integration test, route-change expansion test, and Playwright coverage. +- Post-remediation checks passed: `make web-lint`, `make web-typecheck`, `make web-test` (243 files / 1809 tests), `go test ./internal/config ./internal/cli -count=1`, and full `make verify`. +- Opus peer-review round 2 returned `SHIP` with no blockers. Locally fixed the actionable risks/nits: settings trigger test ID, Tree package story/test, exact `@headless-tree` pins, Go clone/normalize comments, category folder invariant, network dialog full-selection callback, route mock fidelity, and workspace human output assertion. +- Verification after round 2 fixes passed: targeted Vitest (4 files / 34 tests), `make bun-typecheck`, `make bun-test` (372 files / 2397 tests), `go test ./internal/config ./internal/cli -count=1`, and full `make verify`. +- Opus peer-review round 3 returned `SHIP` with 0 blockers / 0 risks / 3 nits. Fixed all 3 nits: TOON shape comment, removed extension clone wrapper, and made Tree optional-feature test cover no selection feature. +- Opus peer-review round 4 returned `SHIP` with 0 blockers / 0 risks / 0 nits. Full `make verify` passed before the round. +- Opus QA report artifacts were generated under `.compozy/tasks/agent-categories/qa/`. +- Isolated QA lab was created under `.tmp/qa-labs/agh-agent-categories-20260506-193527-733386-lab/` with `AGH_HOME=/var/folders/7x/xg204hnd04b81fczcxvjlhzr0000gn/T/aghqa-6d7ced711656/runtime` and daemon port `56440`. +- QA execution found and fixed a real web route bug: agent detail fetches now include the active workspace, matching the sidebar's workspace-scoped agent list. +- QA evidence collected so far covers CLI/API/UDS/native agent listing/detail, invalid segment diagnostics, daemon restart persistence, provider-backed session prompt, and browser sidebar/selector behavior. 
+- Focused web checks after the route fix passed: `cd web && bun run test:raw src/systems/agent/adapters/agent-api.test.ts src/hooks/routes/use-agent-detail-page.test.tsx`, `make web-lint`, `make web-typecheck`, and `cd web && bun run test:e2e:daemon-served:raw agent-categories.spec.ts`. +- `make test-e2e-runtime` initially failed to compile `internal/daemon/daemon_network_collaboration_integration_test.go` because a `networkAuditExpectation` literal still passed string values to `*string` fields. The test now uses `auditFieldValue` for `Surface` and `ThreadID`; the runtime e2e gate is rerunning. +- The rerun surfaced a deterministic `TestDaemonE2EACPmockPermissionDisconnectProjectsRuntimeFailure` timeout. Root cause was the acpmock fault fixture sequencing a synchronous permission request before `driver_control.disconnect`, so the disconnect step was unreachable while permission was pending. Fixed the fixture by scheduling an async delayed disconnect before the permission step. +- Runtime e2e verification now passes: `make test-e2e-runtime` (daemon, httpapi, udsapi, testutil/e2e lanes). +- Browser-side e2e verification now passes: `make test-e2e-web` (21 Playwright tests). Updated stale e2e assertions to match command-picker trigger semantics and the current network channel header. +- Final pre-commit `make verify` passed and was captured at `.tmp/qa-labs/agh-agent-categories-20260506-193527-733386-lab/qa-artifacts/qa/evidence/final-make-verify.log`. +- QA execution artifacts were normalized for the strict auditor, auditor passed with exit code 0, and final artifacts were copied into `.compozy/tasks/agent-categories/qa/`. +- The isolated QA daemon was stopped. The provider-backed session stop command raced after daemon stop and failed to connect to the socket, but daemon shutdown reported `Active Sessions: 0` and the foreground daemon session exited cleanly. +- Commit `25e6fd61` (`feat: agent categories`) was created and pushed to `origin/agent-categories`. 
+- Post-commit `make verify` passed. +- PR #113 was opened: `https://github.com/compozy/agh/pull/113`. +- Requested CodeRabbit watch command was run for PR #113. It started background run `reviews-agent-categories-266f3f-round-000-20260506-202459-000000000`; GitHub shows CodeRabbit status pending and "review in progress". + +Now: + +- Final response and goal closure. + +Next: + +- None. + +Open questions (UNCONFIRMED if needed): + +- None. + +Working set (files/ids/commands): + +- TechSpec: `.compozy/tasks/agent-categories/_techspec.md` +- Template: `.agents/skills/cy-create-techspec/references/techspec-template.md` +- Prior plan: `.codex/plans/2026-05-06-agent-category-path.md` +- Ledger: `.codex/ledger/2026-05-06-MEMORY-agent-category-path.md` +- Opus hardening prompt: `.compozy/tasks/agent-categories/opus-techspec-hardening-prompt.md` +- Opus UI prompt: `.compozy/tasks/agent-categories/opus-ui-implementation-prompt.md` +- Targeted Go verification: `go test ./internal/config ./internal/workspace ./internal/api/core ./internal/cli ./internal/testutil/e2e -count=1` +- Web verification: `make web-lint`; `make web-typecheck`; `make web-test` +- Full verification: `make verify` +- Peer review round 1: `.tmp/agent-categories-peer-reviews/20260506T180501Z-agent-categories/impl-review-result-round1.json` +- Peer review final: `.tmp/agent-categories-peer-reviews/20260506T180501Z-agent-categories/impl-review-final-round4.pretty.json` +- QA manifest: `.tmp/qa-labs/agh-agent-categories-20260506-193527-733386-lab/qa-artifacts/qa/bootstrap-manifest.json` +- QA daemon session id: `42578` +- Provider-backed QA session id: `sess-78095017870b2ac0` +- QA audit: `.tmp/qa-labs/agh-agent-categories-20260506-193527-733386-lab/qa-artifacts/qa/qa-audit-report.json` +- Committed QA artifact root: `.compozy/tasks/agent-categories/qa/` +- Commit: `25e6fd61` +- PR: `#113` +- CodeRabbit watch run: `reviews-agent-categories-266f3f-round-000-20260506-202459-000000000` diff --git 
a/.compozy/tasks/agent-categories/_techspec.md b/.compozy/tasks/agent-categories/_techspec.md new file mode 100644 index 000000000..4d7e3ad84 --- /dev/null +++ b/.compozy/tasks/agent-categories/_techspec.md @@ -0,0 +1,517 @@ +# TechSpec — AGENT.md Category Path With Tree Sidebar And Command Agent Picker + +## Executive Summary + +Adds an optional, display-only `category_path: ["Marketing", "Sales"]` array to `AGENT.md` frontmatter so agents can be organized hierarchically without affecting runtime behavior, ACP execution, scheduling, autonomy, or permissions. The field flows verbatim through the existing parse → validate → resource codec → daemon sync → contract → CLI/HTTP/UDS → web UI pipeline as a flat string array on every agent payload; the web client builds the tree/group presentation purely client-side. The web UI swaps the flat sidebar agent list for a `@headless-tree/react`-backed `AgentCategoryTree` built on `packages/ui/src/components/reui/tree.tsx`, and replaces every native agent `<select>` controls in the session-create dialog, the settings skills agent scope picker, and the network create-channel dialog. + - A private shared `AgentCommandList` renders `Command`/`CommandInput`/`CommandList`/`CommandGroup`/`CommandItem`/`CommandEmpty` so the two consumer components share keyboard, grouping, and empty-state behavior. + - A new `agent-category` library inside the agent system owns category utilities (build tree nodes, sort, ID derivation, label formatting).
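As a rough illustration of the grouping that `AgentCommandList` shares between its two consumers, the sketch below buckets a flat agent list by formatted category label, with uncategorized agents under an `Agents` heading as this spec describes for the command picker. It is a sketch under stated assumptions: `groupAgentsForCommandList` and `AgentCommandGroup` are hypothetical names, the local `AgentPayload` interface is a trimmed stand-in for the generated type, and both the empty-string result for a missing path and the first-seen group order are choices the spec does not state.

```typescript
// Trimmed stand-in for the generated AgentPayload type (assumption: only these fields matter here).
interface AgentPayload {
  name: string;
  category_path?: string[];
}

// Hypothetical shape for one CommandGroup: a heading plus its agents.
interface AgentCommandGroup {
  heading: string;
  agents: AgentPayload[];
}

// Joins segments with " / ". Returning "" for a missing or empty path is an assumption.
function formatCategoryLabel(path: string[] | null | undefined): string {
  return path && path.length > 0 ? path.join(" / ") : "";
}

// Hypothetical helper: buckets agents by heading, keeping groups in first-seen order.
function groupAgentsForCommandList(agents: AgentPayload[]): AgentCommandGroup[] {
  const byHeading = new Map<string, AgentPayload[]>();
  for (const agent of agents) {
    // Root-level agents (no category_path) fall under the "Agents" heading.
    const heading = formatCategoryLabel(agent.category_path) || "Agents";
    const bucket = byHeading.get(heading);
    if (bucket) {
      bucket.push(agent);
    } else {
      byHeading.set(heading, [agent]);
    }
  }
  const groups: AgentCommandGroup[] = [];
  byHeading.forEach((members, heading) => {
    // Case-insensitive order inside each group; the buckets are local copies, so sorting in place is safe.
    members.sort((a, b) => a.name.toLowerCase().localeCompare(b.name.toLowerCase()));
    groups.push({ heading: heading, agents: members });
  });
  return groups;
}
```

One caveat of keying groups by heading: an agent whose `category_path` is literally `["Agents"]` would share a bucket with root-level agents. Keying by the raw path instead would avoid that if it ever matters.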
+ +### Data Flow + +``` +AGENT.md (frontmatter) + → ParseAgentDef → AgentDef.CategoryPath (normalized) + → AgentDef.Validate (rejects invalid segments) + → workspace resolver / resource codec / daemon project + → contract.AgentPayload.CategoryPath (flat string[]) + ├── HTTP /api/agents and /api/workspaces/:id (web client builds tree) + ├── UDS agh agent list / info / workspace describe + └── Native tool agh__workspace_describe +``` + +The web tree/group is purely presentational: same flat payload, two distinct UI projections (sidebar tree + command-picker grouped list). + +## Implementation Design + +### Core Interfaces + +#### `internal/config/agent.go` — canonical field and normalization + +```go +type AgentDef struct { + // ... existing fields ... + CategoryPath []string `yaml:"category_path,omitempty" toml:"category_path,omitempty" json:"category_path,omitempty"` + // ... existing fields ... +} + +type parsedAgentDef struct { + // ... existing fields ... + CategoryPath []string `yaml:"category_path,omitempty" toml:"category_path,omitempty"` + // ... existing fields ... +} + +// ParseAgentDef: +agent.CategoryPath = normalizeAgentCategoryPath(parsed.CategoryPath) +``` + +```go +// normalizeAgentCategoryPath trims each segment and returns nil for an empty result. +// It does NOT lowercase, reorder, or deduplicate; casing and order are author intent. +func normalizeAgentCategoryPath(in []string) []string { + if len(in) == 0 { + return nil + } + out := make([]string, 0, len(in)) + for _, raw := range in { + out = append(out, strings.TrimSpace(raw)) + } + return out +} +``` + +```go +// validateAgentCategoryPath rejects segments the file system / UI cannot safely render. +// Called from AgentDef.Validate after normalization. +func validateAgentCategoryPath(path []string) error { + for i, seg := range path { + switch { + case seg == "": + return fmt.Errorf("agent.category_path[%d]: blank segment", i) + case seg == "." 
|| seg == "..": + return fmt.Errorf("agent.category_path[%d]: %q is not a valid segment", i, seg) + case strings.ContainsAny(seg, `/\`): + return fmt.Errorf("agent.category_path[%d]: %q must not contain '/' or '\\'", i, seg) + } + } + return nil +} +``` + +`AgentDef.Validate` calls `validateAgentCategoryPath(a.CategoryPath)` after the existing tool/permission checks. `validateAgentResourceSpec` and `EditAgentDefFile` route through `normalizeAgentCategoryPath` then `Validate`, so the same rules apply uniformly. + +#### `internal/config/agent_clone.go` — single clone authority + +```go +func CloneAgentDef(agent AgentDef) AgentDef { + return AgentDef{ + // ... existing fields ... + Skills: normalizeAgentSkillsConfig(agent.Skills), + CategoryPath: append([]string(nil), agent.CategoryPath...), + // ... existing fields ... + } +} +``` + +#### `internal/workspace/clone.go` — delete the hand-rolled clone + +Replace `cloneAgentDefs` body with a single call: + +```go +func cloneAgentDefs(src []aghconfig.AgentDef) []aghconfig.AgentDef { + if len(src) == 0 { + return nil + } + cloned := make([]aghconfig.AgentDef, 0, len(src)) + for _, agent := range src { + cloned = append(cloned, aghconfig.CloneAgentDef(agent)) + } + return cloned +} +``` + +This deletes the hand-rolled field-by-field copy that already silently dropped `Skills`. Treat it as a delete target (see Key Decisions). + +#### `internal/api/contract/contract.go` and `bundles.go` + +```go +type AgentPayload struct { + // ... existing fields ... + CategoryPath []string `json:"category_path,omitempty"` + // ... existing fields ... +} + +type BundleAgentPayload struct { + // ... existing fields ... + CategoryPath []string `json:"category_path,omitempty"` + // ... existing fields ... +} +``` + +#### `internal/api/core/conversions.go` + +```go +func AgentPayloadFromDef(agent aghconfig.AgentDef) contract.AgentPayload { + // ... existing field copies ... + return contract.AgentPayload{ + // ... existing fields ... 
+ CategoryPath: append([]string(nil), agent.CategoryPath...), + // ... existing fields ... + } +} +``` + +`AgentPayloadFromDiagnostic` does NOT set `CategoryPath` — diagnostic placeholder rows leave the field nil because the source AGENT.md is malformed and the parsed value cannot be trusted. + +#### Web — agent category utility (new `web/src/systems/agents/lib/agent-category.ts`) + +```ts +export type AgentCategoryNode = + | { kind: "folder"; id: string; label: string; segments: string[]; children: AgentCategoryNode[] } + | { kind: "leaf"; id: string; label: string; agent: AgentPayload }; + +// buildAgentCategoryTree: groups agents by AgentPayload.category_path. +// Folders are sorted before leaves; siblings are sorted case-insensitively by visible label. +// Folder IDs derive from the joined path ("category:Marketing/Sales"); leaf IDs derive from agent name ("agent:coder"). +// Agents with no category_path become root-level leaves (no synthetic "Uncategorized" folder). +export function buildAgentCategoryTree(agents: AgentPayload[]): AgentCategoryNode[]; + +// formatCategoryLabel(["Marketing", "Sales"]) === "Marketing / Sales" +export function formatCategoryLabel(path: string[] | null | undefined): string; +``` + +#### Web — `AgentCategoryTree` (new `web/src/components/agent-category-tree.tsx`) + +Wraps `useTree` from `@headless-tree/react` with `syncDataLoaderFeature` + `selectionFeature`. Renders folder nodes with `TreeItem`/`TreeItemLabel`, and leaf nodes via the `TreeItem` render hook so each leaf is a TanStack `Link` to `/agents/$name`. Preserves existing test IDs (`agent-row-${agent.name}`, `agent-active-${agent.name}`, `agent-status-dot-${agent.name}`) and adds new deterministic test IDs for folders (`agent-category-${joinedPath}`). + +Default expansion: ancestors of the active agent expand on first render; otherwise top-level categories are expanded. Tree expansion state is local UI state only (no persistence, no config key). 
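The declared `buildAgentCategoryTree` contract and the default-expansion rule can be sketched as follows. This is one possible reading and not the shipped implementation: the local `AgentPayload` is a trimmed stand-in for the generated type, `sortNodes` and `defaultExpandedIds` are hypothetical helpers, using the agent name as the leaf label is an assumption, and folders are grouped by exact (case-sensitive) path because folder IDs derive from the joined path.

```typescript
// Trimmed stand-in for the generated AgentPayload type (assumption: only these fields are needed).
interface AgentPayload {
  name: string;
  category_path?: string[];
}

type AgentCategoryNode =
  | { kind: "folder"; id: string; label: string; segments: string[]; children: AgentCategoryNode[] }
  | { kind: "leaf"; id: string; label: string; agent: AgentPayload };

type FolderNode = Extract<AgentCategoryNode, { kind: "folder" }>;

// Folders sort before leaves; siblings sort case-insensitively by visible label.
function sortNodes(nodes: AgentCategoryNode[]): void {
  nodes.sort((a, b) => {
    if (a.kind !== b.kind) return a.kind === "folder" ? -1 : 1;
    return a.label.toLowerCase().localeCompare(b.label.toLowerCase());
  });
  nodes.forEach((node) => {
    if (node.kind === "folder") sortNodes(node.children);
  });
}

function buildAgentCategoryTree(agents: AgentPayload[]): AgentCategoryNode[] {
  const roots: AgentCategoryNode[] = [];
  const folders = new Map<string, FolderNode>();

  agents.forEach((agent) => {
    let siblings = roots;
    const path = agent.category_path ?? [];
    path.forEach((segment, depth) => {
      // Folder ID derives from the joined path prefix, e.g. "category:Marketing/Sales".
      const segments = path.slice(0, depth + 1);
      const id = "category:" + segments.join("/");
      let folder = folders.get(id);
      if (!folder) {
        folder = { kind: "folder", id: id, label: segment, segments: segments, children: [] };
        folders.set(id, folder);
        siblings.push(folder);
      }
      siblings = folder.children;
    });
    // Agents without a category_path land at the root; no synthetic "Uncategorized" folder.
    siblings.push({ kind: "leaf", id: "agent:" + agent.name, label: agent.name, agent: agent });
  });

  sortNodes(roots);
  return roots;
}

// Hypothetical helper: ancestors of the active agent, otherwise every top-level folder.
function defaultExpandedIds(tree: AgentCategoryNode[], activePath?: string[]): string[] {
  const path = activePath ?? [];
  if (path.length > 0) {
    return path.map((_, depth) => "category:" + path.slice(0, depth + 1).join("/"));
  }
  return tree.filter((node) => node.kind === "folder").map((node) => node.id);
}
```

Joining segments with `/` inside the ID stays unambiguous only because validation rejects segments that contain `/` or `\`. The sketch also treats a root-level active agent like the no-active-agent case and expands the top-level folders, which is an interpretation of "otherwise" in the rule above.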
+ +#### Web — `AgentCommandSelect` / `AgentCommandMultiSelect` + +Both compose a private `AgentCommandList` that renders `Command`, `CommandInput`, `CommandList`, `CommandGroup` (heading = formatted category label, root-level agents under `"Agents"`), `CommandItem`, `CommandEmpty`, and `CommandShortcut`. They differ only in selection semantics: + +```ts +interface AgentCommandSelectProps { + agents: AgentPayload[]; + value: string | null; + onChange: (next: string | null) => void; + triggerTestId?: string; // reuses existing testIds (e.g. "session-agent-select") + disabled?: boolean; + placeholder?: string; +} + +interface AgentCommandMultiSelectProps { + agents: AgentPayload[]; + value: string[]; + onToggle: (next: string[]) => void; + triggerTestId?: string; // reuses "settings-agent-select" etc. +} +``` + +Single closes its popover on selection and shows the selected agent's name + provider + formatted category label inside the trigger. Multi keeps the popover open, marks each item with `data-checked={selected}`, and surfaces a selected count + per-item provider/category metadata. + +### Data Models + +| Surface | Type | Field | Shape | +|---|---|---|---| +| `internal/config.AgentDef` | Go struct | `CategoryPath` | `[]string` (yaml/toml/json `category_path,omitempty`) | +| `internal/config.parsedAgentDef` | Go struct | `CategoryPath` | `[]string` | +| `contract.AgentPayload` | API contract | `CategoryPath` | `[]string` (`json:"category_path,omitempty"`) | +| `contract.BundleAgentPayload` | API contract | `CategoryPath` | `[]string` (`json:"category_path,omitempty"`) | +| Generated `web/src/generated/agh-openapi.d.ts` | TS | `category_path?: string[]` | array, optional | +| Web `AgentCategoryNode` | Discriminated union | `kind` | `"folder" \| "leaf"` | + +No new database columns, indexes, tables, or migrations are introduced. AGH SQLite (`agh.db`, `events.db`, catalog DBs) is untouched. + +### API Endpoints + +No new endpoints. 
The following existing endpoints gain `category_path?: string[]` on their agent payloads: + +| Method | Path | Source | Notes | +|---|---|---|---| +| GET | `/api/agents` | `core.HandlerListAgents` | `AgentResponse.agents[].category_path` | +| GET | `/api/agents/:name` | `core.HandlerGetAgent` | `AgentResponse.agent.category_path` | +| GET | `/api/workspaces/:id` | workspace detail handler | Each `agents[].category_path` | +| GET | `/api/bundles/.../activations` | bundle activation projector | Each `agents[].category_path` | +| (UDS) | `agent.list`, `agent.info`, `workspace.describe` | UDS handlers | Same payload over UDS | +| (Native tool) | `agh__workspace_describe` | `daemon/native_tools.go` | Inherits the contract | + +OpenAPI regeneration via `make codegen` propagates the field into `openapi/agh.json` and `web/src/generated/agh-openapi.d.ts`. `make codegen-check` MUST pass. + +## Integration Points + +No external services. The change extends two third-party UI primitives already in the repo (`@headless-tree/core`, `cmdk`) by adding the missing peer: + +- **`@headless-tree/react@^1.6.3`** — added to `agh-web` via `bun add @headless-tree/react@^1.6.3 --filter agh-web` to match the existing `@headless-tree/core@^1.6.3`. No version drift between core and react peers is allowed. +- **No new icon, motion, or animation packages.** `motion` and `lucide-react` already exist in `agh-web`; the tree and command components rely on simple CSS transitions only. + +## Impact Analysis + +### Code Surfaces + +| Component | Impact Type | Description and Risk | Required Action | +|-----------|-------------|----------------------|-----------------| +| `internal/config/agent.go` | modified | New `CategoryPath` field in `AgentDef` + `parsedAgentDef`, normalization helper, validation rule. Risk: medium — every parse path runs through this. | Add normalization + validation + tests for valid/invalid segments and root-level nil. 
| +| `internal/config/agent_edit.go` | modified | `EditAgentDefFile` must round-trip `CategoryPath` through `parsedAgentDef` so on-disk frontmatter survives skill enable/disable mutations. Risk: medium — write path silently drops fields when forgotten. | Mirror parsed → agent → parsed copy for `CategoryPath`; add a test that toggles `Skills.Disabled` and confirms `category_path` is preserved on disk. | +| `internal/config/agent_clone.go` | modified | `CloneAgentDef` is the single deep-copy authority for any consumer that needs to mutate. | Add `CategoryPath` to clone with defensive copy. | +| `internal/config/agent_resource.go` | modified | `validateAgentResourceSpec` normalizes and re-validates spec before storage. | Apply same normalization + validation. | +| `internal/workspace/clone.go` | **delete + replace** | Hand-rolled `cloneAgentDefs` already drops `Skills`; rewriting it would also need to track every new `AgentDef` field forever. | **Delete** the field-by-field clone body and call `aghconfig.CloneAgentDef`. (Listed as a delete target.) | +| `internal/api/contract/{contract,bundles}.go` | modified | Adds `CategoryPath` to `AgentPayload` and `BundleAgentPayload`. | Mark `omitempty`; never inline JSON aliases. | +| `internal/api/core/conversions.go` | modified | `AgentPayloadFromDef` copies the field; `AgentPayloadFromDiagnostic` keeps it nil. | Defensive copy; document the diagnostic exclusion in a comment. | +| `internal/daemon/*` (resource sync, bundle activation) | modified | Bundle materialization/projection and resource sync inherit the field. | Verified via transport-parity + bundle activation tests. | +| `internal/cli/agent_commands.go` | modified | Human, toon, and JSON output for `agent list`, `agent info`, and workspace agent views show category. | Add a Category column to human/toon output; JSON exposes the array verbatim. | +| `internal/testutil/e2e/config_seed.go` | modified | Adds optional `CategoryPath` to `AgentSeed` for fixtures. 
| Backward-compatible default = nil; any e2e fixture can now seed a categorized agent. | +| `web/src/generated/agh-openapi.d.ts` | regenerated | OpenAPI codegen output. | `make codegen` then `make codegen-check`. | +| `web/e2e/fixtures/runtime-seed.ts` | modified | `BrowserMockAgentSeed` gets optional `category_path: string[]`. | Mock builder writes `category_path` into the served payload. | +| `packages/ui/src/index.ts` | modified | Re-exports `Tree`, `TreeItem`, `TreeItemLabel`, `TreeDragLine`. | Public surface for `agh-web`. | +| `agh-web` `app-sidebar.tsx` | refactored | Replace `AgentList` with `AgentCategoryTree`. Risk: high (test ID & route preservation). | Preserve existing test IDs and active route handling. Delete `AgentList`. | +| `agh-web` session-create dialog | refactored | Replace native `<select>` with `AgentCommandSelect`. | Preserve `settings-agent-select` test ID. | +| `agh-web` network create-channel dialog | refactored | Replace custom button list with `AgentCommandMultiSelect`. | Preserve `network-agent-option-${agent.name}` on each item. | + +### Extensibility, Agent-Manageability, and Config Lifecycle + +- **Extensibility surfaces (per CLAUDE.md "extensible by the runtime").** No new extension hook or registry is required. The field is part of the `AgentDef` resource shape so it automatically flows through: + - Resource codec (`NewAgentResourceCodec`) — extensions that author agent resources via the resource API may set the field; validation runs the same `validateAgentCategoryPath` rule. + - Bundle materialization/projection — `BundleAgentPayload.category_path` makes the field part of the bundle activation contract, so external bundles can declare categories without touching AGH source. + - Skill/agent registry — unchanged. Categories never affect skill resolution, precedence, or agent lookup. + - Bridge SDK / extension manifest — unchanged. Categories are display metadata, not capability. 
+- **Agent-manageability (per CLAUDE.md "managed by agents").** Agents must be able to discover and reason about categories without the web UI: + - CLI `agh agent list -o json` and `agh agent info -o json` expose `category_path` verbatim. + - HTTP/UDS parity: same `AgentPayload.category_path` over both transports. + - Native tool `agh__workspace_describe` returns the field through the existing `AgentPayload` projection so onboard agents can introspect categories. + - No agent-only mutation API is added — categories are authored by humans editing `AGENT.md`. `agh agent edit` (via `EditAgentDefFile`) round-trips the field on disk, so any agent that already has write access to AGENT.md can update it. +- **Config lifecycle (per CLAUDE.md "config.toml keys/defaults/docs").** **No `config.toml` key is added.** Justification: `category_path` is per-agent metadata stored in the agent's own `AGENT.md`, not a runtime/global behavior toggle. There is nothing to enable, disable, default, or version. Adding a config key would invent a global flag for an opt-in field that is already per-agent. Sidebar tree expansion is intentionally local UI state; no persisted preference. + +### Web/Docs Impact + +- **Web (`web/`):** + - `src/components/app-sidebar.tsx` — `AgentList` deleted, `AgentCategoryTree` added. + - `src/components/app-sidebar.test.tsx` — rewritten to assert tree behavior, active row, active session dot, default expansion, keyboard navigation, and preserved test IDs. + - `src/components/stories/app-sidebar.stories.tsx` — adds a story with categorized agents so Storybook visibly demonstrates the tree. + - `src/systems/agents/lib/agent-category.ts` (new) + `agent-category.test.ts` (new). + - `src/systems/agents/components/agent-command-select.tsx` (new) + tests. + - `src/systems/agents/components/agent-command-multi-select.tsx` (new) + tests. + - `src/systems/agents/components/agent-command-list.tsx` (new, private) + colocated tests of the shared list. 
+ - `src/systems/sessions/...session-create-dialog.tsx` — switches to `AgentCommandSelect`. + - `src/routes/.../settings*` — settings skills agent scope switches to `AgentCommandSelect`. + - `src/systems/network/components/network-create-channel-dialog.tsx` — switches to `AgentCommandMultiSelect`; preserves `network-agent-option-${agent.name}` test IDs. + - All four touched dialogs/views require updated tests that drive the picker via `userEvent.keyboard` rather than `selectOptions` (native `<select>` […] `<select>` elements (and their `selectOptions` test interactions) used for agent selection in: + - session-create dialog, + - settings skills agent scope picker, + - network create-channel dialog. +4. Any inline agent-grouping helpers in network channel dialog. Categories are now derived from `AgentPayload.category_path`. + +Delete targets explicitly NOT in scope (rejected aliases that must not be introduced or revived): + +- No `categories` alias (singular/plural) on AgentDef, parsedAgentDef, or any payload. +- No slash-string fallback (`"Marketing/Sales"`) accepted by parsing or validation. +- No `Uncategorized` synthetic folder in the web tree. +- No `category_path` config.toml key. +- No backend tree/group endpoint or denormalized category table. + +## Testing Approach + +### Unit Tests + +#### `internal/config` (Go) + +- `TestParseAgentDef_ShouldParseCategoryPath` — valid `category_path: [Marketing, Sales]` → `AgentDef.CategoryPath` equals `["Marketing", "Sales"]`, casing and order preserved. +- `TestParseAgentDef_ShouldReturnNilWhenCategoryPathMissing` — agents without the key parse with `CategoryPath == nil` (root-level). +- `TestParseAgentDef_ShouldReturnNilWhenCategoryPathEmptyArray` — `category_path: []` normalizes to nil (no synthetic folder). +- `TestParseAgentDef_ShouldTrimWhitespaceSegments` — `[" Marketing ", "Sales"]` normalizes to `["Marketing", "Sales"]`. 
+- `TestParseAgentDef_ShouldRejectBlankSegment` — `["Marketing", ""]` fails validation with a message naming `agent.category_path[1]`. +- `TestParseAgentDef_ShouldRejectWhitespaceOnlySegment` — `[" "]` fails validation with the blank-segment message. +- `TestParseAgentDef_ShouldRejectDotSegment` — `["."]` fails with a message naming the invalid segment. +- `TestParseAgentDef_ShouldRejectDotDotSegment` — `[".."]` fails with the invalid-segment message. +- `TestParseAgentDef_ShouldRejectForwardSlashInSegment` — `["Marketing/Sales"]` fails (no slash-string fallback). +- `TestParseAgentDef_ShouldRejectBackslashInSegment` — `["Marketing\\Sales"]` fails. +- `TestParseAgentDef_ShouldRejectNonArrayValue` — `category_path: "Marketing"` fails with a strict-yaml decode error (no scalar fallback). +- `TestParseAgentDef_ShouldRejectCategoriesAliasKey` — `categories: [Marketing]` fails with `ErrInvalidAgentFrontmatterKey` (or equivalent strict yaml unknown-key error). Confirms zero alias support. +- `TestEditAgentDefFile_ShouldPreserveCategoryPathOnSkillToggle` — load fixture with `category_path`, mutate `Skills.Disabled`, write, re-read; assert `category_path` is preserved verbatim in YAML. +- `TestCloneAgentDef_ShouldDeepCopyCategoryPath` — mutating the source slice after clone must not affect the clone. +- `TestValidateAgentResourceSpec_ShouldNormalizeAndValidateCategoryPath` — feeds a spec with whitespace + invalid segments; first round normalizes, second rejects with `errors.Is(err, resources.ErrValidation)`. +- `TestNormalizeAgentCategoryPath_ShouldReturnNilForEmptyInput` — explicit unit test for the helper. + +Each test uses `t.Run("Should ...")` subtests with `t.Parallel()` per `agh-test-conventions`. + +#### `internal/workspace` + +- `TestCloneAgentDefs_ShouldPreserveCategoryPath` — workspace clone round-trips the field. 
+- `TestCloneAgentDefs_ShouldPreserveSkills` — regression test for the pre-existing skills-clone gap, now fixed by delegating to `aghconfig.CloneAgentDef`. + +#### `internal/api/core` + `internal/api/contract` + +- `TestAgentPayloadFromDef_ShouldCopyCategoryPath` — extends the existing `TestAgentPayloadFromDef`. +- `TestAgentPayloadFromDef_ShouldDefensivelyCopyCategoryPath` — mutating the source after conversion must not leak into the payload. +- `TestAgentPayloadFromDiagnostic_ShouldOmitCategoryPath` — diagnostic placeholder rows leave the field nil. +- `TestAgentPayload_JSONShape_ShouldOmitNilCategoryPath` — encode/decode confirms `omitempty` + array shape. +- `TestBundleAgentPayload_ShouldRoundTripCategoryPath` — bundle activation payload preserves the field through encode/decode. + +#### Bundle materialization / daemon sync (`internal/bundles`, `internal/daemon`) + +- `TestBundleProjector_ShouldProjectCategoryPath` — bundle materialization/projection includes the field on each `BundleAgentPayload`. +- `TestBundleActivationPayload_ShouldExposeCategoryPathOnAgents` — verifies the activation-conversion seam. +- `TestDaemonResourceSync_ShouldStoreCategoryPath` — round-trips through resource validate + storage. + +#### CLI (`internal/cli`) + +- `TestAgentList_ShouldRenderCategoryColumn_Human` — human output contains `Marketing / Sales` for a categorized agent and an empty cell for a root-level agent. +- `TestAgentList_ShouldRenderCategoryColumn_Toon` — toon output includes a `category` key with the same formatting. +- `TestAgentList_ShouldExposeCategoryPath_JSON` — `-o json` output includes `category_path` as an array exactly as parsed. +- `TestAgentInfo_ShouldRenderCategoryPath_AllFormats` — same triple coverage for `agent info`. +- `TestWorkspaceAgents_ShouldRenderCategoryColumn` — workspace agent table view includes the column. 
+ +#### Native/agent-manageable surfaces + +- `TestNativeWorkspaceDescribe_ShouldExposeCategoryPath` — `agh__workspace_describe` returns `category_path` on each agent. +- `TestToolsTransportParity_WorkspaceDescribe_ShouldIncludeCategoryPath` — extends the existing transport-parity test. + +#### Web utilities + components (Vitest + Testing Library) + +- **`agent-category.test.ts`:** + - `Should build a flat list when no agent has category_path`. + - `Should group agents by single-segment category_path`. + - `Should build nested folders for multi-segment paths`. + - `Should sort folders before leaves`. + - `Should sort siblings case-insensitively by visible label`. + - `Should derive deterministic folder IDs from joined segments`. + - `Should derive deterministic leaf IDs from agent names`. + - `Should render root-level leaves alongside top-level folders (no Uncategorized)`. + - `Should treat undefined and empty-array category_path as root-level`. +- **`AgentCategoryTree.test.tsx`:** + - `Should render category folders and agent leaves`. + - `Should render the active route with data-active=true`. + - `Should render the active-session dot for active agents`. + - `Should preserve agent-row-${name}, agent-active-${name}, agent-status-dot-${name} test IDs`. + - `Should expand ancestors of the active agent on initial render`. + - `Should expand top-level categories on initial render when no agent is active`. + - `Should support keyboard navigation between siblings, into folders, and onto leaves` (Arrow keys + Enter via `userEvent`). + - `Should render the loading state with agents-loading test ID`. + - `Should render the empty state with agents-empty test ID`. + - `Should render the error state with agents-empty test ID` (current behavior). +- **`AgentCommandSelect.test.tsx`:** + - `Should filter results via keyboard search` (`userEvent.type`). + - `Should group results by formatted category label`. + - `Should render root-level agents under "Agents" group`. 
+ - `Should render an empty state when search yields no results`. + - `Should display selected agent name, provider, and category label in the trigger`. + - `Should call onChange with the agent name when an item is selected`. + - `Should close the popover on selection`. + - `Should preserve the existing trigger test IDs (session-agent-select, settings-agent-select)`. +- **`AgentCommandMultiSelect.test.tsx`:** + - `Should render data-checked on selected items`. + - `Should toggle items via onToggle`. + - `Should remain open after a selection`. + - `Should display the selected count`. + - `Should render provider/category metadata per item`. + - `Should preserve network-agent-option-${name} test IDs on items`. +- Updated **session-create-dialog.test.tsx**, **settings skills agent-scope test**, and **network-create-channel-dialog.test.tsx** must drive the picker via keyboard (no `selectOptions`). + +#### Fixtures, stories, mocks + +- `internal/testutil/e2e.AgentSeed` gets an optional `CategoryPath []string` and is used in at least one new e2e fixture exercising a categorized agent. +- `web/e2e/fixtures/runtime-seed.ts` `BrowserMockAgentSeed` gains `category_path?: string[]`; the mock builder writes it into the served `AgentPayload`. +- `app-sidebar.stories.tsx` adds a `Categorized` story with a multi-level `category_path` so the tree is visible in Storybook. +- `agent-command-select.stories.tsx` (new) demonstrates grouped, empty, and selected states. + +### Integration Tests + +- **Resource codec round-trip** (`internal/config/agent_resource_test.go` integration tag): write a categorized agent into the resource store, read it back via the resource API, assert structural equality and that validation errors surface as `errors.Join(resources.ErrValidation, ...)`. 
+- **Daemon validate/encode** (existing daemon test harness): a categorized agent flows through validation + encoding without diagnostics; an invalid category_path raises a structured error captured in diagnostics. +- **Bundle activation end-to-end** (existing bundle activation integration test): a bundle that ships a categorized agent yields the field on the activation payload over both HTTP and UDS. +- **`make test-e2e-runtime`** Go harness: a workspace seeded with `AgentDefs: []AgentSeed{{Name: "coder", CategoryPath: []string{"Engineering", "Tools"}}}` exposes the field on `GET /api/workspaces/:id` and on `agh__workspace_describe`. +- **`make test-e2e-web`** Playwright lane: with categorized fixtures, the sidebar shows the tree, the session-create dialog picker groups agents, and clicking an item navigates to `/agents/:name`. + +### Negative Cases (consolidated) + +The following inputs MUST fail at validation with a stable error message: + +| Input | Surface | Required failure | +|---|---|---| +| `category_path: [""]` | parse + validate | "blank segment" | +| `category_path: [" "]` | parse + validate | "blank segment" | +| `category_path: ["."]` | parse + validate | invalid-segment | +| `category_path: [".."]` | parse + validate | invalid-segment | +| `category_path: ["a/b"]` | parse + validate | must not contain '/' or '\\' | +| `category_path: ["a\\b"]` | parse + validate | must not contain '/' or '\\' | +| `category_path: "Marketing"` (scalar) | strict-yaml decode | non-array value | +| `categories: [Marketing]` (alias) | strict-yaml decode | unknown key | +| `category_path: [Marketing, Sales]` then disk edit retains | `EditAgentDefFile` | preserved verbatim | + +## Development Sequencing + +### Build Order + +1. **`internal/config`**: add field to `AgentDef` + `parsedAgentDef`, normalization helper, validation rule. Wire into `ParseAgentDef`, `AgentDef.Validate`, `EditAgentDefFile`, `validateAgentResourceSpec`. Update `CloneAgentDef`. 
Land all unit tests (parse, validate, edit-roundtrip, clone, resource). +2. **`internal/workspace`**: replace `cloneAgentDefs` body with `aghconfig.CloneAgentDef`. Add tests for category + skills preservation. +3. **`internal/api/contract` + `internal/api/core`**: add field to `AgentPayload`, `BundleAgentPayload`. Update `AgentPayloadFromDef`. Add tests. +4. **`internal/daemon` + `internal/bundles`**: ensure resource sync, bundle materialization/projection, and bundle activation payload propagate the field. Add/extend integration tests. +5. **`make codegen`** then `make codegen-check` to regenerate `openapi/agh.json` and `web/src/generated/agh-openapi.d.ts`. +6. **`internal/cli`**: add Category column/key to `agent list`, `agent info`, workspace agent views (human, toon, JSON). Add CLI tests across all three formats. +7. **`internal/testutil/e2e.AgentSeed`**: add optional `CategoryPath` and update at least one e2e seed. +8. **`packages/ui`**: export `Tree`, `TreeItem`, `TreeItemLabel`, `TreeDragLine` from `packages/ui/src/index.ts`. +9. **`agh-web` dependency**: `bun add @headless-tree/react@^1.6.3 --filter agh-web`. +10. **`agh-web` agent category lib**: `agent-category.ts` + tests. +11. **`agh-web` shared command components**: `AgentCommandList`, `AgentCommandSelect`, `AgentCommandMultiSelect` + tests + stories. +12. **`agh-web` `AgentCategoryTree`** + tests + story. +13. **`agh-web` consumers**: rewire `AppSidebar`, session-create dialog, settings skills agent scope, and network create-channel dialog. Update each test to drive the picker via keyboard. Update mocks/fixtures. +14. **Docs**: add `category_path` row to AGENT.md reference docs in `packages/site`. Run `make cli-docs` to regenerate CLI reference. Run `cd packages/site && bun run source:generate` if drift is reported. +15. **Final gate**: `make verify` (codegen-check → bun-lint → bun-typecheck → bun-test → web-build → fmt → lint → test → build → boundaries). 
+ +### Technical Dependencies + +- `@headless-tree/react@^1.6.3` must be installed before any `AgentCategoryTree` work compiles. +- `make codegen` must run before any web code that imports `AgentPayload.category_path` from `agh-openapi.d.ts`. +- `packages/ui` exports must land before `agh-web` imports `Tree` family from `@agh/ui`. +- Backend contract changes must land in the same PR as the consumer updates per CLAUDE.md "no partial-surface completions". + +## Monitoring and Observability + +This change adds metadata only and creates no new lifecycle events, hooks, or canonical event types. Specifically: + +- **No new event/log fields.** `category_path` is not added to canonical events; it would couple display metadata to operational telemetry without a use case. +- **No new metrics.** Agent counts/grouping in dashboards remain transport-agnostic. +- **No new alerts.** No threshold or signal depends on category structure. +- **Existing diagnostics still cover the failure modes.** Invalid `category_path` produces a parse/validate error already routed through the existing `AgentDiagnosticPayload` path (`agent_diagnostics`); the malformed agent is surfaced through the existing list endpoint with `category_path: nil`. + +If sidebar tree expansion or command picker latency becomes a concern at scale, web client telemetry already covers component render timing; no AGH-side metric is introduced. + +## Technical Considerations + +### Key Decisions + +- **Decision:** Canonical field is the array `category_path: [..]`. **Rationale:** array preserves segment boundaries unambiguously; matches the structural shape we render. **Trade-offs:** more verbose than a slash string. **Alternatives rejected:** + - Slash-string `"Marketing/Sales"` — rejected because it conflates separator with content (an agent author cannot include `/` in a category name) and forces the parser to invent escape rules. 
+ - `categories: ["Marketing", "Sales"]` (plural alias) — rejected because plural reads as a multi-tag set; `category_path` reads as a single ordered hierarchy. Greenfield-alpha allows only one canonical name; no alias. + - Multi-tag (multiple paths per agent) — out of scope; one path or none. +- **Decision:** Display-only metadata. **Rationale:** runtime moat is in execution, not organization. Branching ACP/scheduling/permissions on category_path would introduce a hidden coupling that future refactors would have to honor. **Trade-offs:** cannot use category for permission scoping or workspace partitioning later without revisiting the contract. +- **Decision:** Backend payload stays flat; web builds the tree. **Rationale:** keeps the OpenAPI contract simple, lets HTTP/UDS/CLI consumers stay agnostic, and lets the web evolve grouping logic without API changes. **Trade-offs:** every web consumer must call the same `buildAgentCategoryTree` helper to stay consistent. +- **Decision:** No `config.toml` key. **Rationale:** there is nothing to enable/disable globally; the feature is per-agent opt-in by editing AGENT.md. **Trade-offs:** no admin override to hide the column; not needed. +- **Decision:** Replace the hand-rolled `cloneAgentDefs` with `aghconfig.CloneAgentDef`. **Rationale:** the hand-rolled clone already silently dropped `Skills`; every new field would re-introduce the same class of bug. **Trade-offs:** the workspace package now depends more heavily on the config clone authority — acceptable per the repo's "single source of truth" composition discipline. +- **Decision:** Use `@headless-tree/react` (matching `@headless-tree/core`) for the sidebar tree. **Rationale:** keyboard navigation, focus management, expand/collapse, and selection are non-trivial; a battle-tested primitive avoids reinventing them. **Trade-offs:** one new transitive web dependency. 
+- **Decision:** Build single + multi pickers on `cmdk` via `command.tsx` rather than custom `<select>` […] `<select>` cannot show category groupings, provider chips, or keyboard search — and our existing custom multi-select is ad-hoc. **Trade-offs:** more component code than a native select, but unifies three call sites onto one shared primitive. + +### Known Risks + +- **Risk:** Forgetting to thread `CategoryPath` through one of the conversion seams (resource codec, bundle materialization, bundle activation payload) silently drops the field for some consumers. **Likelihood:** medium. **Mitigation:** the test matrix covers each seam by name; impact analysis enumerates them; a dedicated `BundleAgentPayload` round-trip test catches the bundle-side regression. +- **Risk:** `EditAgentDefFile` rewrites only the fields explicitly mirrored back to `parsed`. Any new field that is not added to the post-mutate copy is silently lost on the next on-disk write. **Likelihood:** high if we're not deliberate. **Mitigation:** explicit `EditAgentDefFile` round-trip test that toggles `Skills.Disabled` and asserts `category_path` survives; build-order step 1 lands this before consumers depend on the round-trip. +- **Risk:** Replacing native `<select>` […] `<select>` agent pickers were replaced by the new shared command components and that grouping, filtering, selection semantics, and existing test IDs are preserved across: + +- Session-create dialog (`AgentCommandSelect`). +- Settings skills agent scope picker (`AgentCommandSelect`). +- Network create-channel dialog (`AgentCommandMultiSelect`). + +--- + +### Preconditions + +- [ ] Web dev server running against an isolated daemon with the same seeded categorized + root-level agents from TC-UI-001. +- [ ] User can reach the session-create dialog, settings skills page, and network create-channel dialog. + +--- + +### Test Steps + +1. **Open the session-create dialog.** + - Input: Click "New session" from an agent route. 
+ - **Expected:** The trigger has `data-testid="session-create-agent-select"` (existing ID preserved). Clicking it opens a popover containing `Command`, `CommandInput`, grouped `CommandList`, `CommandGroup`, `CommandItem`. + +2. **Verify grouping.** + - Input: Inspect the popover. + - **Expected:** + - Folders render as `CommandGroup` headings whose `data-testid` is `agent-command-group-category:${joinedSegments}` (e.g., `agent-command-group-category:Marketing/Sales`). + - Group heading text is the formatted label `Marketing / Sales` (single-space delimited). + - Root-level agents appear under the `Agents` group. + - Each item is `data-testid="agent-command-item-${agent.name}"`. + +3. **Filter via search.** + - Input: Focus the input and type the categorized agent's name. + - **Expected:** Only matching items remain visible; group headings collapse appropriately. Empty search after type-then-clear restores all groups. + +4. **Empty state.** + - Input: Type a string that matches no agent. + - **Expected:** A `CommandEmpty` state renders. + +5. **Single-select close-on-pick.** + - Input: Click the categorized agent. + - **Expected:** Popover closes; trigger now displays the agent name + provider + formatted category label. + +6. **Settings skills agent scope picker.** + - Input: Open settings → Skills → agent scope. + - **Expected:** Trigger has `data-testid="settings-agent-select"` (existing ID preserved). Same grouping behavior as step 2; selection updates the in-page state. + +7. **Network create-channel multi-select.** + - Input: Open network → create channel. + - **Expected:** + - Items have `data-testid="network-agent-option-${agent.name}"` (existing ID preserved). + - Selected items have `data-checked="true"`. + - Popover stays open after selection. + - A selected count is visible. + - Each item shows provider + category metadata. + +8. **Keyboard navigation across all three pickers.** + - Input: `ArrowDown`, `Enter`, `Escape`, `Tab`. 
+ - **Expected:** Focus moves through items, Enter selects, Escape closes the popover, Tab returns focus to the trigger. + +--- + +### Behavioral Evidence + +- Operator journey: choose a categorized agent in three different dialogs and observe identical grouping. +- Cross-surface: all three pickers consume the same `AgentPayload[]` and render the same group structure for a given agent. +- Disruption probe: the empty-search state and the multi-select keep-open behavior prove the new component handles edge cases the native `<select>` […] `<select>` agent pickers (session-create, settings skills agent scope, network create-channel) are replaced by `AgentCommandSelect` / `AgentCommandMultiSelect` with grouped headings. +- A categorized agent flows from a fresh AGENT.md through the daemon all the way to the web sidebar and the session-create command picker in one observable Playwright run. + +Key risks and the cases that cover them: + +| Risk | Why it matters | Primary coverage | +| --- | --- | --- | +| `category_path` is silently dropped by a conversion seam (`AgentPayloadFromDef`, bundle materialization, resource codec, workspace clone) | Cross-surface drift makes operators distrust the data model | TC-FUNC-001, TC-INT-001, TC-INT-002, TC-REG-002 | +| `EditAgentDefFile` rewrites disk without preserving `category_path` on unrelated mutations | Corrupts authored intent on every skill toggle | TC-FUNC-002 | +| Validation accepts an unsafe segment (`""`, `.`, `..`, `/`, `\`) or revives an alias (`categories:`, `"Marketing/Sales"`) | Greenfield invariant broken; future migration code becomes inevitable | TC-FUNC-003, TC-REG-001 | +| OpenAPI / generated TS drifts from the contract | Web compiles against stale types and loses the field | TC-INT-003 | +| CLI human/TOON/JSON disagree on category presence or formatting | Agents and operators see different truths from the same source | TC-FUNC-004 | +| Sidebar tree breaks active-agent indication, keyboard navigation, or test IDs | Existing automation 
and operator muscle memory regress | TC-UI-001 | +| Command pickers regress to flat lists or drop existing `data-testid`s | Session-create, settings, and network dialogs lose grouped semantics | TC-UI-002 | +| Live, end-to-end browser scenario fails for a categorized agent | The whole feature is unusable from the user perspective even when units pass | TC-SCEN-001 | +| Casing or order is mutated anywhere in the pipeline | Authors lose intent (`Marketing` vs `marketing`, parent-before-child is meaningful) | TC-REG-003 | + +## Scope Definition + +In scope: + +- `internal/config` parse, normalization, validation, `EditAgentDefFile` round-trip, `CloneAgentDef`, `validateAgentResourceSpec`. +- `internal/workspace.cloneAgentDefs` delegation to `aghconfig.CloneAgentDef` and the `Skills` regression that fix exposes. +- `internal/api/contract.AgentPayload` and `BundleAgentPayload` `category_path` shape (`omitempty`, defensive copy, diagnostic exclusion). +- `internal/api/core/conversions.AgentPayloadFromDef` and `AgentPayloadFromDiagnostic`. +- `internal/cli` agent and workspace commands across `human`, `toon`, and `json` formats. +- `internal/testutil/e2e/config_seed.go` `AgentSeed.CategoryPath` plumbing for runtime E2E fixtures. +- `internal/extension/manager.go` clone-of-clone removal (the round-3 review fix). +- OpenAPI / generated TS surface (`openapi/agh.json`, `web/src/generated/agh-openapi.d.ts`). +- `packages/ui/src/components/reui/tree.tsx` re-exports plus the new `tree.test.tsx` proving optional-feature guards. +- Web sidebar `AgentCategoryTree`, `AgentCommandSelect`, `AgentCommandMultiSelect`, `AgentCommandList`, agent-category lib, view-model hook, and stories. +- The three command-picker call sites: session-create dialog, settings skills agent scope, network create-channel dialog. +- Playwright `web/e2e/agent-categories.spec.ts` end-to-end coverage. +- Documentation under `packages/site/content/runtime/core/...` (definitions and configuration AGENT.md pages). 
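The defensive-copy requirement named in the scope list (for `CloneAgentDef` and `AgentPayloadFromDef`) comes down to never sharing the backing array of `CategoryPath` between source and copy. A minimal sketch, assuming a hypothetical helper `cloneCategoryPath` that is not confirmed to exist in the repository:

```go
package main

import "fmt"

// cloneCategoryPath (hypothetical helper) returns an independent copy of the
// path and keeps nil as nil so root-level agents stay root-level.
func cloneCategoryPath(src []string) []string {
	if src == nil {
		return nil
	}
	out := make([]string, len(src))
	copy(out, src)
	return out
}

func main() {
	src := []string{"Marketing", "Sales"}
	dst := cloneCategoryPath(src)

	// Mutating the source after cloning must not leak into the clone.
	src[0] = "Mutated"

	fmt.Println(dst, cloneCategoryPath(nil) == nil)
}
```

Keeping nil as nil matters because the contract uses `omitempty`: a root-level agent should serialize without the key both before and after cloning.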
+ +Out of scope: + +- Behavior branching on `category_path` in ACP, scheduling, autonomy, permissions, or workspace partitioning (the feature is display-only by contract). +- Backend tree/group endpoints, denormalized category tables, schema migrations. +- A `config.toml` toggle for category UI; tree-expansion persistence is intentionally local UI state. +- Compatibility shims, aliases, slash-string fallbacks, or `Uncategorized` synthetic folders. +- Marketing site / `packages/site` landing copy. +- Performance characterization beyond what `make verify` covers. + +## Behavioral Scenario Charter + +Startup situation: + +- A fresh QA lab seeded via `agh-qa-bootstrap` with at least one categorized AGENT.md (multi-segment), at least one root-level AGENT.md (no `category_path`), and at least one bundle activation that ships a categorized agent. +- Daemon, HTTP, UDS, and Web surfaces all point at the same isolated `AGH_HOME` and unique daemon port; web dev server reads `AGH_WEB_API_PROXY_TARGET` from the bootstrap manifest. +- Provider home / env policy follows the manifest contract: bound-secret providers use `PROVIDER_HOME`/`PROVIDER_CODEX_HOME`; `native_cli` providers preserve operator `HOME`. + +Operator intent: + +- Author a categorized agent in AGENT.md, observe it in CLI human/JSON/TOON outputs, in the daemon REST/UDS payloads, in the native `agh__workspace_describe` tool, and in the Web sidebar tree and session-create command picker, then route from a sidebar leaf to `/agents/:name` and start a session — without ever seeing a `categories` alias, slash-string, or `Uncategorized` bucket. + +Expected business outcome: + +- The operator can group agents into a hierarchy via metadata only, run agents normally regardless of category, and see consistent grouping/folder/leaf state across CLI, HTTP, UDS, native tools, and the web UI. 
+- Agents can introspect `category_path` purely through agent-manageable surfaces (`agh agent list -o json`, `agh agent info -o json`, native `workspace_describe`). + +Agent roles: + +| Actor / Agent | Role | Expected behavior | Evidence source | +| --- | --- | --- | --- | +| Operator | Scenario driver | Edits AGENT.md, runs CLI, opens Web, starts a session through the categorized agent. | CLI transcript, browser screenshot, API/UDS responses. | +| Categorized agent | Provider-backed work peer | Runs a session normally; behavior is unchanged by category. | Session events, transcript or blocked-provider boundary. | +| Reviewer / observer agent | Cross-surface verifier | Calls `agh__workspace_describe` and `agh agent info -o json` to confirm `category_path` is present and identical to disk. | Native tool transcript, JSON evidence files. | +| QA harness | Disruption prober | Toggles `Skills.Disabled`, restarts daemon, retries with invalid segments, regenerates codegen. | `make verify`, `make codegen-check`, `make test-e2e-runtime`, `make test-e2e-web`. | + +Live provider / LLM expectations: + +- Release-grade execution should run a provider-backed AGH session with a categorized agent when credentials and local prerequisites are reachable. +- If live provider execution is blocked, QA execution must record the exact provider, credential, binary, or account boundary and still validate every local runtime / CLI / API / UDS / Web / E2E harness surface. +- Mock / `acpmock` evidence remains readiness or regression evidence only; it is not counted as live provider proof. 
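The cross-surface check assigned to the reviewer / observer agent can be sketched as a small comparison over captured JSON payloads. This is a minimal illustration, not the project's harness: the `agentPayload` struct, the `name` field, and the surface labels are assumptions. Only the `category_path` key and its `omitempty` behavior come from the plan.

```go
package main

import (
	"encoding/json"
	"fmt"
	"slices"
)

// agentPayload is a hypothetical, trimmed-down view of an agent payload.
// Only the category_path key is taken from the plan; the rest is assumed.
type agentPayload struct {
	Name         string   `json:"name"`
	CategoryPath []string `json:"category_path,omitempty"`
}

// categoryPathsAgree decodes one payload per surface and reports the first
// surface whose category_path differs in casing, order, or length from the
// reference surface (the alphabetically first key).
func categoryPathsAgree(payloads map[string][]byte) error {
	surfaces := make([]string, 0, len(payloads))
	for s := range payloads {
		surfaces = append(surfaces, s)
	}
	slices.Sort(surfaces) // deterministic iteration order

	var ref agentPayload
	for i, s := range surfaces {
		var p agentPayload
		if err := json.Unmarshal(payloads[s], &p); err != nil {
			return fmt.Errorf("%s: %w", s, err)
		}
		if i == 0 {
			ref = p
			continue
		}
		// slices.Equal treats a missing and an empty path as equal, which
		// matches omitempty semantics for root-level agents.
		if !slices.Equal(ref.CategoryPath, p.CategoryPath) {
			return fmt.Errorf("category_path mismatch: %s=%q, %s=%q",
				surfaces[0], ref.CategoryPath, s, p.CategoryPath)
		}
	}
	return nil
}

func main() {
	payloads := map[string][]byte{
		"cli":         []byte(`{"name":"writer","category_path":["Marketing","Campaigns"]}`),
		"native_tool": []byte(`{"name":"writer","category_path":["Marketing","Campaigns"]}`),
		"web":         []byte(`{"name":"writer","category_path":["marketing","Campaigns"]}`),
	}
	fmt.Println(categoryPathsAgree(payloads))
}
```

The comparison is deliberately case- and order-sensitive, because the plan treats `Marketing` vs `marketing` and parent-before-child order as author intent.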
+ +Expected artifacts: + +- `.compozy/tasks/agent-categories/qa/verification-report.md` +- CLI / API / Web / UDS / native-tool transcripts under `.compozy/tasks/agent-categories/qa/` +- Browser screenshots showing the sidebar tree and the session-create command picker grouping under `.compozy/tasks/agent-categories/qa/screenshots/` +- A scenario contract file at `.compozy/tasks/agent-categories/qa/scenario-contract.json` listing the minimum agents, channels, surfaces, artifacts-used-later, and disruption probes. +- A behavioral charter at `.compozy/tasks/agent-categories/qa/behavioral-scenario-charter.yaml` consumed by `qa-execution`. +- Bug reports under `.compozy/tasks/agent-categories/qa/issues/BUG-*.md` whenever a journey fails. + +Disruption probes: + +- Toggle `Skills.Disabled` via `EditAgentDefFile` and confirm `category_path` survives on disk. +- Restart the daemon and confirm the categorized agent is still tree-grouped and bundle-activatable. +- Submit a malformed `category_path` (`""`, `"."`, `".."`, `"a/b"`, `"a\\b"`, scalar `"Marketing"`, `categories:` alias) and confirm the daemon emits an `agent_diagnostic` with `category_path: nil` rather than a permissive fallback. +- Resolve the same agent's `category_path` from CLI JSON, native tool, and Web simultaneously; the values must agree byte-for-byte. + +## Test Strategy and Approach + +Smoke readiness checks (entry criteria only): + +- `make verify` is green on the branch HEAD. +- `make codegen-check` confirms `openapi/agh.json` and `web/src/generated/agh-openapi.d.ts` are in sync. +- The bootstrap manifest exposes a daemon port and `AGH_WEB_API_PROXY_TARGET`. +- Smoke checks must not be reported as release-grade proof. 
+ +Release-grade behavioral evidence: + +- Execute P0 functional + integration cases first to lock the contract: `internal/config` parse / edit / clone / resource (TC-FUNC-001..003), workspace clone regression (TC-REG-002), conversion seam parity (TC-INT-001..002), codegen drift gate (TC-INT-003), CLI surface coverage (TC-FUNC-004). +- Execute P0 UI cases: sidebar tree behavior (TC-UI-001), command picker grouping across session/settings/network (TC-UI-002). +- Execute the P0 real-scenario case (TC-SCEN-001): a categorized agent appears in the sidebar tree, groups inside the session-create picker, routes to `/agents/:name`, and starts a session whose state agrees across CLI, API/UDS, and Web. +- Execute P1 regression cases: alias / slash-string rejection (TC-REG-001) and casing / order preservation (TC-REG-003). +- Every P0 case names CLI, API/UDS, and Web evidence side-by-side. Mock/`acpmock` runs are captured separately as harness evidence. +- Every P0 journey runs at least one realistic disruption probe. + +Regression evidence: + +- Re-run `make verify` after the last code or fixture change. +- Re-run `make test-e2e-runtime` and `make test-e2e-web` as targeted behavior harnesses. +- Re-run TC-SCEN-001 after the full gate passes. + +## Environment Requirements + +| Requirement | Expected value | +| --- | --- | +| OS | macOS developer workstation or CI-equivalent Linux runner with AGH prerequisites. | +| Runtime | Go toolchain, Bun workspace dependencies, SQLite with race-test support. | +| Browser | Browser plugin / browser-use for local web validation; approved fallback is `agent-browser`. | +| Daemon isolation | Fresh QA lab by default; unique `AGH_HOME`, daemon ports, and tmux bridge socket paths when concurrency is signaled. | +| Provider homes | `PROVIDER_HOME` / `PROVIDER_CODEX_HOME` for bound-secret providers; preserve operator home for `native_cli`. | +| Web proxy | Export `AGH_WEB_API_PROXY_TARGET` from the bootstrap manifest before `make web-dev`.
| +| Output root | `.compozy/tasks/agent-categories/qa/` | + +## Entry Criteria + +- TechSpec is approved and the implementation has shipped (Opus round-4 verdict SHIP, 0 blockers / 0 risks / 0 nits). +- Local `make verify` is green on `agent-categories` branch HEAD. +- A QA bootstrap manifest exists or `qa-execution` records why bootstrap could not be created. +- The QA execution task has access to this plan, the test cases, the scenario contract, and the behavioral charter. + +## Exit Criteria + +- All P0 cases pass or produce bug reports with exact reproduction and evidence. +- 90%+ of P1 cases pass with no critical or high bug left unresolved. +- CLI, HTTP, UDS, native-tool, and Web UI agree on `category_path` for the same agent. +- Live provider-backed behavior is exercised with at least one categorized agent OR the exact blocked provider/tool/credential boundary is documented. +- `make verify`, `make test-e2e-runtime`, and `make test-e2e-web` all pass after the last fix. +- `.compozy/tasks/agent-categories/qa/verification-report.md` includes a QA bootstrap block when a healthy reusable lab remains. +- The strict QA auditor (per `qa-execution` checklist) reports 0 blockers across C4, C5, C8, C9, C10, C11, C14. 
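The first two exit criteria reduce to simple arithmetic, sketched below under stated assumptions: `caseResult` and `meetsExitGate` are hypothetical names, and the severity rule ("no critical or high bug left unresolved") is not modeled. With only a handful of P1 cases, a 90% threshold effectively requires every P1 case to pass, since two out of three is about 67%.

```go
package main

import "fmt"

// caseResult is a hypothetical record of one executed test case.
type caseResult struct {
	ID        string
	Priority  string // "P0" or "P1"
	Passed    bool
	BugReport bool // a failing case has a filed bug report with evidence
}

// meetsExitGate applies two of the exit criteria: every P0 case passes or
// has a bug report, and at least 90% of P1 cases pass.
func meetsExitGate(results []caseResult) bool {
	p1Total, p1Passed := 0, 0
	for _, r := range results {
		switch r.Priority {
		case "P0":
			if !r.Passed && !r.BugReport {
				return false
			}
		case "P1":
			p1Total++
			if r.Passed {
				p1Passed++
			}
		}
	}
	// Integer form of p1Passed/p1Total >= 0.90; true when there are no P1 cases.
	return p1Passed*10 >= p1Total*9
}

func main() {
	results := []caseResult{
		{ID: "TC-SCEN-001", Priority: "P0", Passed: true},
		{ID: "TC-REG-001", Priority: "P1", Passed: true},
		{ID: "TC-REG-002", Priority: "P1", Passed: true},
		{ID: "TC-REG-003", Priority: "P1", Passed: false},
	}
	fmt.Println(meetsExitGate(results))
}
```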
+ +## Execution Matrix + +| ID | Priority | Class | Primary surfaces | Must run before | +| --- | --- | --- | --- | --- | +| SMOKE-001 | P0 | Smoke readiness | `make verify`, `make codegen-check`, manifest sanity | Any P0 case | +| TC-FUNC-001 | P0 | Functional / Go | `internal/config` parse + validate | TC-FUNC-002 | +| TC-FUNC-002 | P0 | Functional / Go | `internal/config.EditAgentDefFile` round-trip | TC-INT-001 | +| TC-FUNC-003 | P0 | Functional / Go | Validation negative cases | TC-INT-001 | +| TC-FUNC-004 | P0 | Functional / CLI | `agh agent list/info`, `agh workspace info` (human/toon/json) | TC-SCEN-001 | +| TC-INT-001 | P0 | Integration / Contract | `AgentPayloadFromDef`, bundle activation, native tools, UDS/HTTP parity | TC-INT-002 | +| TC-INT-002 | P0 | Integration / Resource codec | `validateAgentResourceSpec`, daemon resource sync | TC-SCEN-001 | +| TC-INT-003 | P0 | Integration / Codegen | OpenAPI + generated TS drift gate | TC-UI-001 | +| TC-UI-001 | P0 | UI / Web | Sidebar `AgentCategoryTree` behavior, ancestor expansion, test IDs | TC-SCEN-001 | +| TC-UI-002 | P0 | UI / Web | `AgentCommandSelect`/`MultiSelect` in session, settings, network | TC-SCEN-001 | +| TC-SCEN-001 | P0 | Real Scenario / Playwright | Sidebar + session-create grouping + routing + live session | Final verification | +| TC-REG-001 | P1 | Regression / Validation | Alias and slash-string rejection (parser strict) | Final verification | +| TC-REG-002 | P1 | Regression / Workspace | `cloneAgentDefs` preserves Skills + CategoryPath | Final verification | +| TC-REG-003 | P1 | Regression / Casing-order | Author intent preserved across all surfaces | Final verification | + +## Risk Assessment + +| Risk | Probability | Impact | Mitigation | +| --- | --- | --- | --- | +| Live provider credentials unavailable for the categorized session | Medium | High | Record exact boundary; continue all local surfaces; do not claim live provider proof. 
| +| Browser dev server points at the default daemon port | Medium | High | Derive `AGH_WEB_API_PROXY_TARGET` from bootstrap manifest. | +| QA lab reuses stale state | Medium | Medium | Fresh lab by default; reuse only the same-session healthy manifest. | +| Hidden conversion seam silently nils `category_path` | Medium | High | TC-INT-001/002 cover each named seam (HTTP, UDS, bundle, native tool, resource codec, diagnostic). | +| Strict-yaml decode regresses to permissive on `categories:` alias or scalar string | Low | High | TC-REG-001 locks the contract with explicit negative tests. | +| OpenAPI codegen drift slips into PR | Medium | High | TC-INT-003 reruns `make codegen` and `make codegen-check` and compares hashes. | +| Existing `data-testid`s removed when replacing native `