diff --git a/.agents/skills/agh-code-guidelines/SKILL.md b/.agents/skills/agh-code-guidelines/SKILL.md new file mode 100644 index 000000000..805e3eb4c --- /dev/null +++ b/.agents/skills/agh-code-guidelines/SKILL.md @@ -0,0 +1,95 @@ +--- +name: agh-code-guidelines +description: >- + Enforces AGH Go code style and concurrency patterns before writing or editing + any production Go file: error wrapping with %w, errors.Is/As only (no + strings.Contains on err.Error), no underscore-discarded errors, slog over + log/fmt, context.Context as first arg, compile-time interface assertions, no + hardcoded config, CLI flag presence detection, whitespace normalization at + CLI boundary, no comments restating WHAT, goroutine ownership and shutdown + via context, no fire-and-forget, no time.Sleep in orchestration. Use whenever + creating or modifying any *.go file under cmd/ or internal/ that is not a + test file. Do not use for *_test.go (use agh-test-conventions), schema + migrations (use agh-schema-migration), or contract changes (use + agh-contract-codegen-coship). +trigger: implicit +--- + +# AGH Code Guidelines + +These are the AGH-specific Go style and concurrency rules. They exist because reviewers will block PRs that violate them, and most violations are caught by lint/CI only after the fact. Activate this skill before writing or editing production Go code so the patterns land correctly the first time. + +Companion skills cover narrower domains: `agh-test-conventions` for tests, `agh-cleanup-failure-paths` for multi-step error returns, `agh-schema-migration` for SQLite changes, `agh-contract-codegen-coship` for contract/OpenAPI edits, `golang-pro` for general Go idiom guidance. Activate those alongside when their domain applies. + +## Procedures + +**Step 1: Identify the Edit Surface** + +1. Confirm the target is a production Go file (`cmd/**` or `internal/**`, not `*_test.go`). +2. Read `references/coding-style.md` for the canonical style rules. +3. 
Read `references/concurrency-patterns.md` for the canonical concurrency rules. + +**Step 2: Apply Error Discipline** + +1. Wrap every error with context: `fmt.Errorf("operation: %w", err)`. The `%w` verb is mandatory when the caller may need to match the cause. +2. Match errors with `errors.Is` and `errors.As` exclusively. `strings.Contains(err.Error(), ...)` is a blocking violation — replace with sentinel errors or typed errors. +3. Never ignore an error with `_`. Either handle it or write a one-line justification comment explaining why the error is impossible or irrelevant. +4. No `panic()` or `log.Fatal()` in production paths. The only legitimate use is unrecoverable startup failure in `main`. + +**Step 3: Apply Logging and Context Discipline** + +1. Use `log/slog` for every operational log line. `log.Printf`, `fmt.Println`, `fmt.Printf` are forbidden in production paths. +2. Pass `context.Context` as the first argument to any function that crosses a runtime boundary (HTTP handler, UDS handler, DB call, subprocess spawn, network call). +3. Never call `context.Background()` outside `main` or a focused test. Caller-supplied context is the rule. +4. External HTTP calls require an explicit timeout. `http.DefaultClient` is forbidden (also enforced by `agh-cleanup-failure-paths`). + +**Step 4: Apply Type Discipline** + +1. Every new exported type that satisfies an interface gets a compile-time assertion: `var _ Interface = (*Type)(nil)` adjacent to the type definition. Reviewers will block missing assertions. +2. Replace `interface{}` / `any` with the concrete type whenever the type is known statically. +3. No reflection without a written performance justification. +4. No defensive nil-checks after `make(...)`. Lint flags `if x == nil` after `make` as unreachable. + +**Step 5: Apply Configuration Discipline** + +1. Never hardcode operational values. Pull from TOML config (`internal/config`) or expose via functional options (`NewManager(opts ...Option)`). +2. 
Disable / zero-value semantics must be explicit — document whether `0` means "off" or "use default". +3. Resolution chains (e.g., env → flag → config → default) are documented in code as ordered fallbacks ending in an actionable error. +4. Config lifecycle is part of feature lifecycle: any feature that adds/changes/removes config updates the struct, defaults, validation, examples, `config.toml` docs, and tests in the same change. + +**Step 6: Apply CLI Boundary Discipline** + +1. Distinguish "flag not set" from "flag set to zero value" via `cmd.Flags().Changed(name)` (Cobra) or equivalent. Silently ignoring an explicit flag is a bug. +2. Trim and drop empty entries from string-slice CLI inputs (capabilities, IDs, tags, paths) before sending to the daemon. Whitespace-only strings must not surface as "validation problems". +3. Stable `-o json` / `-o jsonl` are compatibility contracts — do not change their shape without a contract update. + +**Step 7: Apply Concurrency Discipline** + +1. Every goroutine has explicit ownership and shutdown via `context.Context` cancellation. +2. No fire-and-forget goroutines. Track with `sync.WaitGroup` (or equivalent owner-side primitive) and join on shutdown. +3. Long-running loops use `select { case <-ctx.Done(): return; case ... }`. +4. Prefer channels over shared memory with mutexes when practical. `sync.RWMutex` for read-heavy state, `sync.Mutex` for write-heavy. +5. No `time.Sleep()` in orchestration paths — use timers, tickers, or context deadlines. +6. Goroutines spawned by `internal/session/manager_*.go` MUST be tracked by a Manager-owned WaitGroup and joined in Manager shutdown. Never put goroutine-owned channels in a struct field that another goroutine mutates — use a per-run handle. +7. Subprocess managed-stop respects `ctx.Done()` between Shutdown and Wait. Wrap `proc.Wait()` in `select { case <-proc.Done(): case <-ctx.Done(): }`. Process-group signaling helpers live in `internal/procutil`. 
+ +**Step 8: Apply Comment Discipline** + +1. Default to writing no comments. Identifiers carry the WHAT. +2. Comments capture WHY when non-obvious: hidden constraints, invariants, workarounds for a specific bug, behavior that would surprise a reader. +3. Never reference the current task, fix, callers, or issue number in a comment ("used by X", "added for the Y flow", "handles the case from issue #123"). Those rot — they belong in the PR description. +4. No multi-paragraph docstrings. One short line max. + +**Step 9: Pre-Commit Validation** + +1. Run `make lint` for the affected package — zero tolerance for golangci-lint findings. +2. Run `make verify` (fmt → lint → test → boundaries → build) before declaring the edit complete. +3. For race-sensitive packages (`internal/session`, `internal/acp`, `internal/hooks`, `internal/subprocess`, `internal/resources`), reproduce CI locally with `act workflow_dispatch -W .github/workflows/ci.yml -j verify --container-architecture linux/amd64` before claiming success. + +## Error Handling + +- **Existing file already violates the rules:** fix what the current edit touches; flag the rest as pre-existing tech debt in the task body. Do not silently expand scope. +- **`errors.Is` / `errors.As` is impossible because the dependency returns a string:** wrap once at the boundary in a typed error of yours; downstream code matches on your typed error. +- **Reflection genuinely required (codegen, decoder):** keep a written justification adjacent to the reflection call. Lint exception requires a `//nolint:` directive with a reason. +- **`panic` shows up in seemingly-production code:** confirm whether the path is reachable post-`main`. If it is, replace with explicit error return; if it is genuinely unreachable, mark with `// unreachable: ...` and prefer `panic("invariant: ...")` over `log.Fatal`. 
+- **CLI command silently ignores a flag:** verify with `cmd.Flags().Changed(name)`; if the flag is meaningfully optional, document the resolution chain and emit an explicit `slog` debug line when the default is taken. diff --git a/.agents/skills/agh-code-guidelines/references/coding-style.md b/.agents/skills/agh-code-guidelines/references/coding-style.md new file mode 100644 index 000000000..078c20767 --- /dev/null +++ b/.agents/skills/agh-code-guidelines/references/coding-style.md @@ -0,0 +1,71 @@ +# AGH Coding Style — Canonical Rules + +Verbatim canonical rules. Reviewers will quote these. + +## Errors + +- Wrap with context using `%w`: `fmt.Errorf("operation: %w", err)`. +- Match with `errors.Is` and `errors.As`. **`strings.Contains(err.Error(), …)` is forbidden.** +- Never ignore an error with `_`. Every error is handled or has a written justification on the line. +- No `panic()` or `log.Fatal()` in production paths — only for unrecoverable startup failures inside `main`. + +## Cleanup + +- Pair `defer cancel()` immediately after `WithCancel` / `WithTimeout` / `WithDeadline`. +- Every error-return path that previously created or extended a context, registered a resource, opened a connection, or spawned a subprocess MUST `cancel()`, `Close()`, `Stop()`, or release its lease before returning. See `agh-cleanup-failure-paths` for the full audit pattern. + +## Logging + +- `log/slog` for structured logging. `log.Printf`, `fmt.Println`, `fmt.Printf` are forbidden in operational paths. +- Include correlation keys when relevant: `workspace_id`, `session_id`, `parent_session_id`, `root_session_id`, `agent_name`, `task_id`, `run_id`, `claim_token_hash`, `lease_until`, `workflow_id`, `coordinator_session_id`, `scheduler_reason`, `hook_event`, `hook_name`, `spawn_depth`, `actor_kind`, `actor_id`, `release_reason`. + +## Context + +- `context.Context` is the first argument of any function that crosses a runtime boundary. 
+- `context.Background()` is forbidden outside `main` and focused tests. +- Detached execution (work that outlives the request) uses `context.WithoutCancel(ctx)`. `WithoutCancel` does NOT preserve deadlines — re-attach with `WithDeadline` if needed. + +## Types and Interfaces + +- New exported types implementing an interface get `var _ Interface = (*Type)(nil)` adjacent to the type. **Mandatory.** +- No `interface{}` / `any` when a concrete type is known. +- No reflection without a written performance justification. +- No defensive `if x == nil` checks after `make(...)`. Lint flags this as unreachable. + +## Configuration + +- Never hardcode operational values. Use TOML config (`internal/config`) or functional options (`NewManager(opts ...Option)`). +- Disable / zero-value semantics must be explicit. Document whether `0` means "off" or "use default". +- Resolution chains (env → flag → config → default) are documented as ordered fallbacks ending in actionable errors. +- Config lifecycle is part of the feature lifecycle: structs, defaults, merge/overlay behavior, validation, examples, `config.toml` docs, generated CLI/site docs, and tests update in the same change. If no config change is needed, the TechSpec says why explicitly. + +## CLI Boundary + +- Distinguish "flag not set" from "flag set to zero value" via `cmd.Flags().Changed(name)` (Cobra) or equivalent. Silently ignoring an explicit flag is a bug. +- String-slice inputs (capabilities, IDs, tags, paths) trim and drop empty entries before sending to the daemon. Whitespace-only strings must not be pushed as "validation problems". +- `-o json` and `-o jsonl` are compatibility contracts. No command aliases (no `done`, no `pass`). +- Operator endpoints MUST NOT infer agent identity from environment variables — that path belongs to `internal/agentidentity` for agent-facing CLI. + +## Comments + +- Default: write no comments. Well-named identifiers carry the WHAT. 
+- Comments capture WHY when non-obvious: hidden constraints, invariants, workarounds for a specific bug, surprising behavior. +- Never reference the current task, fix, callers, or issue number ("used by X", "added for Y flow", "handles the case from issue #123"). Those rot. +- No multi-paragraph docstrings or multi-line comment blocks. One short line max. + +## Outbound Calls + +- `http.DefaultClient` is forbidden in production paths. +- Every outbound HTTP/network call uses a client with an explicit timeout. +- Drain response bodies (`io.Copy(io.Discard, resp.Body)` then `resp.Body.Close()`) — do not skip the drain. + +## Architecture Discipline (cross-package) + +- Interfaces defined where consumed (Go-style): `session/` defines `AgentDriver`, `acp/` implements it. +- Direct function calls through interfaces. No event bus, no NATS, no reflection-based routing. +- No back-pointers between packages — inject callbacks or interfaces. +- Functional options for constructors: `NewManager(opts ...Option)`. +- Maps for <10 items — no registry interfaces for small collections. +- File-level organization within packages — sub-packages only when complexity justifies it. +- `internal/api/core` is the canonical handler home. REST/UDS endpoints exist as shared `BaseHandlers` methods; HTTP and UDS only choose registration and authentication. No transport-duplicated parsing/validation. +- New `internal/api/*` subpackage requires updating `magefile.go` `Boundaries()` in the same commit (CI-enforceable boundaries prevent import cycles). diff --git a/.agents/skills/agh-code-guidelines/references/concurrency-patterns.md b/.agents/skills/agh-code-guidelines/references/concurrency-patterns.md new file mode 100644 index 000000000..6fc73b5fd --- /dev/null +++ b/.agents/skills/agh-code-guidelines/references/concurrency-patterns.md @@ -0,0 +1,51 @@ +# AGH Concurrency Patterns — Canonical Rules + +Verbatim canonical rules. Reviewers will quote these. 
Companion skills cover deeper analysis: `deadlock-finder-and-fixer` for race/deadlock investigation, `agh-cleanup-failure-paths` for error-path cancellation discipline. + +## Goroutine Ownership + +- Every goroutine has explicit ownership and shutdown via `context.Context` cancellation. +- No fire-and-forget goroutines. Track with `sync.WaitGroup` or equivalent owner-side primitive and join on shutdown. +- Long-running loops use `select { case <-ctx.Done(): return; case ... }`. Never busy-wait. +- Goroutines spawned by `internal/session/manager_*.go` MUST be tracked by a Manager-owned WaitGroup and joined in Manager shutdown. +- Never put goroutine-owned channels in a struct field that another goroutine mutates — use a per-run handle. + +## Synchronization + +- Prefer channels over shared memory with mutexes when practical. +- `sync.RWMutex` for read-heavy shared state, `sync.Mutex` for write-heavy. +- No `time.Sleep()` in orchestration. Use timers, tickers, or context deadlines. + +## Detached Execution + +- Any work that outlives an HTTP/UDS request — prompts, network channel sends, automation jobs — MUST detach via `context.WithoutCancel(ctx)`. +- Never tie execution lifetime to request lifetime. +- Expose explicit cancel endpoints (e.g., `POST /api/sessions/:id/prompt/cancel`). +- `context.WithoutCancel` does NOT preserve deadlines. Re-attach with `WithDeadline` if needed. +- The writer loop stays bound to the request context — detach the *execution*, not the *response*. + +## Subprocess Supervision + +- Subprocess managed-stop respects `ctx.Done()` between Shutdown and Wait. Wrap `proc.Wait()` in `select { case <-proc.Done(): case <-ctx.Done(): }`. +- Process-group supervision parity: Unix uses process groups, Windows uses forced-exit fallback. Always cross-build with `GOOS=windows GOARCH=amd64 go build` before claiming subprocess work complete. +- Centralize signaling helpers in `internal/procutil`. Do not reinvent process-group signaling per package. 
+ +## Race / cgo + +- `make verify` runs `-race`. Race-enabled tests need `CGO_ENABLED=1`. +- `runRaceEnabledGoCommand` (or equivalent) clones caller env and forces `CGO_ENABLED=1` for race subprocesses. Do not trust ambient env. +- Before claiming `make verify` complete on race-sensitive packages (`internal/session`, `internal/acp`, `internal/hooks`, `internal/subprocess`, `internal/resources`), reproduce locally with `act workflow_dispatch -W .github/workflows/ci.yml -j verify --container-architecture linux/amd64`. + +## Authoritative Primitives (do not replicate) + +- `task.Service.ClaimNextRun` is the canonical claim primitive — no peer package may replicate it. +- Wake / observe / sweep are allowed; claim / own is not. +- The mechanical scheduler does NOT call `ClaimNextRun` directly in MVP. +- Hooks dispatch at the call site that owns the state transition. Never tail event/log tables to fire hooks. + +## Common Failure Modes + +- Goroutine leak on error path: every error return that ran above a `go func()` spawn must signal that goroutine to exit (via `cancel()` or close of an owner-controlled channel). +- Deadlock on shutdown: a goroutine reading from a channel that the owner stopped writing to without closing — close channels you own when shutting down. +- Race on map / slice mutation: take the appropriate mutex, or use `sync.Map` for genuinely concurrent maps. Concurrent slice append without sync is always a bug. +- Lost cancellation: storing `context.Background()` instead of caller-supplied `ctx` breaks deadline propagation. Always thread `ctx` through. diff --git a/.agents/skills/agh-test-conventions/SKILL.md b/.agents/skills/agh-test-conventions/SKILL.md index 652f4f914..cdc9e280f 100644 --- a/.agents/skills/agh-test-conventions/SKILL.md +++ b/.agents/skills/agh-test-conventions/SKILL.md @@ -50,11 +50,23 @@ trigger: implicit 3. Deterministic time: replace `time.Now()` with injected clocks; deterministic IDs use injected ID generators. 4. 
For new types satisfying interfaces, ensure `var _ Interface = (*Type)(nil)` exists in production code (not in the test file). -**Step 6: Pre-Commit Validation** +**Step 6: Apply Integration / E2E Discipline** + +1. Integration tests live in `*_integration_test.go` with `//go:build integration` at the top. Co-locate them with the package they test — never in a separate `test/` directory. +2. `make test` runs unit only. `make test-integration` adds the `+integration` build tag. `make test-e2e-runtime` is the daemon-side Go harness; `make test-e2e-web` is the browser-side Playwright harness. +3. Use `TestMain` for expensive one-time setup/teardown. +4. Use real dependencies — real SQLite via `t.TempDir()`, mock ACP server as a subprocess (`acpmock`). Avoid in-process fakes when a real subprocess can be wired. +5. Keep integration tests fast enough for CI: ~30s max per package. +6. **E2E tests are part of the runtime contract.** When a runtime contract changes (prompt augmenter, situation context, fixture format), the E2E mock and matchers ship in the same PR. Otherwise tests pass against a stale prompt and fail later. +7. Read `references/test-shape-rules.md` "Integration / E2E" section for additional patterns. + +**Step 7: Pre-Commit Validation** 1. Run `python3 scripts/check-test-conventions.py ` to scan the test file for violations. The script is a regex-based fast check; it complements `make verify`. 2. If the script reports violations, fix them before running `make verify`. 3. After edits, run `go test ./ -count=1 -race` for the affected package, then `make verify`. +4. **`make verify` is the commit gate.** If verification is blocked by an external/branch-side asset issue (missing test fixture, etc.), do NOT commit — report the verified blocker and hold. +5. **Test failures are production bugs.** Fix production code; do not weaken assertions. The only legitimate exception is documenting an INVALID review item with concrete evidence. 
## Error Handling diff --git a/.agents/skills/agh-test-conventions/references/test-shape-rules.md b/.agents/skills/agh-test-conventions/references/test-shape-rules.md index f12f83f60..3755289a5 100644 --- a/.agents/skills/agh-test-conventions/references/test-shape-rules.md +++ b/.agents/skills/agh-test-conventions/references/test-shape-rules.md @@ -40,6 +40,15 @@ Verbatim canonical rules. Reviewers will quote these. Stay aligned. - Co-located with the package; no `test/` subdirectory. - `make test` = unit only. `make test-integration` = `+integration`. `make test-e2e-runtime` and `make test-e2e-web` are separate lanes. +## Integration / E2E + +- `TestMain` for expensive one-time setup/teardown. +- Use real dependencies: real SQLite via `t.TempDir()`, mock ACP server as a subprocess (`acpmock`). Prefer subprocess mocks over in-process fakes. +- Keep package runtime ~30s max in CI. +- Heavy E2E (`make test-e2e-nightly`) lives in the release-PR `dry-run` job — never in a cron/schedule workflow. +- E2E tests are part of the runtime contract: when a runtime contract changes (prompt augmenter, situation context, fixture format), the E2E mock and matchers ship in the same PR. +- Replace fragile string-matching with structured metadata. ACP prompt routing in `acpmock` uses typed prompt metadata, not rendered prompt substrings. + ## Mocks - Mock via interfaces, not test-only methods on production types. @@ -61,6 +70,12 @@ Verbatim canonical rules. Reviewers will quote these. Stay aligned. - `make verify` runs `-race`. Race-enabled tests need `CGO_ENABLED=1`. - `runRaceEnabledGoCommand` (or equivalent) clones caller env and forces `CGO_ENABLED=1` for race subprocesses. Do not trust ambient env. 
+- Linux-Race CI parity: before claiming `make verify` complete on race-sensitive packages (`internal/session`, `internal/acp`, `internal/hooks`, `internal/subprocess`, `internal/resources`), reproduce locally with `act workflow_dispatch -W .github/workflows/ci.yml -j verify --container-architecture linux/amd64`. + +## Commit gate + +- `make verify` is the commit gate. If verification is blocked by an external/branch-side asset issue (missing test fixture, etc.), do NOT commit — report the verified blocker and hold. +- Test failures are production bugs. Fix production code; don't weaken assertions. The only exception is documenting an INVALID review item with concrete evidence. ## E2E follows runtime contract diff --git a/.agents/skills/interface-design/SKILL.md b/.agents/skills/interface-design/SKILL.md deleted file mode 100644 index 9fe89c25f..000000000 --- a/.agents/skills/interface-design/SKILL.md +++ /dev/null @@ -1,391 +0,0 @@ ---- -name: interface-design -description: This skill is for interface design — dashboards, admin panels, apps, tools, and interactive products. NOT for marketing design (landing pages, marketing sites, campaigns). ---- - -# Interface Design - -Build interface design with craft and consistency. - -## Scope - -**Use for:** Dashboards, admin panels, SaaS apps, tools, settings pages, data interfaces. - -**Not for:** Landing pages, marketing sites, campaigns. Redirect those to `/frontend-design`. - ---- - -# The Problem - -You will generate generic output. Your training has seen thousands of dashboards. The patterns are strong. - -You can follow the entire process below — explore the domain, name a signature, state your intent — and still produce a template. Warm colors on cold structures. Friendly fonts on generic layouts. "Kitchen feel" that looks like every other app. - -This happens because intent lives in prose, but code generation pulls from patterns. The gap between them is where defaults win. - -The process below helps. 
But process alone doesn't guarantee craft. You have to catch yourself. - ---- - -# Where Defaults Hide - -Defaults don't announce themselves. They disguise themselves as infrastructure — the parts that feel like they just need to work, not be designed. - -**Typography feels like a container.** Pick something readable, move on. But typography isn't holding your design — it IS your design. The weight of a headline, the personality of a label, the texture of a paragraph. These shape how the product feels before anyone reads a word. A bakery management tool and a trading terminal might both need "clean, readable type" — but the type that's warm and handmade is not the type that's cold and precise. If you're reaching for your usual font, you're not designing. - -**Navigation feels like scaffolding.** Build the sidebar, add the links, get to the real work. But navigation isn't around your product — it IS your product. Where you are, where you can go, what matters most. A page floating in space is a component demo, not software. The navigation teaches people how to think about the space they're in. - -**Data feels like presentation.** You have numbers, show numbers. But a number on screen is not design. The question is: what does this number mean to the person looking at it? What will they do with it? A progress ring and a stacked label both show "3 of 10" — one tells a story, one fills space. If you're reaching for number-on-label, you're not designing. - -**Token names feel like implementation detail.** But your CSS variables are design decisions. `--ink` and `--parchment` evoke a world. `--gray-700` and `--surface-2` evoke a template. Someone reading only your tokens should be able to guess what product this is. - -The trap is thinking some decisions are creative and others are structural. There are no structural decisions. Everything is design. The moment you stop asking "why this?" is the moment defaults take over. 
- ---- - -# Intent First - -Before touching code, answer these. Not in your head — out loud, to yourself or the user. - -**Who is this human?** -Not "users." The actual person. Where are they when they open this? What's on their mind? What did they do 5 minutes ago, what will they do 5 minutes after? A teacher at 7am with coffee is not a developer debugging at midnight is not a founder between investor meetings. Their world shapes the interface. - -**What must they accomplish?** -Not "use the dashboard." The verb. Grade these submissions. Find the broken deployment. Approve the payment. The answer determines what leads, what follows, what hides. - -**What should this feel like?** -Say it in words that mean something. "Clean and modern" means nothing — every AI says that. Warm like a notebook? Cold like a terminal? Dense like a trading floor? Calm like a reading app? The answer shapes color, type, spacing, density — everything. - -If you cannot answer these with specifics, stop. Ask the user. Do not guess. Do not default. - -## Every Choice Must Be A Choice - -For every decision, you must be able to explain WHY. - -- Why this layout and not another? -- Why this color temperature? -- Why this typeface? -- Why this spacing scale? -- Why this information hierarchy? - -If your answer is "it's common" or "it's clean" or "it works" — you haven't chosen. You've defaulted. Defaults are invisible. Invisible choices compound into generic output. - -**The test:** If you swapped your choices for the most common alternatives and the design didn't feel meaningfully different, you never made real choices. - -## Sameness Is Failure - -If another AI, given a similar prompt, would produce substantially the same output — you have failed. - -This is not about being different for its own sake. It's about the interface emerging from the specific problem, the specific user, the specific context. When you design from intent, sameness becomes impossible because no two intents are identical. 
- -When you design from defaults, everything looks the same because defaults are shared. - -## Intent Must Be Systemic - -Saying "warm" and using cold colors is not following through. Intent is not a label — it's a constraint that shapes every decision. - -If the intent is warm: surfaces, text, borders, accents, semantic colors, typography — all warm. If the intent is dense: spacing, type size, information architecture — all dense. If the intent is calm: motion, contrast, color saturation — all calm. - -Check your output against your stated intent. Does every token reinforce it? Or did you state an intent and then default anyway? - ---- - -# Product Domain Exploration - -This is where defaults get caught — or don't. - -Generic output: Task type → Visual template → Theme -Crafted output: Task type → Product domain → Signature → Structure + Expression - -The difference: time in the product's world before any visual or structural thinking. - -## Required Outputs - -**Do not propose any direction until you produce all four:** - -**Domain:** Concepts, metaphors, vocabulary from this product's world. Not features — territory. Minimum 5. - -**Color world:** What colors exist naturally in this product's domain? Not "warm" or "cool" — go to the actual world. If this product were a physical space, what would you see? What colors belong there that don't belong elsewhere? List 5+. - -**Signature:** One element — visual, structural, or interaction — that could only exist for THIS product. If you can't name one, keep exploring. - -**Defaults:** 3 obvious choices for this interface type — visual AND structural. You can't avoid patterns you haven't named. - -## Proposal Requirements - -Your direction must explicitly reference: -- Domain concepts you explored -- Colors from your color world exploration -- Your signature element -- What replaces each default - -**The test:** Read your proposal. Remove the product name. Could someone identify what this is for? If not, it's generic. 
Explore deeper. - ---- - -# The Mandate - -**Before showing the user, look at what you made.** - -Ask yourself: "If they said this lacks craft, what would they mean?" - -That thing you just thought of — fix it first. - -Your first output is probably generic. That's normal. The work is catching it before the user has to. - -## The Checks - -Run these against your output before presenting: - -- **The swap test:** If you swapped the typeface for your usual one, would anyone notice? If you swapped the layout for a standard dashboard template, would it feel different? The places where swapping wouldn't matter are the places you defaulted. - -- **The squint test:** Blur your eyes. Can you still perceive hierarchy? Is anything jumping out harshly? Craft whispers. - -- **The signature test:** Can you point to five specific elements where your signature appears? Not "the overall feel" — actual components. A signature you can't locate doesn't exist. - -- **The token test:** Read your CSS variables out loud. Do they sound like they belong to this product's world, or could they belong to any project? - -If any check fails, iterate before showing. - ---- - -# Craft Foundations - -## Subtle Layering - -This is the backbone of craft. Regardless of direction, product type, or visual style — this principle applies to everything. You should barely notice the system working. When you look at Vercel's dashboard, you don't think "nice borders." You just understand the structure. The craft is invisible — that's how you know it's working. - -### Surface Elevation - -Surfaces stack. A dropdown sits above a card which sits above the page. Build a numbered system — base, then increasing elevation levels. In dark mode, higher elevation = slightly lighter. In light mode, higher elevation = slightly lighter or uses shadow. - -Each jump should be only a few percentage points of lightness. You can barely see the difference in isolation. But when surfaces stack, the hierarchy emerges. 
Whisper-quiet shifts that you feel rather than see. - -**Key decisions:** -- **Sidebars:** Same background as canvas, not different. Different colors fragment the visual space into "sidebar world" and "content world." A subtle border is enough separation. -- **Dropdowns:** One level above their parent surface. If both share the same level, the dropdown blends into the card and layering is lost. -- **Inputs:** Slightly darker than their surroundings, not lighter. Inputs are "inset" — they receive content. A darker background signals "type here" without heavy borders. - -### Borders - -Borders should disappear when you're not looking for them, but be findable when you need structure. Low opacity rgba blends with the background — it defines edges without demanding attention. Solid hex borders look harsh in comparison. - -Build a progression — not all borders are equal. Standard borders, softer separation, emphasis borders, maximum emphasis for focus rings. Match intensity to the importance of the boundary. - -**The squint test:** Blur your eyes at the interface. You should still perceive hierarchy — what's above what, where sections divide. But nothing should jump out. No harsh lines. No jarring color shifts. Just quiet structure. - -This separates professional interfaces from amateur ones. Get this wrong and nothing else matters. - -## Infinite Expression - -Every pattern has infinite expressions. **No interface should look the same.** - -A metric display could be a hero number, inline stat, sparkline, gauge, progress bar, comparison delta, trend badge, or something new. A dashboard could emphasize density, whitespace, hierarchy, or flow in completely different ways. Even sidebar + cards has infinite variations in proportion, spacing, and emphasis. - -**Before building, ask:** -- What's the ONE thing users do most here? -- What products solve similar problems brilliantly? Study them. -- Why would this interface feel designed for its purpose, not templated? 
- -**NEVER produce identical output.** Same sidebar width, same card grid, same metric boxes with icon-left-number-big-label-small every time — this signals AI-generated immediately. It's forgettable. - -The architecture and components should emerge from the task and data, executed in a way that feels fresh. Linear's cards don't look like Notion's. Vercel's metrics don't look like Stripe's. Same concepts, infinite expressions. - -## Color Lives Somewhere - -Every product exists in a world. That world has colors. - -Before you reach for a palette, spend time in the product's world. What would you see if you walked into the physical version of this space? What materials? What light? What objects? - -Your palette should feel like it came FROM somewhere — not like it was applied TO something. - -**Beyond Warm and Cold:** Temperature is one axis. Is this quiet or loud? Dense or spacious? Serious or playful? Geometric or organic? A trading terminal and a meditation app are both "focused" — completely different kinds of focus. Find the specific quality, not the generic label. - -**Color Carries Meaning:** Gray builds structure. Color communicates — status, action, emphasis, identity. Unmotivated color is noise. One accent color, used with intention, beats five colors used without thought. - ---- - -# Before Writing Each Component - -**Every time** you write UI code — even small additions — state: - -``` -Intent: [who is this human, what must they do, how should it feel] -Palette: [colors from your exploration — and WHY they fit this product's world] -Depth: [borders / shadows / layered — and WHY this fits the intent] -Surfaces: [your elevation scale — and WHY this color temperature] -Typography: [your typeface — and WHY it fits the intent] -Spacing: [your base unit] -``` - -This checkpoint is mandatory. It forces you to connect every technical choice back to intent. - -If you can't explain WHY for each choice, you're defaulting. Stop and think. 
- ---- - -# Design Principles - -## Token Architecture - -Every color in your interface should trace back to a small set of primitives: foreground (text hierarchy), background (surface elevation), border (separation hierarchy), brand, and semantic (destructive, warning, success). No random hex values — everything maps to primitives. - -### Text Hierarchy - -Don't just have "text" and "gray text." Build four levels — primary, secondary, tertiary, muted. Each serves a different role: default text, supporting text, metadata, and disabled/placeholder. Use all four consistently. If you're only using two, your hierarchy is too flat. - -### Border Progression - -Borders aren't binary. Build a scale that matches intensity to importance — standard separation, softer separation, emphasis, maximum emphasis. Not every boundary deserves the same weight. - -### Control Tokens - -Form controls have specific needs. Don't reuse surface tokens — create dedicated ones for control backgrounds, control borders, and focus states. This lets you tune interactive elements independently from layout surfaces. - -## Spacing - -Pick a base unit and stick to multiples. Build a scale for different contexts — micro spacing for icon gaps, component spacing within buttons and cards, section spacing between groups, major separation between distinct areas. Random values signal no system. - -## Padding - -Keep it symmetrical. If one side has a value, others should match unless content naturally requires asymmetry. - -## Depth - -Choose ONE approach and commit: -- **Borders-only** — Clean, technical. For dense tools. -- **Subtle shadows** — Soft lift. For approachable products. -- **Layered shadows** — Premium, dimensional. For cards that need presence. -- **Surface color shifts** — Background tints establish hierarchy without shadows. - -Don't mix approaches. - -## Border Radius - -Sharper feels technical. Rounder feels friendly. 
Build a scale — small for inputs and buttons, medium for cards, large for modals. Don't mix sharp and soft randomly. - -## Typography - -Build distinct levels distinguishable at a glance. Headlines need weight and tight tracking for presence. Body needs comfortable weight for readability. Labels need medium weight that works at smaller sizes. Data needs monospace with tabular number spacing for alignment. Don't rely on size alone — combine size, weight, and letter-spacing. - -## Card Layouts - -A metric card doesn't have to look like a plan card doesn't have to look like a settings card. Design each card's internal structure for its specific content — but keep the surface treatment consistent: same border weight, shadow depth, corner radius, padding scale. - -## Controls - -Native `<select>` and `<input type="date">` render OS-native elements that cannot be styled. Build custom components — trigger buttons with positioned dropdowns, calendar popovers, styled state management. - -## Iconography - -Icons clarify, not decorate — if removing an icon loses no meaning, remove it. Choose one icon set and stick with it. Give standalone icons presence with subtle background containers. - -## Animation - -Fast micro-interactions, smooth easing. Larger transitions can be slightly longer. Use deceleration easing. Avoid spring/bounce in professional interfaces. - -## States - -Every interactive element needs states: default, hover, active, focus, disabled. Data needs states too: loading, empty, error. Missing states feel broken. - -## Navigation Context - -Screens need grounding. A data table floating in space feels like a component demo, not a product. Include navigation showing where you are in the app, location indicators, and user context. When building sidebars, consider same background as main content with border separation rather than different colors. - -## Dark Mode - -Dark interfaces have different needs. Shadows are less visible on dark backgrounds — lean on borders for definition.
Semantic colors (success, warning, error) often need slight desaturation. The hierarchy system still applies, just with inverted values. - ---- - -# Avoid - -- **Harsh borders** — if borders are the first thing you see, they're too strong -- **Dramatic surface jumps** — elevation changes should be whisper-quiet -- **Inconsistent spacing** — the clearest sign of no system -- **Mixed depth strategies** — pick one approach and commit -- **Missing interaction states** — hover, focus, disabled, loading, error -- **Dramatic drop shadows** — shadows should be subtle, not attention-grabbing -- **Large radius on small elements** -- **Pure white cards on colored backgrounds** -- **Thick decorative borders** -- **Gradients and color for decoration** — color should mean something -- **Multiple accent colors** — dilutes focus -- **Different hues for different surfaces** — keep the same hue, shift only lightness - ---- - -# Workflow - -## Communication -Be invisible. Don't announce modes or narrate process. - -**Never say:** "I'm in ESTABLISH MODE", "Let me check system.md..." - -**Instead:** Jump into work. State suggestions with reasoning. - -## Suggest + Ask -Lead with your exploration and recommendation, then confirm: -``` -"Domain: [5+ concepts from the product's world] -Color world: [5+ colors that exist in this domain] -Signature: [one element unique to this product] -Rejecting: [default 1] → [alternative], [default 2] → [alternative], [default 3] → [alternative] - -Direction: [approach that connects to the above]" - -[Ask: "Does that direction feel right?"] -``` - -## If Project Has system.md -Read `.interface-design/system.md` and apply. Decisions are made. - -## If No system.md -1. Explore domain — Produce all four required outputs -2. Propose — Direction must reference all four -3. Confirm — Get user buy-in -4. Build — Apply principles -5. **Evaluate** — Run the mandate checks before showing -6. 
Offer to save - ---- - -# After Completing a Task - -When you finish building something, **always offer to save**: - -``` -"Want me to save these patterns for future sessions?" -``` - -If yes, write to `.interface-design/system.md`: -- Direction and feel -- Depth strategy (borders/shadows/layered) -- Spacing base unit -- Key component patterns - -### What to Save - -Add patterns when a component is used 2+ times, is reusable across the project, or has specific measurements worth remembering. Don't save one-off components, temporary experiments, or variations better handled with props. - -### Consistency Checks - -If system.md defines values, check against them: spacing on the defined grid, depth using the declared strategy throughout, colors from the defined palette, documented patterns reused instead of reinvented. - -This compounds — each save makes future work faster and more consistent. - ---- - -# Deep Dives - -For more detail on specific topics: -- `references/principles.md` — Code examples, specific values, dark mode -- `references/validation.md` — Memory management, when to update system.md -- `references/critique.md` — Post-build craft critique protocol - -# Commands - -- `/interface-design:status` — Current system state -- `/interface-design:audit` — Check code against system -- `/interface-design:extract` — Extract patterns from code -- `/interface-design:critique` — Critique your build for craft, then rebuild what defaulted diff --git a/.agents/skills/interface-design/references/critique.md b/.agents/skills/interface-design/references/critique.md deleted file mode 100644 index 7db545e48..000000000 --- a/.agents/skills/interface-design/references/critique.md +++ /dev/null @@ -1,67 +0,0 @@ -# Critique - -Your first build shipped the structure. Now look at it the way a design lead reviews a junior's work — not asking "does this work?" but "would I put my name on this?" - ---- - -## The Gap - -There's a distance between correct and crafted. 
Correct means the layout holds, the grid aligns, the colors don't clash. Crafted means someone cared about every decision down to the last pixel. You can feel the difference immediately — the way you tell a hand-thrown mug from an injection-molded one. Both hold coffee. One has presence. - -Your first output lives in correct. This command pulls it toward crafted. - ---- - -## See the Composition - -Step back. Look at the whole thing. - -Does the layout have rhythm? Great interfaces breathe unevenly — dense tooling areas give way to open content, heavy elements balance against light ones, the eye travels through the page with purpose. Default layouts are monotone: same card size, same gaps, same density everywhere. Flatness is the sound of no one deciding. - -Are proportions doing work? A 280px sidebar next to full-width content says "navigation serves content." A 360px sidebar says "these are peers." The specific number declares what matters. If you can't articulate what your proportions are saying, they're not saying anything. - -Is there a clear focal point? Every screen has one thing the user came here to do. That thing should dominate — through size, position, contrast, or the space around it. When everything competes equally, nothing wins and the interface feels like a parking lot. - ---- - -## See the Craft - -Move close. Pixel-close. - -The spacing grid is non-negotiable — every value a multiple of 4, no exceptions — but correctness alone isn't craft. Craft is knowing that a tool panel at 16px padding feels workbench-tight while the same card at 24px feels like a brochure. The same number can be right in one context and lazy in another. Density is a design decision, not a constant. - -Typography should be legible even squinted. If size is the only thing separating your headline from your body from your label, the hierarchy is too weak. Weight, tracking, and opacity create layers that size alone can't. - -Surfaces should whisper hierarchy. 
Not thick borders, not dramatic shadows — quiet tonal shifts where you feel the depth without seeing it. Remove every border from your CSS mentally. Can you still perceive the structure through surface color alone? If not, your surfaces aren't working hard enough. - -Interactive elements need life. Every button, link, and clickable region should respond to hover and press. Not dramatically — a subtle shift in background, a gentle darkening. Missing states make an interface feel like a photograph of software instead of software. - ---- - -## See the Content - -Read every visible string as a user would. Not checking for typos — checking for truth. - -Does this screen tell one coherent story? Could a real person at a real company be looking at exactly this data right now? Or does the page title belong to one product, the article body to another, and the sidebar metrics to a third? - -Content incoherence breaks the illusion faster than any visual flaw. A beautifully designed interface with nonsensical content is a movie set with no script. - ---- - -## See the Structure - -Open the CSS and find the lies — the places that look right but are held together with tape. - -Negative margins undoing a parent's padding. Calc() values that exist only as workarounds. Absolute positioning to escape layout flow. Each is a shortcut where a clean solution exists. Cards with full-width dividers use flex column and section-level padding. Centered content uses max-width with auto margins. The correct answer is always simpler than the hack. - ---- - -## Again - -Look at your output one final time. - -Ask: "If they said this lacks craft, what would they point to?" - -That thing you just thought of — fix it. Then ask again. - -The first build was the draft. The critique is the design. 
diff --git a/.agents/skills/interface-design/references/example.md b/.agents/skills/interface-design/references/example.md deleted file mode 100644 index 665490653..000000000 --- a/.agents/skills/interface-design/references/example.md +++ /dev/null @@ -1,86 +0,0 @@ -# Craft in Action - -This shows how the subtle layering principle translates to real decisions. Learn the thinking, not the code. Your values will differ — the approach won't. - ---- - -## The Subtle Layering Mindset - -Before looking at any example, internalize this: **you should barely notice the system working.** - -When you look at Vercel's dashboard, you don't think "nice borders." You just understand the structure. When you look at Supabase, you don't think "good surface elevation." You just know what's above what. The craft is invisible — that's how you know it's working. - ---- - -## Example: Dashboard with Sidebar and Dropdown - -### The Surface Decisions - -**Why so subtle?** Each elevation jump should be only a few percentage points of lightness. You can barely see the difference in isolation. But when surfaces stack, the hierarchy emerges. This is the Vercel/Supabase way — whisper-quiet shifts that you feel rather than see. - -**What NOT to do:** Don't make dramatic jumps between elevations. That's jarring. Don't use different hues for different levels. Keep the same hue, shift only lightness. - -### The Border Decisions - -**Why rgba, not solid colors?** Low opacity borders blend with their background. A low-opacity white border on a dark surface is barely there — it defines the edge without demanding attention. Solid hex borders look harsh in comparison. - -**The test:** Look at your interface from arm's length. If borders are the first thing you notice, reduce opacity. If you can't find where regions end, increase slightly. - -### The Sidebar Decision - -**Why same background as canvas, not different?** - -Many dashboards make the sidebar a different color. 
This fragments the visual space — now you have "sidebar world" and "content world." - -Better: Same background, subtle border separation. The sidebar is part of the app, not a separate region. Vercel does this. Supabase does this. The border is enough. - -### The Dropdown Decision - -**Why surface-200, not surface-100?** - -The dropdown floats above the card it emerged from. If both were surface-100, the dropdown would blend into the card — you'd lose the sense of layering. Surface-200 is just light enough to feel "above" without being dramatically different. - -**Why border-overlay instead of border-default?** - -Overlays (dropdowns, popovers) often need slightly more definition because they're floating in space. A touch more border opacity helps them feel contained without being harsh. - ---- - -## Example: Form Controls - -### Input Background Decision - -**Why darker, not lighter?** - -Inputs are "inset" — they receive content, they don't project it. A slightly darker background signals "type here" without needing heavy borders. This is the alternative-background principle. - -### Focus State Decision - -**Why subtle focus states?** - -Focus needs to be visible, but you don't need a glowing ring or dramatic color. A noticeable increase in border opacity is enough for a clear state change. Subtle-but-noticeable — the same principle as surfaces. - ---- - -## Adapt to Context - -Your product might need: -- Warmer hues (slight yellow/orange tint) -- Cooler hues (blue-gray base) -- Different lightness progression -- Light mode (principles invert — higher elevation = shadow, not lightness) - -**The principle is constant:** barely different, still distinguishable. The values adapt to context. - ---- - -## The Craft Check - -Apply the squint test to your work: - -1. Blur your eyes or step back -2. Can you still perceive hierarchy? -3. Is anything jumping out at you? -4. Can you tell where regions begin and end? 
- -If hierarchy is visible and nothing is harsh — the subtle layering is working. diff --git a/.agents/skills/interface-design/references/principles.md b/.agents/skills/interface-design/references/principles.md deleted file mode 100644 index 6c4a50273..000000000 --- a/.agents/skills/interface-design/references/principles.md +++ /dev/null @@ -1,235 +0,0 @@ -# Core Craft Principles - -These apply regardless of design direction. This is the quality floor. - ---- - -## Surface & Token Architecture - -Professional interfaces don't pick colors randomly — they build systems. Understanding this architecture is the difference between "looks okay" and "feels like a real product." - -### The Primitive Foundation - -Every color in your interface should trace back to a small set of primitives: - -- **Foreground** — text colors (primary, secondary, muted) -- **Background** — surface colors (base, elevated, overlay) -- **Border** — edge colors (default, subtle, strong) -- **Brand** — your primary accent -- **Semantic** — functional colors (destructive, warning, success) - -Don't invent new colors. Map everything to these primitives. - -### Surface Elevation Hierarchy - -Surfaces stack. A dropdown sits above a card which sits above the page. Build a numbered system: - -``` -Level 0: Base background (the app canvas) -Level 1: Cards, panels (same visual plane as base) -Level 2: Dropdowns, popovers (floating above) -Level 3: Nested dropdowns, stacked overlays -Level 4: Highest elevation (rare) -``` - -In dark mode, higher elevation = slightly lighter. In light mode, higher elevation = slightly lighter or uses shadow. The principle: **elevated surfaces need visual distinction from what's beneath them.** - -### The Subtlety Principle - -This is where most interfaces fail. Study Vercel, Supabase, Linear — their surfaces are **barely different** but still distinguishable. Their borders are **light but not invisible**. 
- -**For surfaces:** The difference between elevation levels should be subtle — a few percentage points of lightness, not dramatic jumps. In dark mode, surface-100 might be 7% lighter than base, surface-200 might be 9%, surface-300 might be 12%. You can barely see it, but you feel it. - -**For borders:** Borders should define regions without demanding attention. Use low opacity (0.05-0.12 alpha for dark mode, slightly higher for light). The border should disappear when you're not looking for it, but be findable when you need to understand the structure. - -**The test:** Squint at your interface. You should still perceive the hierarchy — what's above what, where regions begin and end. But no single border or surface should jump out at you. If borders are the first thing you notice, they're too strong. If you can't find where one region ends and another begins, they're too subtle. - -**Common AI mistakes to avoid:** -- Borders that are too visible (1px solid gray instead of subtle rgba) -- Surface jumps that are too dramatic (going from dark to light instead of dark to slightly-less-dark) -- Using different hues for different surfaces (gray card on blue background) -- Harsh dividers where subtle borders would do - -### Text Hierarchy via Tokens - -Don't just have "text" and "gray text." Build four levels: - -- **Primary** — default text, highest contrast -- **Secondary** — supporting text, slightly muted -- **Tertiary** — metadata, timestamps, less important -- **Muted** — disabled, placeholder, lowest contrast - -Use all four consistently. If you're only using two, your hierarchy is too flat. - -### Border Progression - -Borders aren't binary. Build a scale: - -- **Default** — standard borders -- **Subtle/Muted** — softer separation -- **Strong** — emphasis, hover states -- **Stronger** — maximum emphasis, focus rings - -Match border intensity to the importance of the boundary. 
- -### Dedicated Control Tokens - -Form controls (inputs, checkboxes, selects) have specific needs. Don't just reuse surface tokens — create dedicated ones: - -- **Control background** — often different from surface backgrounds -- **Control border** — needs to feel interactive -- **Control focus** — clear focus indication - -This separation lets you tune controls independently from layout surfaces. - -### Context-Aware Bases - -Different areas of your app might need different base surfaces: - -- **Marketing pages** — might use darker/richer backgrounds -- **Dashboard/app** — might use neutral working backgrounds -- **Sidebar** — might differ from main canvas - -The surface hierarchy works the same way — it just starts from a different base. - -### Alternative Backgrounds for Depth - -Beyond shadows, use contrasting backgrounds to create depth. An "alternative" or "inset" background makes content feel recessed. Useful for: - -- Empty states in data grids -- Code blocks -- Inset panels -- Visual grouping without borders - ---- - -## Spacing System - -Pick a base unit (4px and 8px are common) and use multiples throughout. The specific number matters less than consistency — every spacing value should be explainable as "X times the base unit." - -Build a scale for different contexts: -- Micro spacing (icon gaps, tight element pairs) -- Component spacing (within buttons, inputs, cards) -- Section spacing (between related groups) -- Major separation (between distinct sections) - -## Symmetrical Padding - -TLBR must match. If top padding is 16px, left/bottom/right must also be 16px. Exception: when content naturally creates visual balance. - -```css -/* Good */ -padding: 16px; -padding: 12px 16px; /* Only when horizontal needs more room */ - -/* Bad */ -padding: 24px 16px 12px 16px; -``` - -## Border Radius Consistency - -Sharper corners feel technical, rounder corners feel friendly. Pick a scale that fits your product's personality and use it consistently. 
- -The key is having a system: small radius for inputs and buttons, medium for cards, large for modals or containers. Don't mix sharp and soft randomly — inconsistent radius is as jarring as inconsistent spacing. - -## Depth & Elevation Strategy - -Match your depth approach to your design direction. Choose ONE and commit: - -**Borders-only (flat)** — Clean, technical, dense. Works for utility-focused tools where information density matters more than visual lift. Linear, Raycast, and many developer tools use almost no shadows — just subtle borders to define regions. - -**Subtle single shadows** — Soft lift without complexity. A simple `0 1px 3px rgba(0,0,0,0.08)` can be enough. Works for approachable products that want gentle depth. - -**Layered shadows** — Rich, premium, dimensional. Multiple shadow layers create realistic depth. Stripe and Mercury use this approach. Best for cards that need to feel like physical objects. - -**Surface color shifts** — Background tints establish hierarchy without any shadows. A card at `#fff` on a `#f8fafc` background already feels elevated. - -```css -/* Borders-only approach */ ---border: rgba(0, 0, 0, 0.08); ---border-subtle: rgba(0, 0, 0, 0.05); -border: 0.5px solid var(--border); - -/* Single shadow approach */ ---shadow: 0 1px 3px rgba(0, 0, 0, 0.08); - -/* Layered shadow approach */ ---shadow-layered: - 0 0 0 0.5px rgba(0, 0, 0, 0.05), - 0 1px 2px rgba(0, 0, 0, 0.04), - 0 2px 4px rgba(0, 0, 0, 0.03), - 0 4px 8px rgba(0, 0, 0, 0.02); -``` - -## Card Layouts - -Monotonous card layouts are lazy design. A metric card doesn't have to look like a plan card doesn't have to look like a settings card. - -Design each card's internal structure for its specific content — but keep the surface treatment consistent: same border weight, shadow depth, corner radius, padding scale, typography. - -## Isolated Controls - -UI controls deserve container treatment. Date pickers, filters, dropdowns — these should feel like crafted objects. 
 - -**Never use native form elements for styled UI.** Native `<select>`, `<input type="date">`, and similar elements render OS-native dropdowns that cannot be styled. Build custom components instead: - -- Custom select: trigger button + positioned dropdown menu -- Custom date picker: input + calendar popover -- Custom checkbox/radio: styled div with state management - -Custom select triggers must use `display: inline-flex` with `white-space: nowrap` to keep text and chevron icons on the same row. - -## Typography Hierarchy - -Build distinct levels that are visually distinguishable at a glance: - -- **Headlines** — heavier weight, tighter letter-spacing for presence -- **Body** — comfortable weight for readability -- **Labels/UI** — medium weight, works at smaller sizes -- **Data** — often monospace, needs `tabular-nums` for alignment - -Don't rely on size alone. Combine size, weight, and letter-spacing to create clear hierarchy. If you squint and can't tell headline from body, the hierarchy is too weak. - -## Monospace for Data - -Numbers, IDs, codes, timestamps belong in monospace. Use `tabular-nums` for columnar alignment. Mono signals "this is data." - -## Iconography - -Icons clarify, not decorate — if removing an icon loses no meaning, remove it. Choose a consistent icon set and stick with it throughout the product. - -Give standalone icons presence with subtle background containers. Icons next to text should align optically, not mathematically. - -## Animation - -Keep it fast and functional. Micro-interactions (hover, focus) should feel instant — around 150ms. Larger transitions (modals, panels) can be slightly longer — 200-250ms. - -Use smooth deceleration easing (ease-out variants). Avoid spring/bounce effects in professional interfaces — they feel playful, not serious. - -## Contrast Hierarchy - -Build a four-level system: foreground (primary) → secondary → muted → faint. Use all four consistently. - -## Color Carries Meaning - -Gray builds structure.
Color communicates — status, action, emphasis, identity. Unmotivated color is noise. Color that reinforces the product's world is character. - -## Navigation Context - -Screens need grounding. A data table floating in space feels like a component demo, not a product. Consider including: - -- **Navigation** — sidebar or top nav showing where you are in the app -- **Location indicator** — breadcrumbs, page title, or active nav state -- **User context** — who's logged in, what workspace/org - -When building sidebars, consider using the same background as the main content area. Rely on a subtle border for separation rather than different background colors. - -## Dark Mode - -Dark interfaces have different needs: - -**Borders over shadows** — Shadows are less visible on dark backgrounds. Lean more on borders for definition. - -**Adjust semantic colors** — Status colors (success, warning, error) often need to be slightly desaturated for dark backgrounds. - -**Same structure, different values** — The hierarchy system still applies, just with inverted values. diff --git a/.agents/skills/interface-design/references/validation.md b/.agents/skills/interface-design/references/validation.md deleted file mode 100644 index 7aa4a696b..000000000 --- a/.agents/skills/interface-design/references/validation.md +++ /dev/null @@ -1,48 +0,0 @@ -# Memory Management - -When and how to update `.interface-design/system.md`. - -## When to Add Patterns - -Add to system.md when: -- Component used 2+ times -- Pattern is reusable across the project -- Has specific measurements worth remembering - -## Pattern Format - -```markdown -### Button Primary -- Height: 36px -- Padding: 12px 16px -- Radius: 6px -- Font: 14px, 500 weight -``` - -## Don't Document - -- One-off components -- Temporary experiments -- Variations better handled with props - -## Pattern Reuse - -Before creating a component, check system.md: -- Pattern exists? Use it. -- Need variation? Extend, don't create new. 
- -Memory compounds: each pattern saved makes future work faster and more consistent. - ---- - -# Validation Checks - -If system.md defines specific values, check consistency: - -**Spacing** — All values multiples of the defined base? - -**Depth** — Using the declared strategy throughout? (borders-only means no shadows) - -**Colors** — Using defined palette, not random hex codes? - -**Patterns** — Reusing documented patterns instead of creating new? diff --git a/.agents/skills/kb/SKILL.md b/.agents/skills/kb/SKILL.md deleted file mode 100644 index c0a9839cd..000000000 --- a/.agents/skills/kb/SKILL.md +++ /dev/null @@ -1,464 +0,0 @@ ---- -name: kb -description: Comprehensive skill for the `kb` CLI and the Karpathy Knowledge Base pattern. Covers the full KB lifecycle — topic scaffolding, multi-source ingestion (URLs, files, YouTube, bookmarks, codebases), wiki article compilation, cross-article querying with file-back, lint-and-heal passes, QMD indexing, and hybrid search. Also covers codebase-specific analysis via inspect commands for complexity, coupling, blast radius, dead code, circular dependencies, symbol/file lookups, backlinks, and code smells. Use when working with kb CLI commands, knowledge base workflows, code vault generation, code graph analysis, code metrics inspection, wiki compilation, or the ingest-compile-query-lint cycle. Do not use for general code review, linting, formatting, building Go projects, or writing application code. ---- - -# kb CLI and Knowledge Base Pattern - -Build and maintain a self-compiling Obsidian markdown knowledge base using the `kb` CLI. The LLM reads raw sources, writes cross-linked wiki articles, files Q&A results back into the corpus, and runs lint-and-heal passes. The CLI also supports codebase ingestion with deep inspection commands for code quality, architecture health, and symbol relationships. - -Each **topic** lives in its own top-level folder (e.g. 
`ai-harness/`) with `raw/`, `wiki/`, `outputs/`, `bases/` subtrees plus a topic-level `log.md` and `CLAUDE.md`. All topics share a single Obsidian vault at the repo root. Read `references/architecture.md` for the full rationale and the four-phase pipeline (ingest → compile → query → lint). - -The topic's **`CLAUDE.md`** (symlinked to `AGENTS.md`) is the **schema document** — it tells the LLM the scope, conventions, current articles, and research gaps for that topic. Co-evolve it as the topic matures. - -## Prerequisites - -1. Verify the `kb` binary is available: - ```bash - kb version - ``` -2. For search and index commands, verify QMD is installed: - ```bash - qmd --version - # If missing: npm install -g @tobilu/qmd - ``` -3. Supported source languages for codebase analysis: TypeScript (`.ts`), TSX (`.tsx`), JavaScript (`.js`), JSX (`.jsx`), Go (`.go`). - -## Pattern Overview - -Based on Andrej Karpathy's LLM Wiki pattern, the KB treats the LLM as a **compiler** that reads raw source documents and produces a structured, cross-linked markdown wiki. The four-phase loop: - -1. **Ingest** — Scrape/curate sources via `kb` CLI → `raw/` (immutable staging) -2. **Compile** — LLM reads `raw/`, writes `wiki/concepts/` articles (3000-4000 words, dense wikilinks) -3. **Query** — Q&A against wiki → file answers to `outputs/queries/`, promote strong answers to wiki -4. **Lint** — Automated structural checks + LLM-driven semantic healing - -Read `references/architecture.md` for the full rationale, context-window vs RAG tradeoffs, and multi-topic vault design. - -## Related Skills - -This skill orchestrates several companion skills for the LLM-driven phases: - -- **[obsidian-markdown](https://github.com/pedronauck/skills/tree/main/skills/obsidian-markdown)** — author wiki articles with valid Obsidian Flavored Markdown (wikilinks, callouts, embeds, properties). 
-- **[obsidian-bases](https://github.com/pedronauck/skills/tree/main/skills/obsidian-bases)** — create `.base` files under `/bases/` for dashboard views, filters, and formulas. -- **[obsidian-cli](https://github.com/pedronauck/skills/tree/main/skills/obsidian-cli)** — interact with the running Obsidian vault from the command line (open notes, search, refresh indexes). - -## kb CLI Quick Reference - -### Topic management - -```bash -kb topic new <domain> # scaffold a new topic -kb topic list # list all topics in the vault -kb topic info <slug> # topic metadata (counts, last log entry) -``` - -### Ingestion (auto-generates frontmatter, auto-appends to log.md) - -```bash -kb ingest url <url> --topic <slug> # scrape a web URL via Firecrawl -kb ingest file <path> --topic <slug> # convert local file (PDF, DOCX, EPUB, HTML, images w/OCR, etc.) -kb ingest youtube <url> --topic <slug> # extract YouTube transcript -kb ingest bookmarks <path> --topic <slug> # ingest a bookmark-cluster markdown file -kb ingest codebase <path> --topic <slug> # analyze a codebase into raw/codebase/ -``` - -### Codebase inspection - -```bash -kb inspect smells [--type <smell-type>] --format json -kb inspect dead-code --format json -kb inspect complexity [--top N] --format json -kb inspect blast-radius [--min N] [--top N] --format json -kb inspect coupling [--unstable] --format json -kb inspect circular-deps --format json -kb inspect symbol <name> --format json -kb inspect file <path> --format json -kb inspect backlinks <name-or-path> --format json -kb inspect deps <name-or-path> --format json -``` - -### Structural linting - -```bash -kb lint [<slug>] [--save] # dead links, orphans, missing sources, format violations, stale content -``` - -### Indexing and search (requires QMD) - -```bash -kb index --topic <slug> # create or update QMD collection -kb search "<query>" --topic <slug> # hybrid BM25 + vector search -kb search "<query>" --lex --topic <slug> # keyword-only search -kb search "<query>" 
--vec --topic <slug> # vector-only search -``` - -After running `kb ingest` or `kb lint --save`, the CLI auto-appends entries to `<topic>/log.md`. Manual log entries are still needed for compile, query, promote, and split operations (Procedure 5). - -## Command Dispatch - -Map the user's intent to the correct command: - -| Intent | Command | -|--------|---------| -| Scaffold a new topic | `kb topic new <slug> <title> <domain>` | -| List all topics | `kb topic list` | -| Scrape a web URL | `kb ingest url <url> --topic <slug>` | -| Ingest a local file (PDF, DOCX, etc.) | `kb ingest file <path> --topic <slug>` | -| Extract a YouTube transcript | `kb ingest youtube <url> --topic <slug>` | -| Ingest bookmark clusters | `kb ingest bookmarks <path> --topic <slug>` | -| Analyze a codebase | `kb ingest codebase <path> --topic <slug> --progress never` | -| Find code smells | `kb inspect smells --format json` | -| Find dead exports and orphan files | `kb inspect dead-code --format json` | -| Rank functions by complexity | `kb inspect complexity --format json` | -| Find high-impact symbols (blast radius) | `kb inspect blast-radius --min 5 --format json` | -| Find unstable files (coupling) | `kb inspect coupling --unstable --format json` | -| Find circular imports | `kb inspect circular-deps --format json` | -| Look up a specific symbol | `kb inspect symbol <name> --format json` | -| Look up a specific file | `kb inspect file <path> --format json` | -| Find what depends on X (incoming refs) | `kb inspect backlinks <name-or-path> --format json` | -| Find what X depends on (outgoing deps) | `kb inspect deps <name-or-path> --format json` | -| Run structural lint | `kb lint <slug> --save` | -| Index vault for search | `kb index --topic <slug>` | -| Search the knowledge base | `kb search "<query>" --topic <slug> --format json` | - -## Codebase Analysis Workflow - -For codebase-specific analysis, the `kb ingest codebase` command must run before any inspect command. 
- -**Workflow A -- Code Analysis (no QMD required):** -``` -kb ingest codebase <path> --topic <slug> --> kb inspect <subcommand> -``` - -**Workflow B -- Full Pipeline (requires QMD):** -``` -kb ingest codebase <path> --topic <slug> --> kb index --> kb search <query> -``` - -The vault is stored at `<path>/.kb/vault/<topic-slug>/` by default. Later commands auto-discover this vault by walking up from the current working directory. - -### Ingest a Codebase - -```bash -kb ingest codebase <path> --topic <slug> --progress never -``` - -Always use `--progress never` in agent contexts to prevent TTY progress bars from corrupting stdout. - -Parse the JSON output from stdout to extract key values: -- `topicSlug` -- the topic identifier for later commands -- `vaultPath` -- absolute path to the vault root -- `topicPath` -- absolute path to the topic directory -- `filesScanned`, `filesParsed`, `symbolsExtracted` -- summary statistics -- `diagnostics` -- check for warnings or errors - -Stderr carries structured stage logs. Do not treat stderr content as failure evidence. - -Key flags: -- `--output <dir>` -- override vault root location -- `--topic <slug>` -- override the topic slug -- `--include <pattern>` -- re-include paths that would otherwise be ignored (repeatable) -- `--exclude <pattern>` -- exclude additional paths from scanning (repeatable) -- `--semantic` -- enable semantic analysis when adapters support it - -Read `references/cli-ingest-codebase.md` for the full flag table and output schema. - -### Inspect the Vault - -Run inspect subcommands to analyze code quality and architecture. - -**Shared flags for all inspect subcommands:** -- `--format json` -- always use JSON for programmatic parsing -- `--vault <path>` -- explicit vault root (omit to auto-discover from cwd) -- `--topic <slug>` -- explicit topic slug (omit if only one topic exists) - -#### Tabular Subcommands - -These return a list of rows sorted by the primary metric: - -1. 
**smells** -- List symbols and files with detected code smells. - ``` - kb inspect smells --format json - kb inspect smells --type high-complexity --format json - ``` - -2. **dead-code** -- List dead exports and orphan files. - ``` - kb inspect dead-code --format json - ``` - -3. **complexity** -- Rank functions/methods by cyclomatic complexity. Default top 20. - ``` - kb inspect complexity --format json - kb inspect complexity --top 50 --format json - ``` - -4. **blast-radius** -- Rank symbols by transitive dependent count. - ``` - kb inspect blast-radius --format json - kb inspect blast-radius --min 10 --top 20 --format json - ``` - -5. **coupling** -- Rank files by instability (Ce / (Ca + Ce)). - ``` - kb inspect coupling --format json - kb inspect coupling --unstable --format json - ``` - -6. **circular-deps** -- List files participating in circular import chains. - ``` - kb inspect circular-deps --format json - ``` - -#### Detail Lookup Subcommands - -These return field-value pairs for a single matched entity: - -7. **symbol \<name\>** -- Case-insensitive substring match. Returns detail fields for a single match, or a summary table for multiple matches. - ``` - kb inspect symbol parseConfig --format json - ``` - -8. **file \<path\>** -- Exact source path lookup. Use the source-relative path as stored in vault frontmatter. - ``` - kb inspect file src/config.ts --format json - ``` - -#### Relation Subcommands - -These return relation edges (`target_path`, `type`, `confidence`): - -9. **backlinks \<name-or-path\>** -- Incoming references. Accepts a symbol name or file path. - ``` - kb inspect backlinks parseConfig --format json - ``` - -10. **deps \<name-or-path\>** -- Outgoing dependencies. Accepts a symbol name or file path. - ``` - kb inspect deps src/config.ts --format json - ``` - -Read `references/cli-inspect.md` for all column schemas and flag details. - -### Index the Vault - -Index the vault content into QMD for search. This step requires QMD on PATH. 
- -```bash -kb index --topic <slug> -``` - -The command is idempotent: it checks whether the collection already exists and chooses `add` (create) or `update` (refresh) automatically. - -Key flags: -- `--embed` (default true) -- run embedding after syncing files -- `--force-embed` -- force re-embedding all documents -- `--context <text>` -- attach human context to improve search relevance -- `--name <name>` -- override the derived collection name - -Read `references/cli-search-index.md` for the full output schema. - -### Search the Vault - -Search indexed vault content with QMD. Requires a prior `kb index` run. - -```bash -kb search "<query>" --topic <slug> --format json -``` - -**Search modes:** -- Hybrid (default) -- combines lexical and vector search -- Lexical (`--lex`) -- BM25 keyword search only -- Vector (`--vec`) -- embedding-based semantic search - -The `--lex` and `--vec` flags are mutually exclusive. Omit both for hybrid mode. - -Key flags: -- `--limit N` (default 10) -- maximum results -- `--min-score N` -- minimum relevance threshold -- `--full` -- return full document content instead of snippets -- `--all` -- return all matches above the minimum score - -Read `references/cli-search-index.md` for full details. - -## KB Maintenance Procedures - -### Procedure 1: Compile a wiki article - -1. Read `references/compilation-guide.md` to anchor on length, style, wikilink density, and sourcing rules. -2. Identify candidate sources via `kb search "<topic phrase>" --topic <slug>` or read `<topic>/wiki/index/Source Index.md`. -3. Load the candidate raw sources fully into context. -4. Load `<topic>/wiki/index/Concept Index.md` for orientation on existing articles and wikilink targets (including in other topics). -5. **Surface takeaways BEFORE drafting.** Present to the user: 3-5 key takeaways from the sources, the entities/concepts this article will introduce or update, and anything that contradicts existing wiki articles. 
Ask: *"Anything specific to emphasize or de-emphasize?"* Wait for the response. Skip this step only if the user has explicitly asked for autonomous compilation. -6. Write the article to `<topic>/wiki/concepts/<Article Title>.md` following the [obsidian-markdown skill](https://github.com/pedronauck/skills/tree/main/skills/obsidian-markdown) for wikilink, callout, and frontmatter syntax. Use the frontmatter schema from `references/frontmatter-schemas.md`. Target 3000-4000 words with a Sources section, wikilinks to related articles, and code or diagram blocks where applicable. -7. **Backlink audit -- do not skip.** Grep every existing article in `<topic>/wiki/concepts/` for mentions of the new article's title, aliases, or core entities. For each match, add a `[[New Article]]` wikilink at the first mention (and one later occurrence). This is the step most commonly skipped -- a compounding wiki depends on bidirectional links. - ```bash - grep -rln "<new article title or key term>" <topic>/wiki/concepts/ - ``` -8. Update the topic's indexes (Procedure 2). -9. Update `<topic>/CLAUDE.md` current-articles list. -10. Re-index the topic's collection: `kb index --topic <slug>`. -11. Append an entry to `<topic>/log.md` (Procedure 5) -- e.g., `## [YYYY-MM-DD] compile | <Article Title> (<word_count> words, <N> sources)`. - -When **updating an existing article** (rather than writing new), use the `Current / Proposed / Reason / Source` diff format and contradiction-sweep workflow described in `references/compilation-guide.md`. - -### Procedure 2: Maintain topic indexes - -After adding, renaming, or removing any wiki article: - -1. `<topic>/wiki/index/Dashboard.md` -- update article count, total word count, featured sections, and any Obsidian Base embeds (use the [obsidian-bases skill](https://github.com/pedronauck/skills/tree/main/skills/obsidian-bases) to author `.base` files and embed them). -2. 
`<topic>/wiki/index/Concept Index.md` -- insert/update the article row alphabetically with its one-line summary. -3. `<topic>/wiki/index/Source Index.md` -- for each new article, append rows for every source it cites, with a wikilink back to the article. -4. Optionally refresh the live view in Obsidian with the [obsidian-cli skill](https://github.com/pedronauck/skills/tree/main/skills/obsidian-cli) (`obsidian open <path>`, `obsidian search <query>`). - -### Procedure 3: Query the wiki and file back the answer - -A query has two phases: **Phase A** produces the answer by reading the wiki (never from general knowledge); **Phase B** files the answer back so the exploration compounds. - -**Precondition:** Identify which topic(s) the question belongs to. If the question spans topics, load each topic's Concept Index. - -#### Phase A -- Answer from the wiki - -1. **Read the topic's Concept Index first** (`<topic>/wiki/index/Concept Index.md`). Scan the full index to identify candidate articles. Do NOT answer from general knowledge -- the wiki is the source of truth, even when the answer seems obvious. A contradiction between the wiki and general knowledge is itself valuable signal. -2. **Locate relevant articles.** At small scale (<30 articles), the index is enough. At larger scale, supplement with `kb search "<phrase>" --topic <slug>`. Also grep the topic for keywords: `grep -rl "<keyword>" <topic>/wiki/concepts/`. -3. **Read the identified articles in full.** Follow one level of `[[wikilinks]]` when targets look relevant to the question. Stop at one hop -- deeper traversal wastes context. -4. **(Optional) Pull in raw sources** if an article's claim is ambiguous and its `sources:` frontmatter points at a specific raw file worth verifying. -5. **Synthesize the answer** with these properties: - - Grounded in the wiki articles you just read -- every factual claim traces back to a `[[Wiki Article]]` citation. 
- - Notes **agreements and disagreements** between articles when they exist. - - Flags **gaps explicitly**: "The wiki has no article on X" or "[[Article Y]] does not yet cover Z". - - Suggests follow-up **ingest targets** or open questions. -6. **Match format to question type:** - - Factual → prose with inline `[[wikilink]]` citations. - - Comparison → table with rows per alternative, citations in cells. - - How-it-works → numbered steps with citations. - - What-do-we-know-about-X → structured summary with "Known", "Open questions", "Gaps". - - Visual → ASCII/Mermaid diagram, Marp deck (see `references/tooling-tips.md`), or matplotlib chart. - -#### Phase B -- File back the answer - -7. **Save the answer** to `<topic>/outputs/queries/<YYYY-MM-DD> <Question Slug>.md` with frontmatter: `type: output`, `stage: query`, `informed_by: ["[[Article 1]]", "[[Article 2]]"]`. See `references/frontmatter-schemas.md` for the full schema. -8. In the body, list which wiki articles informed the answer under `informed_by:` (as wikilinks) and call out new insights that should be absorbed back into those articles on the next compile pass. -9. When a filed-back insight contradicts or extends an article's claims, **recompile the affected articles** (Procedure 1). -10. **Promote to wiki when the synthesis is durable.** If the answer is a first-class reference (a comparison table, a trade-off analysis, a new concept synthesized from multiple articles), copy it to `<topic>/wiki/concepts/<Title>.md` following Procedure 1 standards and update the indexes (Procedure 2). Karpathy's pattern treats strong query answers as wiki citizens, not secondary artifacts. -11. **Append to `<topic>/log.md`** (Procedure 5) -- e.g., `## [YYYY-MM-DD] query | <Question Slug>` plus a second line `## [YYYY-MM-DD] promote | <Title>` if promoted. - -**Anti-patterns to avoid:** - -- **Answering from memory** -- always read the wiki pages. The wiki may contradict what you think you know. 
-- **No citations** -- every factual claim must trace back to a `[[wikilink]]`. -- **Skipping the save** -- good query answers compound the wiki's value. Always file to `outputs/queries/`; promote when durable. -- **Silent gaps** -- surface missing coverage explicitly so the next ingest pass can fill it. - -### Procedure 4: Lint and heal - -Run structural lint via the `kb` CLI: - -```bash -kb lint <slug> --save -``` - -This checks dead wikilinks, orphan articles, missing source references, format violations, and stale content, saving a dated report to `<topic>/outputs/reports/`. For each issue, **propose the fix with a diff before applying** -- do not batch-apply changes: - -- **Dead wikilink** -- either create the missing article (Procedure 1) or rewrite the wikilink to point at an existing article. -- **Orphan article** -- add incoming wikilinks from at least one related article, or remove the article if it is outside the topic's scope. -- **Missing source file** -- an article's `sources:` frontmatter references a file absent from `raw/`. Either re-ingest (`kb ingest url/file`) or correct the reference. -- **Stale content** -- article's `updated:` date is older than its source's `scraped:` date. Recompile with current sources. -- **Format violation** -- fix missing frontmatter fields, H1 title, lead paragraph, or Sources section. - -For deeper LLM-driven self-healing checks (inconsistencies across articles, missing coverage, wikilink audits, filed-back query absorption), read `references/lint-procedure.md`. - -After the heal pass, append `## [YYYY-MM-DD] lint | <N> issues found, <M> fixed` to `<topic>/log.md`. - -### Procedure 5: Append to log.md - -The `kb` CLI auto-appends log entries for `ingest` and `lint --save` operations. Manual entries are needed for **compile**, **query**, **promote**, and **split** operations. 
- -**Format** -- each entry is a single H2 heading with a consistent prefix so the log stays grep-able: - -```markdown -## [YYYY-MM-DD] <op> | <short description> -``` - -Where `<op>` is one of `compile`, `query`, `promote`, or `split` (ingest and lint are handled by `kb`). - -**Examples:** - -```markdown -## [2026-04-04] compile | Transformer Architecture (3847 words, 6 sources) -## [2026-04-04] query | 2026-04-04 flash-attention-vs-paged-attention.md -## [2026-04-04] promote | FlashAttention vs PagedAttention (from query) -## [2026-04-05] split | "Inference Optimization" → KV Cache, Speculative Decoding -``` - -Optionally add a body paragraph under each entry with more context (key findings, source URLs, decisions made). Keep entries terse -- the log is for skimming, not prose. - -**Quick recent-activity check** -- the consistent prefix lets Unix tools query the log: - -```bash -grep "^## \[" <topic>/log.md | tail -10 # last 10 events -grep "^## \[.*compile" <topic>/log.md | wc -l # total compiles -grep "^## \[2026-04" <topic>/log.md # April 2026 events -``` - -Keep `log.md` at the topic root (not inside `wiki/` or `outputs/`) so it sits alongside `CLAUDE.md` as a first-class topic artifact. - -## Output Format Selection - -All `inspect` and `search` commands support `--format`: -- **json** -- always use for programmatic parsing -- **table** -- human-readable aligned columns (default) -- **tsv** -- tab-separated for piping to Unix tools - -The `ingest codebase` and `index` commands always output JSON to stdout. - -Read `references/output-formats.md` for format examples and empty result handling.
- -## Error Handling - -### CLI Errors - -| Error | Recovery | -|-------|----------| -| `unable to find a vault from <path>` | Run `kb ingest codebase <path> --topic <slug>` first | -| `QMD is not available` | Run `npm install -g @tobilu/qmd` | -| `no topics were found` | Run `kb ingest codebase` or `kb topic new` to populate the vault | -| `multiple topics were found` | Re-run with `--topic <slug>` | -| `no symbols matched "<query>"` | Use `inspect smells` or `inspect complexity` to discover valid names | -| `no file matched "<path>"` | Use exact source-relative path from vault frontmatter (e.g. `src/config.ts` not `./src/config.ts`) | - -### KB Workflow Errors - -| Error | Recovery | -|-------|----------| -| `kb` not found | Install the `kb` binary and ensure it is on PATH. Verify with `kb version` | -| Topic not found | Run `kb topic list` to see available topics, or scaffold with `kb topic new` | -| Article exceeds 4000 words | Extract a sub-topic into its own article and wikilink to it | -| Cross-topic wikilink ambiguity | Disambiguate with full path: `[[other-topic/wiki/concepts/Article Name\|Display Name]]` | -| `log.md` missing in existing topic | Create manually and backfill from git: `git log --format='## [%ad] <op> \| %s' --date=short <topic>/` | - -Read `references/error-handling.md` for the full error catalog with causes and recovery steps. 
- -## Constraints - -### MUST DO -- Run `kb ingest codebase` before any inspect command on that topic -- Use `--format json` when parsing output programmatically -- Use `--progress never` when running `kb ingest codebase` in a non-interactive context -- Parse stdout only for command output; treat stderr as diagnostics -- Use the `topicSlug` from ingest output for subsequent `--topic` flags -- Read `references/compilation-guide.md` before writing wiki articles -- Run backlink audits after every article compile (Procedure 1, step 7) -- File query answers to `outputs/queries/` (Procedure 3) -- Append manual log entries for compile, query, promote, and split operations - -### MUST NOT DO -- Pass both `--lex` and `--vec` to `search` -- Pass `--force-embed` with `--embed=false` to `index` -- Treat stderr content as failure evidence for `kb ingest codebase` -- Assume vault location without running ingest or checking for `.kb/vault/` -- Use relative paths like `./src/config.ts` for `inspect file` -- use `src/config.ts` instead -- Answer wiki queries from general knowledge -- the wiki is the source of truth -- Skip the backlink audit when compiling articles -- Batch-apply lint fixes without proposing diffs first diff --git a/.agents/skills/kb/references/architecture.md b/.agents/skills/kb/references/architecture.md deleted file mode 100644 index 052f08c1c..000000000 --- a/.agents/skills/kb/references/architecture.md +++ /dev/null @@ -1,166 +0,0 @@ -# Architecture and Rationale - -The Karpathy Knowledge Base Pattern treats the LLM as a **compiler** that reads raw source documents and produces a structured, cross-linked markdown wiki. No vector database, no embedding pipeline, no retrieval ranking — the wiki itself is the knowledge base, and at personal scale (~100 articles, ~400K words) it fits entirely in a modern context window. 
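As a rough sanity check on the claim that a ~400K-word wiki fits in a modern context window, the arithmetic can be sketched as below. The 4/3 tokens-per-word ratio is an assumed rule of thumb for English prose (real ratios depend on the tokenizer), and `estimate_tokens` and `fits_in_window` are hypothetical helper names, not part of the `kb` CLI.

```bash
#!/usr/bin/env bash
# Rough context-budget check for a wiki of a given word count.
# ASSUMPTION: about 4/3 tokens per English word; this ratio is not from the
# source document and varies by tokenizer.

# Print the estimated token count for a word count (integer arithmetic).
estimate_tokens() {
  local words="$1"
  echo $(( words * 4 / 3 ))
}

# Succeed (exit 0) when the estimated tokens fit in the given window size.
fits_in_window() {
  local words="$1" window_tokens="$2"
  [ "$(estimate_tokens "$words")" -le "$window_tokens" ]
}

echo "estimated tokens: $(estimate_tokens 400000)"
if fits_in_window 400000 1000000; then
  echo "fits in a 1M-token window"
else
  echo "does not fit in a 1M-token window"
fi
```

Under that assumed ratio, 400K words comes to roughly 533K tokens, which leaves headroom in a 1M-token window for the question and the answer.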
- -Described by Andrej Karpathy in April 2026 in [LLM Wiki: Knowledge Base Pattern](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f), with conceptual roots in Vannevar Bush's Memex (1945) — a personal, curated knowledge store with associative trails between documents. Bush's unsolved problem was *who does the maintenance*. LLMs solve that: they don't get bored, don't forget cross-references, and can touch 15 files in one pass. - -## Core thesis - -> You never write the wiki. The LLM writes everything. You just steer, and every answer compounds. - -Three converging capabilities enable the pattern: - -1. **1M+ token context windows** let the full wiki load into a single LLM call. -2. **LLM writing quality** is sufficient to produce technically rigorous reference articles. -3. **Markdown + Obsidian** gives inspectable, editable, scriptable, versionable, renderable files with no lock-in. - -The human contributes judgment, taste, and direction. The LLM contributes exhaustive cross-referencing, consistent formatting, tireless compilation, and gap identification. - -## Three-op core vs four-phase extension - -Karpathy's original gist frames the pattern as three operations: **Ingest**, **Query**, **Lint**. In his flow, ingest is active — the LLM reads the source, discusses it, writes a summary page, and updates 10-15 related wiki pages in one pass. "Compile" is folded into ingest. - -This skill **splits ingest into two distinct phases** — `ingest` (scrape + stage into `raw/` immutably) and `compile` (LLM reads `raw/`, writes `wiki/concepts/`) — for three reasons: - -1. **Multi-topic vaults.** A source may arrive weeks before it has enough companions to compile a rigorous 3000-4000-word article. Staging decouples acquisition from synthesis. -2. **Batch scraping.** Tools like firecrawl and tweetsmash-api produce clusters of raw material. Staging them first lets the LLM pick the compile order. -3. 
**Reproducibility.** `raw/` is immutable — a compiled article can always be re-derived from its sourced files. - -The four-phase loop: - -``` - ┌──────────────┐ - │ 1. INGEST │ Scrape / curate → raw/ (immutable) - └──────┬───────┘ - │ - v - ┌──────────────┐ - │ 2. COMPILE │ LLM reads raw/, writes wiki/concepts/ - └──────┬───────┘ - │ - v - ┌──────────────┐ - │ 3. QUERY │ Q&A against wiki → outputs/queries/, promote strong answers to wiki/ - └──────┬───────┘ - │ - v - ┌──────────────┐ - │ 4. LINT │ Find gaps, fix errors, suggest articles - └──────┬───────┘ - │ - └──────→ back to Phase 1 -``` - -Every phase ends with an append to `<topic>/log.md`. Each phase enhances the next. The cycle runs continuously — the knowledge base is always growing, always improving. - -### Phase 1: Ingest - -Raw source material enters through the `kb` CLI and is staged immutably: - -```bash -kb ingest url <url> --topic <slug> # web articles, blog posts, papers → raw/articles/ -kb ingest file <path> --topic <slug> # local files (PDF, DOCX, EPUB, images w/OCR) → raw/articles/ -kb ingest youtube <url> --topic <slug> # YouTube transcripts → raw/youtube/ -kb ingest bookmarks <path> --topic <slug> # bookmark clusters → raw/bookmarks/ -kb ingest codebase <path> --topic <slug> # codebase analysis → raw/codebase/ -``` - -The CLI auto-generates frontmatter and appends a log entry for each ingest. Principle: capture broadly, filter later. It is better to ingest something irrelevant than to miss something valuable. Never edit files in `raw/` after ingestion — if a source changes, re-scrape as a new version. - -### Phase 2: Compile - -The LLM reads raw sources and produces structured wiki articles: - -1. Load the topic's Concept Index for orientation. -2. Load the target article (if updating). -3. Load relevant raw sources. -4. Write the article with structured sections, `[[wikilinks]]`, code examples, source attributions, technical depth suitable for senior practitioners. -5. 
Write to `wiki/concepts/<Article Title>.md`. - -Compile foundational articles first so dependent articles can wikilink to them. - -### Phase 3: Query and enhance - -With 1M+ context, load the full wiki (or a relevant subset) and answer complex cross-article queries that would challenge traditional retrieval: - -- "Compare approaches to X across all frameworks discussed." -- "What are the gaps in our coverage of Y?" -- "Synthesize arguments for and against Z." - -**Every answer gets filed back** to `outputs/queries/<YYYY-MM-DD> <slug>.md`. On the next compile pass, insights from filed-back queries get absorbed into the wiki articles themselves. When an answer is strong enough to stand as a first-class reference (a comparison table, a concept synthesized from multiple articles, a novel trade-off analysis), **promote it to `wiki/concepts/`** following Procedure 1 (Compile) standards. Karpathy's pattern treats strong query answers as equal citizens of the wiki, not secondary artifacts. This is the compounding mechanism — explorations become reusable knowledge. - -### Phase 4: Lint and heal - -The `kb` CLI handles automated structural checks: - -```bash -kb lint <slug> --save # dead links, orphans, missing sources, format violations, stale content -``` - -The LLM handles deeper semantic healing that requires reading articles and applying judgment: - -- **Missing coverage** — topics referenced in N articles but lacking their own -- **Inconsistencies** — contradictory claims across articles -- **Filed-back query absorption** — query insights not yet integrated into cited articles - -The lint pass leaves the knowledge base in a better state than it found it. This is the self-healing property. - -## Why markdown + Obsidian - -Five properties: - -- **Inspectable** — plain text, any editor, no opaque database. -- **Editable** — humans can correct errors directly without an API layer. -- **Scriptable** — grep, sed, and programming languages process the corpus trivially. 
-- **Versionable** — git tracks every change, every compilation can be reviewed. -- **Renderable** — Obsidian gives graph view, backlinks, full-text search, plugin ecosystem. - -No lock-in. If Obsidian disappears the files are still markdown. If the LLM changes the files are still text. - -## Context window vs RAG - -| Concern | RAG | Karpathy KB | -|---------|-----|-------------| -| Retrieval | Embedding + vector DB + ranking | Load into context | -| Relevance | Depends on embedding quality | LLM reads everything relevant | -| Cross-article reasoning | Multi-retrieval with fusion | Natural, all in context | -| Infrastructure | Vector DB, pipeline, tuning | File system + LLM | -| Per-query cost | Low | Higher | -| Answer quality on synthesis | Medium | High | - -Karpathy KB trades higher per-query cost for dramatically simpler infrastructure and higher synthesis quality. For personal/team knowledge bases with moderate query volume and a premium on answer quality, the tradeoff is favorable. For high-volume production (millions of queries/day), traditional RAG remains more cost-effective. - -## Target scale - -A mature knowledge base: 100+ articles, 400K+ words total, dense cross-linking. At this scale, queries produce insights that combine information from articles originally compiled from completely independent raw sources. The knowledge base becomes more than the sum of its inputs. - -## Future direction: knowledge in weights - -The pattern's trajectory: wiki → synthetic QA pairs → QLoRA fine-tune → domain-expert model. The knowledge moves from context into parameters, enabling faster inference and deployable domain expertise. - -## Multi-topic vaults - -Each top-level folder at the vault root is a **topic** — a self-contained subject with its own `raw/`, `wiki/`, `outputs/`, `bases/` subtrees plus `CLAUDE.md` and `log.md` at the topic root. 
All topics share one Obsidian vault at the root, so cross-topic wikilinks work naturally (e.g., an `ai-harness` article on embeddings can link to a `rust-systems` article on implementation details). Topics stay self-contained in terms of content but contribute to a unified knowledge graph. - -Each topic has its own `CLAUDE.md` (symlinked to `AGENTS.md` for Codex parity) capturing topic-specific scope, current articles, and research gaps; **this IS the schema document** in Karpathy's terminology. The vault-root `CLAUDE.md` captures the shared Karpathy pattern itself. - -## The log.md audit trail - -Every topic carries a `log.md` at its root: an append-only, chronological record of every knowledge-base operation. Each entry is a single H2 heading with a consistent grep-able prefix: - -``` -## [YYYY-MM-DD] <op> | <short description> -``` - -Ops: `ingest`, `compile`, `query`, `promote`, `split`, `lint`. The consistent prefix means Unix tools can query the log without special parsing: - -```bash -grep "^## \[" log.md | tail -10 # recent activity -grep "^## \[[0-9-]*\] compile |" log.md | wc -l # total compiles (headings only, not body text) -``` - -The log is distinct from git history. Git records *what changed in the files*; `log.md` records *what the knowledge base did as a system*: decisions made, insights synthesized, gaps identified. Both coexist. The log is the operational memory; git is the version control. - -## The wiki is a git repo - -The wiki is just a directory of markdown files under git. No database, no server, no API: you get version history, branching, diffs, blame, and collaboration for free. Every compile and lint pass is a reviewable commit. If Obsidian disappears, the files are still markdown. If the LLM changes, the files are still text. This is the no-lock-in guarantee.
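Because log entries may carry body paragraphs, a per-operation count is more reliable when it anchors on the whole `## [YYYY-MM-DD] <op> |` heading prefix rather than on the operation word alone. The following is a minimal sketch under that assumption; `count_op` is a hypothetical helper (not a `kb` command) and the sample entries are made up for illustration.

```bash
#!/usr/bin/env bash
# Count log.md entries for one operation by matching the full heading prefix
# "## [YYYY-MM-DD] <op> |", so the op word in a description or body paragraph
# is not counted. count_op is a hypothetical helper, not part of the kb CLI.
count_op() {
  local op="$1" log="$2"
  # grep -c prints 0 but exits non-zero when nothing matches; "|| true"
  # keeps the helper usable in scripts that run under "set -e".
  grep -cE "^## \[[0-9]{4}-[0-9]{2}-[0-9]{2}\] ${op} \|" "$log" || true
}

# Demo on a throwaway log file with illustrative entries.
demo_log="$(mktemp)"
cat > "$demo_log" <<'EOF'
## [2026-04-04] compile | Transformer Architecture (3847 words, 6 sources)
## [2026-04-04] query | how-to-compile-faster
Body text that mentions compile should not be counted.
## [2026-04-05] split | "Inference Optimization" → KV Cache, Speculative Decoding
EOF

echo "compile entries: $(count_op compile "$demo_log")"
echo "query entries:   $(count_op query "$demo_log")"
rm -f "$demo_log"
```

In the demo, only the first heading counts as a compile entry, even though the word "compile" also appears in a query description and in body text.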
diff --git a/.agents/skills/kb/references/cli-ingest-codebase.md b/.agents/skills/kb/references/cli-ingest-codebase.md deleted file mode 100644 index e3af252cd..000000000 --- a/.agents/skills/kb/references/cli-ingest-codebase.md +++ /dev/null @@ -1,115 +0,0 @@ -# Ingest Codebase Command Reference - -## Usage - -``` -kb ingest codebase <path> [flags] -``` - -The `<path>` argument is the root directory of the source repository to analyze (required). - -## Flags - -| Flag | Type | Default | Description | -|------|------|---------|-------------| -| `--topic` | string | `""` | Topic slug for the ingested codebase (derived from directory name if omitted) | -| `--output` | string | `""` | Vault root where the generated topic will be written. Defaults to `<path>/.kb/vault` | -| `--title` | string | `""` | Override the generated topic title | -| `--domain` | string | `""` | Override the generated topic domain | -| `--include` | string[] | `nil` | Re-include a path pattern that would otherwise be ignored; repeatable | -| `--exclude` | string[] | `nil` | Exclude an additional path pattern from scanning; repeatable | -| `--semantic` | bool | `false` | Enable semantic analysis when the underlying adapters support it | -| `--progress` | string | `auto` | Progress rendering mode: `auto`, `always`, or `never` | -| `--log-format` | string | `text` | Stderr event format: `text` or `json` | - -## Non-Interactive Usage - -When invoking from an agent context, always set `--progress never` to prevent TTY progress bars from corrupting stdout output. - -``` -kb ingest codebase /path/to/repo --topic my-project --progress never -``` - -## Pipeline Stages - -The codebase ingestion pipeline executes these stages in order: - -1. **scan** -- Discover source files by language -2. **select_adapters** -- Choose language parsers (tree-sitter for TS/JS, Go parser) -3. **parse** -- Extract AST nodes, symbols, and relations -4. 
**normalize** -- Merge per-file graphs into a unified snapshot, resolve imports -5. **metrics** -- Compute complexity, coupling, blast radius, dead code, smells -6. **render** -- Generate markdown documents and Base definitions -7. **write** -- Persist vault files to disk - -## Supported Languages - -| Language | Extensions | Adapter | -|----------|-----------|---------| -| TypeScript | `.ts` | tree-sitter | -| TSX | `.tsx` | tree-sitter | -| JavaScript | `.js` | tree-sitter | -| JSX | `.jsx` | tree-sitter | -| Go | `.go` | tree-sitter | - -## Output Schema (GenerationSummary) - -The command writes JSON to stdout. Parse the following fields: - -``` -{ - "command": string, // always "generate" - "rootPath": string, // absolute path to the analyzed repository - "vaultPath": string, // absolute path to the vault root - "topicPath": string, // absolute path to the topic directory - "topicSlug": string, // topic identifier (use for --topic in later commands) - "filesScanned": int, // total files discovered - "filesParsed": int, // files successfully parsed - "filesSkipped": int, // files skipped (unsupported or excluded) - "symbolsExtracted": int, // total symbols extracted - "relationsEmitted": int, // total relation edges - "rawDocumentsWritten": int, // per-file markdown documents - "wikiDocumentsWritten": int, // concept wiki articles - "indexDocumentsWritten": int, // index pages - "timings": { - "scanMillis": int, - "selectAdaptersMillis": int, - "parseMillis": int, - "normalizeMillis": int, - "metricsMillis": int, - "renderMillis": int, - "writeMillis": int, - "totalMillis": int - }, - "diagnostics": [ // structured warnings/errors - { - "code": string, - "severity": "warning" | "error", - "stage": "scan" | "parse" | "render" | "write" | "validate", - "message": string, - "filePath": string?, - "language": string?, - "detail": string? 
- } - ] -} -``` - -## Vault Structure - -After ingestion, the vault directory contains: - -``` -<vaultPath>/<topicSlug>/ - raw-codebase/ # One markdown file per source file with frontmatter and code - wiki-concept/ # Compiled concept articles - wiki-index/ # Index pages for navigation - *.base # Obsidian Base view definitions (YAML) - CLAUDE.md # Topic marker file -``` - -## Default Path Derivation - -- If `--output` is omitted: vault path defaults to `<rootPath>/.kb/vault` -- If `--topic` is omitted: topic slug is derived from the repository directory name -- Full topic path: `<vaultPath>/<topicSlug>/` diff --git a/.agents/skills/kb/references/cli-inspect.md b/.agents/skills/kb/references/cli-inspect.md deleted file mode 100644 index 63f39be82..000000000 --- a/.agents/skills/kb/references/cli-inspect.md +++ /dev/null @@ -1,281 +0,0 @@ -# Inspect Command Reference - -## Usage - -``` -kb inspect <subcommand> [flags] -``` - -## Shared Flags (All Subcommands) - -| Flag | Type | Default | Description | -|------|------|---------|-------------| -| `--format` | string | `table` | Output format: `table`, `json`, or `tsv` | -| `--vault` | string | `""` | Vault root path (auto-discovered from cwd if omitted) | -| `--topic` | string | `""` | Topic slug inside the vault (auto-detected if only one topic exists) | - -## Vault Auto-Discovery - -When `--vault` is omitted, the CLI walks up from the current working directory looking for `.kb/vault/`. If `--topic` is omitted and only one topic exists, it is selected automatically. If multiple topics exist, the command fails with an error listing available slugs. - ---- - -## Subcommands - -### 1. smells - -List symbols and files with detected code smells. 
- -``` -kb inspect smells [--type <smell-type>] [--format json] -``` - -**Flags:** `--type` (string) -- filter to a specific smell type (e.g., `long-function`, `high-complexity`, `dead-export`, `orphan-file`, `god-file`) - -**Output Columns:** - -| Column | Type | Description | -|--------|------|-------------| -| `kind` | string | `"symbol"` or `"file"` | -| `name` | string | Symbol name or file source path | -| `source_path` | string | Source-relative file path | -| `symbol_kind` | string | Symbol kind (empty for files) | -| `smells` | string[] | List of detected smell types | - ---- - -### 2. dead-code - -List dead exports and orphan files. - -``` -kb inspect dead-code [--format json] -``` - -**Output Columns:** - -| Column | Type | Description | -|--------|------|-------------| -| `kind` | string | `"symbol"` or `"file"` | -| `name` | string | Symbol name or file source path | -| `source_path` | string | Source-relative file path | -| `symbol_kind` | string | Symbol kind (empty for files) | -| `reason` | string | `"dead-export"` or `"orphan-file"` | -| `smells` | string[] | List of detected smell types | - ---- - -### 3. complexity - -Rank functions by cyclomatic complexity (descending). - -``` -kb inspect complexity [--top N] [--format json] -``` - -**Flags:** `--top` (int, default 20) -- maximum number of rows to return - -**Output Columns:** - -| Column | Type | Description | -|--------|------|-------------| -| `symbol_name` | string | Function or method name | -| `symbol_kind` | string | `"function"` or `"method"` | -| `source_path` | string | Source-relative file path | -| `cyclomatic_complexity` | int | Cyclomatic complexity score | -| `loc` | int | Lines of code | -| `blast_radius` | int | Transitive dependents count | -| `smells` | string[] | Detected smell types | - ---- - -### 4. blast-radius - -Rank symbols by blast radius (how many symbols transitively depend on a given symbol). 
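To make "transitive dependents" concrete, here is a minimal sketch that counts them with a breadth-first walk over inverted dependency edges. The graph representation and the function name are assumptions for illustration; how `kb` actually computes the metric (for example, how it treats cycles or the symbol itself) is not stated in this reference.

```python
from collections import deque


def blast_radius(depends_on, target):
    """Count unique symbols that depend on `target`, directly or transitively.

    `depends_on` maps each symbol to the symbols it depends on (hypothetical shape).
    """
    # Invert the edges: for each symbol, record who depends on it directly.
    dependents = {}
    for symbol, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(symbol)

    seen = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            # Assumption: the target never counts toward its own blast radius.
            if parent not in seen and parent != target:
                seen.add(parent)
                queue.append(parent)
    return len(seen)
```

With a graph such as `{"a": ["b"], "b": ["c"], "d": ["c"], "c": []}`, the symbols `a`, `b`, and `d` all reach `c`, so a change to `c` touches three dependents in this model.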
- -``` -kb inspect blast-radius [--min N] [--top N] [--format json] -``` - -**Flags:** -- `--min` (int, default 0) -- minimum blast radius threshold -- `--top` (int, default 0) -- maximum rows to return (0 = all) - -**Output Columns:** - -| Column | Type | Description | -|--------|------|-------------| -| `symbol_name` | string | Symbol name | -| `source_path` | string | Source-relative file path | -| `blast_radius` | int | Count of unique transitive dependents | -| `centrality` | float | Betweenness centrality score (0-1) | -| `external_reference_count` | int | References from outside the symbol's module | -| `smells` | string[] | Detected smell types | - ---- - -### 5. coupling - -Rank files by instability (Martin coupling metric). - -``` -kb inspect coupling [--unstable] [--format json] -``` - -**Flags:** `--unstable` (bool) -- only show files with instability > 0.5 - -**Output Columns:** - -| Column | Type | Description | -|--------|------|-------------| -| `source_path` | string | Source-relative file path | -| `afferent_coupling` | int | Files that import this file (Ca) | -| `efferent_coupling` | int | Files this file imports (Ce) | -| `instability` | float | Ce / (Ca + Ce); 1.0 = completely unstable | -| `has_circular_dependency` | bool | Participates in a circular import chain | -| `smells` | string[] | Detected smell types | - ---- - -### 6. symbol \<name\> - -Look up symbols by case-insensitive substring match.
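A case-insensitive substring match can be sketched as follows. The function name and the whitespace trimming are assumptions for illustration, not the CLI's implementation.

```python
def match_symbols(names, query):
    """Return the names that contain `query` as a case-insensitive substring."""
    needle = query.strip().casefold()
    if not needle:
        # Assumption: an empty or whitespace-only query is rejected up front.
        raise ValueError("a symbol name is required")
    return [name for name in names if needle in name.casefold()]
```

For example, the query `PARSE` matches both `parseConfig` and `ParseArgs` in this sketch, because both sides are case-folded before comparison.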
- -``` -kb inspect symbol <name> [--format json] -``` - -**Behavior:** -- **No matches:** Returns error with suggestion to use `inspect smells` or `inspect complexity` -- **Single match:** Returns detailed field-value pairs (see detail output below) -- **Multiple matches:** Returns summary table - -**Summary Table Columns** (multiple matches): - -| Column | Type | Description | -|--------|------|-------------| -| `symbol_name` | string | Symbol name | -| `symbol_kind` | string | Symbol kind | -| `source_path` | string | Source-relative file path | -| `start_line` | int | Start line in source | -| `language` | string | Source language | -| `smells` | string[] | Detected smell types | - -**Detail Fields** (single match): - -| Field | Type | -|-------|------| -| `relative_path` | string | -| `symbol_name` | string | -| `symbol_kind` | string | -| `source_path` | string | -| `language` | string | -| `exported` | bool | -| `start_line` | int | -| `end_line` | int | -| `signature` | string | -| `loc` | int | -| `blast_radius` | int | -| `centrality` | float | -| `cyclomatic_complexity` | int | -| `external_reference_count` | int | -| `is_dead_export` | bool | -| `is_long_function` | bool | -| `smells` | string[] | -| `outgoing_relations` | relation[] | -| `backlinks` | relation[] | - -Each relation entry has: `target_path` (string), `type` (string: imports|calls|references), `confidence` (string: semantic|syntactic). - ---- - -### 7. file \<path\> - -Look up a file by its exact source path.
- -``` -kb inspect file <path> [--format json] -``` - -**Detail Fields:** - -| Field | Type | -|-------|------| -| `relative_path` | string | -| `source_path` | string | -| `language` | string | -| `symbol_count` | int | -| `symbols` | string[] (name + kind pairs) | -| `afferent_coupling` | int | -| `efferent_coupling` | int | -| `instability` | float | -| `is_orphan_file` | bool | -| `is_god_file` | bool | -| `has_circular_dependency` | bool | -| `smells` | string[] | -| `outgoing_relations` | relation[] | -| `backlinks` | relation[] | - ---- - -### 8. backlinks \<name-or-path\> - -Show incoming references for a symbol or file. - -``` -kb inspect backlinks <name-or-path> [--format json] -``` - -**Entity Resolution:** Tries exact file path match first, falls back to single symbol name match. - -**Output Columns:** - -| Column | Type | Description | -|--------|------|-------------| -| `target_path` | string | Path of the referencing entity | -| `type` | string | Relation type: `imports`, `calls`, `references` | -| `confidence` | string | `semantic` or `syntactic` | - ---- - -### 9. deps \<name-or-path\> - -Show outgoing dependencies for a symbol or file. - -``` -kb inspect deps <name-or-path> [--format json] -``` - -**Entity Resolution:** Same as backlinks (file path first, then symbol name). - -**Output Columns:** - -| Column | Type | Description | -|--------|------|-------------| -| `target_path` | string | Path of the dependency | -| `type` | string | Relation type: `imports`, `calls`, `references` | -| `confidence` | string | `semantic` or `syntactic` | - ---- - -### 10. circular-deps - -List files that participate in circular dependencies. 
- -``` -kb inspect circular-deps [--format json] -``` - -**Behavior:** -- If cycles exist: returns a table of participating files -- If no cycles: returns `{"message": "no circular dependencies found"}` - -**Output Columns** (when cycles exist): - -| Column | Type | Description | -|--------|------|-------------| -| `source_path` | string | Source-relative file path | -| `afferent_coupling` | int | Files that import this file | -| `efferent_coupling` | int | Files this file imports | -| `instability` | float | Coupling instability metric | -| `smells` | string[] | Detected smell types | diff --git a/.agents/skills/kb/references/cli-search-index.md b/.agents/skills/kb/references/cli-search-index.md deleted file mode 100644 index d5699cae3..000000000 --- a/.agents/skills/kb/references/cli-search-index.md +++ /dev/null @@ -1,163 +0,0 @@ -# Search and Index Command Reference - -Both commands require the QMD binary on PATH. Install with `npm install -g @tobilu/qmd`. - ---- - -## Search Command - -### Usage - -``` -kb search <query> [flags] -``` - -The `<query>` argument is the search text (required, non-empty). 
- -### Flags - -| Flag | Type | Default | Description | -|------|------|---------|-------------| -| `--lex` | bool | `false` | Use BM25 keyword search only | -| `--vec` | bool | `false` | Use vector similarity search only | -| `--limit` | int | `10` | Maximum number of results to return | -| `--min-score` | float | `0` | Minimum score threshold for returned matches | -| `--full` | bool | `false` | Show the full matched document content instead of snippets | -| `--all` | bool | `false` | Return all matches above the minimum score threshold | -| `--collection` | string | `""` | Use an explicit QMD collection name instead of deriving from the topic | -| `--format` | string | `table` | Output format: `table`, `json`, or `tsv` | -| `--vault` | string | `""` | Vault root path (used when deriving the collection name) | -| `--topic` | string | `""` | Topic slug (used when deriving the collection name) | - -### Search Modes - -| Mode | Flag | QMD Command | Description | -|------|------|-------------|-------------| -| Hybrid | (default) | `query` | Combines lexical and vector search | -| Lexical | `--lex` | `search` | BM25 keyword search only | -| Vector | `--vec` | `vsearch` | Embedding-based semantic search | - -The `--lex` and `--vec` flags are mutually exclusive. Omit both for hybrid mode. - -### Output Columns - -| Column | Type | Description | -|--------|------|-------------| -| `path` | string | Vault-relative path of the matched document | -| `score` | float | Relevance score | -| `preview` | string | Snippet of matched content (or full content if `--full` is set) | - -### Collection Name Derivation - -When `--collection` is omitted, the collection name is derived from the topic slug: -1. Resolve the vault and topic (same logic as inspect commands) -2. 
Use the `topicSlug` as the collection name - -### Example Invocations - -```bash -# Hybrid search (default) -kb search "authentication middleware" --format json - -# Lexical search with higher result limit -kb search "parseConfig" --lex --limit 20 --format json - -# Vector search with score threshold -kb search "error handling patterns" --vec --min-score 0.5 --format json - -# Full document content -kb search "auth" --full --format json - -# Explicit collection name -kb search "auth" --collection my-project --format json -``` - ---- - -## Index Command - -### Usage - -``` -kb index [flags] -``` - -### Flags - -| Flag | Type | Default | Description | -|------|------|---------|-------------| -| `--vault` | string | `""` | Vault root path | -| `--topic` | string | `""` | Topic slug inside the vault | -| `--name` | string | `""` | Override the derived QMD collection name | -| `--embed` | bool | `true` | Run embedding after syncing files | -| `--force-embed` | bool | `false` | Force re-embedding all documents | -| `--context` | string | `""` | Attach human-written collection context to improve search relevance | - -### Idempotent Behavior - -The index command is idempotent. It checks `qmd status` first and selects the operation: -- If the collection already exists: performs an **update** (syncs changes) -- If the collection does not exist: performs an **add** (creates and populates) - -Run `kb index` repeatedly without side effects. 
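The add-versus-update decision described above can be sketched as a small planning function. This is a hedged model only: the function name, the step labels, and the idea of passing in a set of existing collection names are assumptions for illustration, and the sketch does not call QMD.

```python
def plan_index(existing_collections, collection_name, embed=True, force_embed=False):
    """Return the ordered steps an idempotent index run would take (illustrative model)."""
    if force_embed and not embed:
        # Mirrors the documented rule that these two flags contradict each other.
        raise ValueError("--force-embed cannot be used together with --embed=false")

    # Existing collection: sync changes. Missing collection: create and populate.
    steps = ["update" if collection_name in existing_collections else "add"]
    if embed:
        steps.append("force-embed" if force_embed else "embed")
    return steps
```

Running the plan twice for the same collection yields `add` the first time and `update` afterwards, which is the property that makes repeated runs safe in this model.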
- -### Output Schema (indexResultPayload) - -``` -{ - "collectionName": string, // QMD collection name (= topic slug or --name override) - "embedRequested": bool, // whether --embed was true - "embedResult": { // present only if embedding was performed - "docsProcessed": int, - "chunksEmbedded": int, - "errors": int, - "durationMs": int - }, - "forceEmbed": bool, // whether --force-embed was set - "status": { - "collection": { // null if collection was just created - "name": string, - "path": string, - "pattern": string, - "documents": int, - "lastUpdated": string - }, - "hasVectorIndex": bool, - "needsEmbedding": int, - "totalDocuments": int - }, - "topicPath": string, // absolute path to the topic directory - "topicSlug": string, // topic identifier - "updateResult": { - "collections": int, - "indexed": int, - "updated": int, - "unchanged": int, - "removed": int, - "needsEmbedding": int - }, - "vaultPath": string // absolute path to the vault root -} -``` - -### Example Invocations - -```bash -# Index with default settings (embed enabled) -kb index - -# Index with custom context for search relevance -kb index --context "React application with Redux state management" - -# Force re-embedding all documents -kb index --force-embed - -# Index without embedding (sync files only) -kb index --embed=false - -# Index with explicit vault and topic -kb index --vault /path/to/vault --topic my-project - -# Index with custom collection name -kb index --name custom-collection -``` diff --git a/.agents/skills/kb/references/compilation-guide.md b/.agents/skills/kb/references/compilation-guide.md deleted file mode 100644 index 33231e1e4..000000000 --- a/.agents/skills/kb/references/compilation-guide.md +++ /dev/null @@ -1,137 +0,0 @@ -# Wiki Article Compilation Guide - -Writing standards for articles in `<topic>/wiki/concepts/`. These are the primary output of the knowledge base and the interface the LLM answers queries against. 
- -## Target characteristics - -- **Length:** 3000-4000 words. Split into sub-articles when exceeded. -- **Audience:** senior practitioners in the topic's field. Assume foundational literacy; explain domain-specific terms on first use. -- **Standalone:** a reader should be able to learn the topic from one article alone. -- **Dense wikilinks:** target 10-30 `[[wikilinks]]` per article, including cross-topic links where relevant. -- **Cited:** every factual claim traces back to a file in `raw/` listed under `sources:` frontmatter. - -## Voice and style - -- **Domain knowledge, not personal wiki.** Write as a reference anyone in the field could use. No "what this means for [person]" sections. No builder profiles. No first-person narration. -- **Declarative, technical, neutral.** Avoid hype. Avoid hedging. State what is true, with sources backing it. -- **Concrete examples.** Prefer code blocks, tables, and diagrams over prose when describing structures, comparisons, or flows. - -## Required sections - -Every wiki article has: - -1. **H1 title** — matches the filename. -2. **Lead paragraph** — 2-4 sentences establishing what the topic is, why it matters, and scoping the article. -3. **Core sections** (H2) — the substantive body. Exact structure depends on the topic but should follow a consistent hierarchy. -4. **Sources and Further Reading** — bulleted list of every cited source plus related wikilinks. - -Optional sections depending on topic: - -- **Comparison tables** — when the article surveys alternatives -- **Code examples** — runnable snippets demonstrating the concepts -- **Architecture diagrams** — ASCII or Mermaid -- **Trade-offs** — explicit pros/cons when the topic has design tensions -- **Future direction** — where the field is heading, if well-established - -## Wikilink density - -Wikilinks are the knowledge graph. Every mention of a related concept should be a wikilink on first occurrence, and ideally a second time in a later section. 
Examples of good density: - -- Mention of another concept article → `[[Concept Name]]` -- Mention of a protocol, tool, or framework that has its own article → `[[Tool Name]]` -- Cross-topic reference → `[[other-topic/wiki/concepts/Article Name|Display Name]]` - -Do not wikilink every occurrence of common words. Do not wikilink authors or organizations unless they have their own article. - -## Sourcing rules - -- **Every article cites real sources.** Do not write from general knowledge alone. If the corpus does not contain the claim, either ingest a new source (`kb ingest url/file`) or omit the claim. -- **Frontmatter `sources:`** lists every raw file that informed the article, as wikilinks. -- **Inline attributions** are allowed but not required. A Sources section at the bottom is mandatory. -- **Direct quotes** require quotation marks and a source reference. - -## Anti-patterns - -- Articles that summarize a single source — instead, synthesize across multiple sources, or cite the single source as reference and link to the raw file. -- Articles with no incoming wikilinks (orphans) — every article should be reachable via the link graph. -- Articles with no outgoing wikilinks — every article should participate in the graph. -- Prose that could apply to any topic — be specific to this topic's vocabulary, patterns, and tensions. -- Restating prerequisites at length — link to the prerequisite article and move on. - -## When updating an existing article - -1. Load the current article fully. -2. Load any new raw sources that have been added since the last compile. -3. Identify what changed in the sources (new techniques, corrections, new terminology). -4. 
**Propose each change with a structured diff before writing.** Present to the user: - - > **Current:** `<quote the existing text>` - > - > **Proposed:** `<replacement text>` - > - > **Reason:** `<why this change is warranted>` - > - > **Source:** `<raw/ file path or URL backing the new claim>` - - Always include **Source**. An edit without a source citation creates untraceability — future compile passes won't know why the change was made. Ask for confirmation per page. Do not batch-apply changes. - -5. **Run a contradiction sweep.** If the new information contradicts something in the wiki, the contradicted claim may appear in more than one article. Before rewriting, grep every article for the contradicted claim: - - ```bash - grep -rln "<contradicted claim or key term>" <topic>/wiki/concepts/ - ``` - - Update all occurrences, not just the most obvious one. Silent contradictions across articles are the worst failure mode of a multi-article wiki. - -6. **Check downstream effects.** After identifying the primary article to update, grep for `[[<Article Title>]]` across the topic. For each article that links to the one being updated, ask: *does the update change anything that page asserts?* If yes, flag it explicitly and offer to update it with the same Current/Proposed/Reason/Source flow. - - ```bash - grep -rln "\[\[<Article Title>" <topic>/wiki/concepts/ - ``` - -7. Update the article in place, preserving structure where possible. -8. Bump `updated:` in frontmatter. -9. Add any new `sources:` entries. -10. Check that existing wikilinks still resolve; add new ones for newly-introduced concepts. - -## Backlink audit (compounding bidirectional links) - -After writing or renaming any article, run a backlink audit. A compounding wiki depends on bidirectional links — every new article needs incoming links from articles that mention its concepts. - -**Process:** - -1. 
Grep the topic's `wiki/concepts/` for mentions of the new article's title, aliases, or core entities: - - ```bash - grep -rln "<new article title or key term>" <topic>/wiki/concepts/ - ``` - -2. For each match, open the file and decide whether the mention warrants a wikilink. Add `[[New Article]]` at the first occurrence, and optionally at a second occurrence in a later section. -3. Skip matches that are inside code blocks or already wikilinked. -4. Skip matches that are incidental (the term appears in a different sense). - -This is the step most commonly skipped when authoring articles. A wiki with one-way links is a blog; a wiki with bidirectional links is a knowledge graph. - -## When to split an article - -Split when any of these hold: - -- Word count exceeds 4000 -- A single H2 section exceeds 1000 words -- The article covers two distinct sub-topics that warrant their own entries -- Multiple other articles would benefit from linking to a sub-section (that sub-section deserves its own article) - -After splitting, update the parent article to wikilink to the new sub-article(s) and update the topic's indexes. - -## When to write a new article (vs extend an existing one) - -Write new when: - -- Three or more existing articles wikilink to the concept as a dead link -- A query answer keeps synthesizing the same cross-article content (that synthesis deserves its own article) -- The topic is a distinct concept with its own sources, patterns, and terminology - -Extend existing when: - -- The new material is a refinement, example, or sub-aspect of an existing concept -- The sub-topic would be under 500 words on its own diff --git a/.agents/skills/kb/references/error-handling.md b/.agents/skills/kb/references/error-handling.md deleted file mode 100644 index e3174803b..000000000 --- a/.agents/skills/kb/references/error-handling.md +++ /dev/null @@ -1,72 +0,0 @@ -# Error Handling Reference - -Categorized error messages from the `kb` CLI with causes and recovery steps. 
- -## Vault Resolution Errors - -These occur when `inspect`, `search`, or `index` cannot locate a vault or topic. - -| Error Message | Cause | Recovery | -|---------------|-------|----------| -| `unable to find a vault from <path>. walked up looking for .kb/vault/` | No `.kb/vault/` directory exists above the working directory | Run `kb ingest codebase <path> --topic <slug>` first to create the vault | -| `Vault path was not found or is not a directory: <path>` | The `--vault` flag points to a nonexistent path | Verify the vault path exists and is a directory | -| `no topics were found in <path>. expected child directories containing CLAUDE.md` | The vault directory exists but contains no generated topics | Run `kb ingest codebase <path>` or `kb topic new` to populate the vault | -| `multiple topics were found in <path>: <slug1>, <slug2>` | The vault contains more than one topic and no `--topic` flag was provided | Re-run the command with `--topic <slug>` to select one | -| `topic name is required when topic is specified` | The `--topic` flag was provided but with an empty or whitespace-only value | Provide a non-empty topic slug | -| `Topic path was not found or is not a directory: <path>` | The `--topic` slug does not match any directory in the vault | Check available topic slugs inside the vault directory | - -## Inspect Lookup Errors - -These occur when `inspect symbol`, `inspect file`, `inspect backlinks`, or `inspect deps` cannot resolve the target entity. 
- -| Error Message | Cause | Recovery | -|---------------|-------|----------| -| `no symbols matched "<query>"` | No symbol name contains the query as a case-insensitive substring | Use `kb inspect smells` or `kb inspect complexity` to discover valid symbol names | -| `multiple symbols matched "<query>": <name1>, <name2>` | More than one symbol matched the query | Re-run with a more specific query string | -| `no file matched "<path>"` | No file in the vault has the given `source_path` value | Use the exact source-relative path as stored in vault frontmatter (e.g., `src/config.ts` not `./src/config.ts`) | -| `no symbol or file matched "<query>"` | The query matched neither a file source path nor a symbol name | Re-run with a specific symbol name or an exact source path | - -## QMD Errors - -These occur when `search` or `index` cannot communicate with the QMD binary. - -| Error Message | Cause | Recovery | -|---------------|-------|----------| -| `<command>: QMD is not available to kb. Install it with 'npm install -g @tobilu/qmd' and ensure 'qmd' is on PATH` | The `qmd` binary was not found on the system PATH | Run `npm install -g @tobilu/qmd` and verify with `qmd --version` | -| `<command>: <qmd error details>` | QMD returned an error during execution | Read the stderr diagnostics from QMD for details; common causes include missing collections or corrupted index files | - -## Flag Validation Errors - -These occur before any command execution when flag combinations are invalid. - -| Error Message | Cause | Recovery | -|---------------|-------|----------| -| `choose at most one search mode flag: --lex or --vec` | Both `--lex` and `--vec` were provided to `search` | Use only one mode selector, or omit both for hybrid mode | -| `--force-embed cannot be used together with --embed=false` | Contradictory embedding flags on `index` | Remove `--force-embed` or set `--embed=true` | -| `--limit must be >= 1. 
received <N>` | The `--limit` flag on `search` was set to zero or negative | Provide a positive integer for `--limit` | -| `--min-score must be >= 0. received <N>` | The `--min-score` flag on `search` was set to a negative value | Provide a non-negative value for `--min-score` | -| `--top must be >= 1. received <N>` | The `--top` flag on `inspect complexity` was set to zero or negative | Provide a positive integer for `--top` | -| `--min must be >= 0. received <N>` | The `--min` flag on `inspect blast-radius` was set to negative | Provide a non-negative integer for `--min` | -| `invalid --format "<value>": expected one of "table", "json", "tsv"` | An unsupported format string was provided | Use `table`, `json`, or `tsv` | - -## KB Workflow Errors - -These occur during knowledge base maintenance operations. - -| Error | Cause | Recovery | -|-------|-------|----------| -| `kb` not found on PATH | The `kb` binary is not installed or not on PATH | Install the `kb` binary and verify with `kb version` | -| Topic not found | The specified topic slug does not exist in the vault | Run `kb topic list` to see available topics, or scaffold with `kb topic new <slug> <title> <domain>` | -| Article exceeds 4000 words | A wiki article has grown beyond the recommended length | Extract a sub-topic into its own article and wikilink to it, rather than padding | -| Cross-topic wikilink ambiguity | Two topics contain articles with the same title | Disambiguate with the full path: `[[other-topic/wiki/concepts/Article Name\|Display Name]]` | -| `log.md` missing in existing topic | The topic was created before `log.md` was standard, or it was accidentally deleted | Create manually and backfill from git: `git log --format='## [%ad] <op> \| %s' --date=short <topic>/` | -| Log entry conflicts with git | Apparent duplication between `log.md` and git history | The log is a human/LLM-readable audit trail, not a replacement for git. 
Let them coexist: git records *what changed*, `log.md` records *what the knowledge base did* | - -## General Errors - -| Error Message | Cause | Recovery | -|---------------|-------|----------| -| `a search query is required` | Empty or whitespace-only query passed to `search` | Provide a non-empty search query string | -| `a symbol name is required` | Empty query passed to `inspect symbol` | Provide a non-empty symbol name | -| `a file path is required` | Empty path passed to `inspect file` | Provide a non-empty source path | -| `a symbol name or file path is required` | Empty query passed to `inspect backlinks` or `inspect deps` | Provide a non-empty symbol name or file path | diff --git a/.agents/skills/kb/references/frontmatter-schemas.md b/.agents/skills/kb/references/frontmatter-schemas.md deleted file mode 100644 index 114429c34..000000000 --- a/.agents/skills/kb/references/frontmatter-schemas.md +++ /dev/null @@ -1,165 +0,0 @@ -# Frontmatter Schemas - -All notes in the vault use YAML frontmatter for metadata. The subfolder path identifies the topic; the `domain` field is a shortcut for Bases and qmd queries. - -Conventions: - -- `domain: <short-slug>` identifies the topic (e.g., `ai` for `ai-harness/`). -- `created` and `updated` use ISO date format `YYYY-MM-DD`. -- `tags` always include the domain plus the note type plus topic-specific tags. -- `sources` entries are wikilinks pointing at files in `raw/`. 
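The conventions above lend themselves to a mechanical check. The sketch below assumes the frontmatter has already been parsed into a dictionary; the function names and problem messages are hypothetical, and it checks only the domain tag, the date format, and the wikilink shape of `sources`.

```python
import re
from datetime import date

ISO_DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
WIKILINK_RE = re.compile(r"^\[\[[^\[\]]+\]\]$")


def is_iso_date(value):
    """True when `value` is a real calendar date written as YYYY-MM-DD."""
    text = str(value)
    if not ISO_DATE_RE.match(text):
        return False
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


def check_conventions(meta):
    """Return a list of convention problems for one parsed frontmatter dict."""
    problems = []
    for field in ("created", "updated"):
        if field in meta and not is_iso_date(meta[field]):
            problems.append(f"{field} is not a YYYY-MM-DD date")
    if meta.get("domain") not in meta.get("tags", []):
        problems.append("tags do not include the domain")
    for source in meta.get("sources", []):
        if not WIKILINK_RE.match(source):
            problems.append(f"source is not a wikilink: {source}")
    return problems
```

The shape check comes before `date.fromisoformat` because newer Python versions accept additional ISO 8601 spellings that the `YYYY-MM-DD` convention does not allow.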
- ---- - -## Wiki article — `<topic>/wiki/concepts/<Article Title>.md` - -```yaml ---- -title: Article Title -type: wiki -stage: compiled -domain: <topic-domain> -tags: - - <topic-domain> - - wiki - - topic-specific-tag - - another-topic-tag -created: YYYY-MM-DD -updated: YYYY-MM-DD -sources: - - "[[Source File Name]]" - - "[[Another Source]]" ---- -``` - -## Raw article — `<topic>/raw/articles/<slug>.md` - -```yaml ---- -title: Descriptive Title -type: source -stage: raw -domain: <topic-domain> -source_kind: article -source_url: https://example.com/article -scraped: YYYY-MM-DD -tags: - - <topic-domain> - - raw - - topic-specific-tag ---- -``` - -`source_kind` values: `article`, `github-readme`, `documentation`, `paper`, `blog-post`, `whitepaper`. - -## GitHub README — `<topic>/raw/github/<slug>.md` - -```yaml ---- -title: Repository or Doc Title -type: source -stage: raw -domain: <topic-domain> -source_kind: github-readme -source_url: https://github.com/owner/repo -scraped: YYYY-MM-DD -tags: - - <topic-domain> - - raw - - github - - topic-specific-tag ---- -``` - -## Bookmark cluster — `<topic>/raw/bookmarks/<Topic> Bookmarks <Subtopic>.md` - -```yaml ---- -title: <Topic> Bookmarks <Subtopic> -type: source -stage: raw -domain: <topic-domain> -source_kind: bookmark-cluster -status: seeded -created: YYYY-MM-DD -updated: YYYY-MM-DD -source_urls: - - https://twitter.com/user/status/123 - - https://twitter.com/user/status/456 -tags: - - <topic-domain> - - bookmarks - - raw - - topic-specific-tag ---- -``` - -`status` values: `seeded`, `enriched`, `archived`. 
- -## Research output — `<topic>/outputs/queries/<YYYY-MM-DD> <slug>.md` - -```yaml ---- -title: Output Title -type: output -stage: query -domain: <topic-domain> -tags: - - <topic-domain> - - output - - query - - topic-specific-tag -created: YYYY-MM-DD -updated: YYYY-MM-DD -informed_by: - - "[[Wiki Article 1]]" - - "[[Wiki Article 2]]" ---- -``` - -`stage` values for outputs: `briefing`, `query`, `diagram`, `lint-report`. - -## Lint report — `<topic>/outputs/reports/<YYYY-MM-DD>-lint.md` - -```yaml ---- -title: Lint Report YYYY-MM-DD -type: output -stage: lint-report -domain: <topic-domain> -tags: - - <topic-domain> - - output - - lint-report -created: YYYY-MM-DD -issues_found: N -issues_fixed: M ---- -``` - -## Topic index — Dashboard / Concept Index / Source Index - -These files are human-browsed hubs, not research notes. Keep frontmatter minimal: - -```yaml ---- -title: Dashboard -type: index -domain: <topic-domain> -updated: YYYY-MM-DD ---- -``` - -## Quick reference - -| File type | Path | type | stage | -|-----------|------|------|-------| -| Wiki article | `wiki/concepts/` | `wiki` | `compiled` | -| Raw article | `raw/articles/` | `source` | `raw` | -| Raw GitHub | `raw/github/` | `source` | `raw` | -| Raw bookmarks | `raw/bookmarks/` | `source` | `raw` | -| Briefing | `outputs/briefings/` | `output` | `briefing` | -| Query result | `outputs/queries/` | `output` | `query` | -| Diagram | `outputs/diagrams/` | `output` | `diagram` | -| Lint report | `outputs/reports/` | `output` | `lint-report` | -| Index | `wiki/index/` | `index` | — | diff --git a/.agents/skills/kb/references/lint-procedure.md b/.agents/skills/kb/references/lint-procedure.md deleted file mode 100644 index ff5526c9c..000000000 --- a/.agents/skills/kb/references/lint-procedure.md +++ /dev/null @@ -1,118 +0,0 @@ -# Lint and Heal Procedure - -Run `kb lint <slug> --save` for automated structural checks (dead wikilinks, orphans, missing sources, format violations, stale content). 
The report is saved to `<topic>/outputs/reports/` and a log entry is auto-appended. - -This document covers the deeper **LLM-driven checks** that require reading articles and applying judgment. Run them periodically or after a batch of new content. - -### Check 1: Stale content - -For each article: - -1. Read the article's `updated:` date and `sources:` entries. -2. Check each source file's `scraped:` date (or file mtime). -3. If any source is newer than the article, flag it for recompilation. -4. Also flag articles where the topic has evolved rapidly (e.g., LLM model names, protocol versions) and the article has not been updated in 30+ days. - -### Check 2: Inconsistencies across articles - -Load groups of related articles (identified via shared tags or wikilinks) and check for: - -- Contradictory factual claims (e.g., "H100 has 80GB HBM3" vs "H100 has 80GB HBM2e") -- Inconsistent terminology (same concept called two different names across articles) -- Inconsistent formatting (some articles use tables, others prose, for the same kind of comparison) - -Fix by picking the correct/canonical version and updating all affected articles. - -### Check 3: Missing coverage - -Scan all articles for wikilinks and identify targets that: - -- Are referenced in 3+ articles -- Do not have their own article yet - -These are strong candidates for new articles. For each: - -1. Check whether relevant raw sources exist in `raw/`. -2. If yes, write the article (Procedure 1 in SKILL.md). -3. If no, ingest sources first (`kb ingest url/file`) or mark as a research gap in the topic's `CLAUDE.md`. - -### Check 4: Format violations - -Verify each article has: - -- H1 title matching filename -- Lead paragraph -- Sources section at the bottom -- At least 5 wikilinks (outgoing) -- Frontmatter with all required fields (the `kb` CLI validates these automatically via `kb lint`) - -Fix by rewriting or adding the missing elements. 
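The Check 4 items above are structural enough to sketch in code. This is a rough illustration only, under several assumptions: the body is passed with frontmatter already stripped, the Sources section is a heading whose text is exactly `Sources`, and wikilinks are counted by distinct target. The function name `format_violations` is hypothetical and does not describe how `kb lint` is implemented.

```python
import re
from pathlib import Path

WIKILINK_TARGET = re.compile(r"\[\[([^\]|#]+)")
MIN_WIKILINKS = 5


def format_violations(filename: str, body: str) -> list[str]:
    """Check one article body (frontmatter removed) against the Check 4 list."""
    violations = []
    lines = [line.strip() for line in body.splitlines()]
    non_empty = [line for line in lines if line]

    # H1 title must match the filename without its .md extension.
    expected_h1 = f"# {Path(filename).stem}"
    if not non_empty or non_empty[0] != expected_h1:
        violations.append("H1 title does not match filename")

    # Lead paragraph: the first non-empty line after the H1 should be prose.
    if len(non_empty) < 2 or non_empty[1].startswith("#"):
        violations.append("missing lead paragraph")

    # Sources section: assumed to be a heading reading exactly "Sources".
    if not any(re.fullmatch(r"#+\s+Sources", line) for line in lines):
        violations.append("missing Sources section")

    # Outgoing wikilinks, counted by distinct target (aliases and anchors ignored).
    targets = {m.group(1).strip() for m in WIKILINK_TARGET.finditer(body)}
    if len(targets) < MIN_WIKILINKS:
        violations.append(
            f"only {len(targets)} distinct wikilinks (need {MIN_WIKILINKS})"
        )

    return violations
```

The sketch only tests that a Sources heading exists, not that it sits at the bottom of the article, so a manual read is still needed for that point.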
- -### Check 5: Wikilink audit - -For each article: - -- Identify concepts mentioned without wikilinks that should have them -- Identify over-wikilinking (same term linked multiple times in close proximity) -- Identify wikilinks to concepts that no longer match the linked article's actual content - -### Check 6: Filed-back query absorption - -Scan `<topic>/outputs/queries/` for recent query results. For each: - -1. Identify the wiki articles listed under `informed_by:`. -2. Check whether the synthesis in the query result adds new insights not yet in those articles. -3. If yes, flag the articles for updates and absorb the insights on the next compile pass. - -This is the core compounding mechanism — query answers feeding back into the wiki. - -## Lint report format - -When running a manual lint pass, produce a report like: - -``` -LINT REPORT — <topic>/ — YYYY-MM-DD - -DEAD LINKS (N) - - [[Missing Article]] referenced in: Foo.md, Bar.md - → SUGGEST: Create wiki/concepts/Missing Article.md - → POTENTIAL SOURCES: raw/articles/relevant-source.md - -ORPHAN ARTICLES (N) - - Token Economics.md — 0 incoming links - → SUGGEST: Add refs from Agent Infrastructure.md, Fine-Tuning.md - -STALE CONTENT (N) - - MCP article references "MCP spec v0.9" but raw/articles/mcp-spec.md is v1.2 - → UPDATE: Recompile with current spec - -INCONSISTENCIES (N) - - Hardware specs disagree: Agent Infrastructure.md vs Fine-Tuning.md - → RESOLVE: Verify against authoritative source, pick canonical - -MISSING COVERAGE (N) - - "Inference Optimization" referenced in 4 articles, no article exists - → SUGGEST: Create wiki/concepts/Inference Optimization.md - -FORMAT VIOLATIONS (N) - - Prompt Engineering Techniques.md — missing Sources section - -FILED-BACK INSIGHTS (N) - - outputs/queries/2026-04-02 memory vs context.md has synthesis not in Memory Systems.md - → ABSORB: Update Memory Systems.md with the compaction tradeoffs insight -``` - -## Heal workflow - -For each issue the lint report surfaces: - 
-1. **Dead link + source available** → create the article (Procedure 1). -2. **Dead link + no source** → mark in topic `CLAUDE.md` research gaps, or rewrite the link. -3. **Orphan** → add incoming wikilinks, or delete if out-of-scope. -4. **Stale** → re-scrape source, recompile article. -5. **Inconsistency** → find authoritative source, fix all affected articles. -6. **Missing coverage** → ingest sources, write article. -7. **Format violation** → fix formatting. -8. **Filed-back insight** → update affected wiki articles. - -Run the cycle regularly. Each pass leaves the knowledge base in a better state than it found it. diff --git a/.agents/skills/kb/references/output-formats.md b/.agents/skills/kb/references/output-formats.md deleted file mode 100644 index ba50e289b..000000000 --- a/.agents/skills/kb/references/output-formats.md +++ /dev/null @@ -1,169 +0,0 @@ -# Output Format Reference - -All `inspect` and `search` commands support three output formats via `--format`. - -## Format Selection - -| Format | Flag | Use Case | -|--------|------|----------| -| table | `--format table` | Human-readable display (default) | -| json | `--format json` | Programmatic parsing by agents | -| tsv | `--format tsv` | Piping to Unix tools | - -Always use `--format json` when parsing output programmatically. - -## Inspect Output (Tabular Commands) - -Tabular inspect commands (`smells`, `dead-code`, `complexity`, `blast-radius`, `coupling`, `circular-deps`) return rows with typed columns. 
- -### JSON Example (`inspect complexity --top 2 --format json`) - -```json -[ - { - "symbol_name": "parseConfig", - "symbol_kind": "function", - "source_path": "src/config.ts", - "cyclomatic_complexity": 12, - "loc": 45, - "blast_radius": 8, - "smells": ["high-complexity"] - }, - { - "symbol_name": "resolveImports", - "symbol_kind": "function", - "source_path": "src/resolver.ts", - "cyclomatic_complexity": 9, - "loc": 32, - "blast_radius": 5, - "smells": [] - } -] -``` - -### TSV Example - -``` -symbol_name symbol_kind source_path cyclomatic_complexity loc blast_radius smells -parseConfig function src/config.ts 12 45 8 high-complexity -resolveImports function src/resolver.ts 9 32 5 -``` - -## Inspect Output (Detail Commands) - -Detail commands (`symbol`, `file`) return field-value pairs when a single entity matches. - -### JSON Example (`inspect symbol parseConfig --format json`) - -```json -[ - {"field": "symbol_name", "value": "parseConfig"}, - {"field": "symbol_kind", "value": "function"}, - {"field": "source_path", "value": "src/config.ts"}, - {"field": "loc", "value": 45}, - {"field": "blast_radius", "value": 8}, - {"field": "smells", "value": ["high-complexity"]}, - {"field": "outgoing_relations", "value": [ - {"target_path": "src/utils.ts", "type": "imports", "confidence": "syntactic"} - ]}, - {"field": "backlinks", "value": [ - {"target_path": "src/main.ts", "type": "calls", "confidence": "semantic"} - ]} -] -``` - -## Ingest Codebase Output - -`kb ingest codebase` always outputs JSON to stdout (no `--format` flag). 
- -```json -{ - "command": "generate", - "rootPath": "/path/to/repo", - "vaultPath": "/path/to/repo/.kb/vault", - "topicPath": "/path/to/repo/.kb/vault/my-project", - "topicSlug": "my-project", - "filesScanned": 120, - "filesParsed": 95, - "filesSkipped": 25, - "symbolsExtracted": 430, - "relationsEmitted": 1200, - "rawDocumentsWritten": 95, - "wikiDocumentsWritten": 12, - "indexDocumentsWritten": 5, - "timings": { - "scanMillis": 45, - "selectAdaptersMillis": 2, - "parseMillis": 1200, - "normalizeMillis": 80, - "metricsMillis": 150, - "renderMillis": 300, - "writeMillis": 200, - "totalMillis": 1977 - }, - "diagnostics": [] -} -``` - -## Search Output - -### JSON Example (`search "auth middleware" --format json`) - -```json -[ - { - "path": "raw-codebase/src/auth/middleware.md", - "score": 0.89, - "preview": "Authentication middleware that validates JWT tokens..." - } -] -``` - -## Index Output - -`kb index` always outputs JSON to stdout (no `--format` flag). - -```json -{ - "collectionName": "my-project", - "embedRequested": true, - "embedResult": { - "docsProcessed": 95, - "chunksEmbedded": 320, - "errors": 0, - "durationMs": 4500 - }, - "forceEmbed": false, - "status": { - "collection": { - "name": "my-project", - "path": "qmd://collections/my-project", - "pattern": "", - "documents": 95, - "lastUpdated": "2026-04-10T12:00:00Z" - }, - "hasVectorIndex": true, - "needsEmbedding": 0, - "totalDocuments": 95 - }, - "topicPath": "/path/to/vault/my-project", - "topicSlug": "my-project", - "updateResult": { - "collections": 1, - "indexed": 95, - "updated": 0, - "unchanged": 0, - "removed": 0, - "needsEmbedding": 95 - }, - "vaultPath": "/path/to/vault" -} -``` - -## Empty Results - -| Format | Empty Output | -|--------|-------------| -| json | `[]` | -| table | `No results.` followed by newline | -| tsv | Header row only (no data rows) | diff --git a/.agents/skills/kb/references/tooling-tips.md b/.agents/skills/kb/references/tooling-tips.md deleted file mode 100644 index 
b3ef85abe..000000000 --- a/.agents/skills/kb/references/tooling-tips.md +++ /dev/null @@ -1,73 +0,0 @@ -# Tooling Tips - -Companion tooling and Obsidian plugins that accelerate the Karpathy KB workflow. All are optional — the core pattern only requires markdown files. Add them as scale demands. - -## Obsidian Web Clipper (browser extension) - -Converts web articles to clean markdown with a single click, writing directly into the vault. The fastest path for getting articles from browser → `<topic>/raw/articles/`. - -- Install from the Obsidian Web Clipper page (official extension for Chrome/Firefox/Safari). -- Configure the default save location to `<topic>/raw/articles/` per topic. -- Configure a default template that includes `source_url`, `scraped`, and topic tags in frontmatter (the `kb` CLI auto-generates correct frontmatter on `kb ingest`, so this is only needed for manual clips). - -After clipping, verify the frontmatter matches `kb` conventions (the CLI auto-generates it on `kb ingest`, but manual clips need manual frontmatter). Then re-index with `kb index --topic <slug>` and append a log entry. - -## Image download and asset handling - -LLMs cannot reliably read markdown with inline images in a single pass. The workaround: download images locally so the LLM can view them separately when needed. - -**Obsidian config:** - -- Settings → Files and links → **Attachment folder path**: set to `raw/assets/` (or a per-topic attachments dir). -- Settings → Hotkeys → search "Download" → bind **"Download attachments for current file"** to a hotkey (e.g., Ctrl+Shift+D). - -**Workflow:** after clipping an article with image URLs, press the hotkey — all referenced images download to the attachment folder and the markdown is rewritten to reference local files. The LLM then reads the text first, then views specific images separately for additional context. - -## Dataview plugin - -Runs SQL-like queries over page frontmatter. 
Useful when the LLM adds structured frontmatter (tags, dates, `source_count`) to wiki pages — Dataview turns that into dynamic tables without maintaining a separate index. - -**Example:** list all wiki articles updated in the last 30 days, sorted by source count: - -```dataview -TABLE updated, length(sources) AS "Sources" -FROM "ai-harness/wiki/concepts" -WHERE date(updated) > date(today) - dur(30 days) -SORT length(sources) DESC -``` - -Dataview complements the static `Concept Index.md` — keep the static index for LLM navigation and add Dataview blocks inside `Dashboard.md` for live views. - -## Marp plugin - -Converts markdown files to slide decks (PDF/HTML/PPTX). A query answer, a wiki article, or a comparison can be exported to a slide deck with zero extra authoring. - -**Usage:** - -- Add `marp: true` to the frontmatter of the file being presented. -- Use `---` separators between slides. -- Export via Marp's CLI or the Obsidian Marp plugin. - -Useful for briefings in `<topic>/outputs/briefings/` that need to be shared as decks. - -## qmd vs naked index tradeoffs - -Karpathy's original pattern notes that at small scale, the static `index.md` (our `Concept Index.md` + `Source Index.md`) is sufficient — the LLM reads it to find relevant pages, then drills in. [qmd](https://github.com/tobi/qmd) (hybrid BM25 + vector search with LLM re-ranking, all local) becomes worth adding as the corpus grows. - -**Heuristic:** - -| Scale | Navigation | -|-------|-----------| -| 1-20 sources, <30 wiki articles | Concept Index + Source Index only | -| 20-50 sources, 30-80 articles | Run `kb index --topic <slug>`, still read indexes first | -| 50+ sources, 80+ articles | qmd primary, indexes become secondary browsing aids | - -The indexes never go away — they serve as the LLM's *mental model* of the topic. qmd serves as its *search tool*. - -## Graph view - -Obsidian's graph view is the fastest way to see the shape of a topic — what's central, what's orphan, what's overconnected. 
Run it after each lint pass to eyeball the structure. Orphan nodes in the graph corroborate orphan detection from `kb lint`. - -## The wiki is just a git repo - -No database, no server. Every commit is a reviewable checkpoint. Branch to experiment with a restructure without risking the main wiki. `git log --follow <article>.md` shows the full evolution of any concept. `git blame` shows which compile pass introduced which claim. diff --git a/.agents/skills/obsidian-cli/SKILL.md b/.agents/skills/obsidian-cli/SKILL.md deleted file mode 100644 index 0046c45ab..000000000 --- a/.agents/skills/obsidian-cli/SKILL.md +++ /dev/null @@ -1,106 +0,0 @@ ---- -name: obsidian-cli -description: Interact with Obsidian vaults using the Obsidian CLI to read, create, search, and manage notes, tasks, properties, and more. Also supports plugin and theme development with commands to reload plugins, run JavaScript, capture errors, take screenshots, and inspect the DOM. Use when the user asks to interact with their Obsidian vault, manage notes, search vault content, perform vault operations from the command line, or develop and debug Obsidian plugins and themes. ---- - -# Obsidian CLI - -Use the `obsidian` CLI to interact with a running Obsidian instance. Requires Obsidian to be open. - -## Command reference - -Run `obsidian help` to see all available commands. This is always up to date. Full docs: https://help.obsidian.md/cli - -## Syntax - -**Parameters** take a value with `=`. Quote values with spaces: - -```bash -obsidian create name="My Note" content="Hello world" -``` - -**Flags** are boolean switches with no value: - -```bash -obsidian create name="My Note" silent overwrite -``` - -For multiline content use `\n` for newline and `\t` for tab. - -## File targeting - -Many commands accept `file` or `path` to target a file. Without either, the active file is used. 
- -- `file=<name>` — resolves like a wikilink (name only, no path or extension needed) -- `path=<path>` — exact path from vault root, e.g. `folder/note.md` - -## Vault targeting - -Commands target the most recently focused vault by default. Use `vault=<name>` as the first parameter to target a specific vault: - -```bash -obsidian vault="My Vault" search query="test" -``` - -## Common patterns - -```bash -obsidian read file="My Note" -obsidian create name="New Note" content="# Hello" template="Template" silent -obsidian append file="My Note" content="New line" -obsidian search query="search term" limit=10 -obsidian daily:read -obsidian daily:append content="- [ ] New task" -obsidian property:set name="status" value="done" file="My Note" -obsidian tasks daily todo -obsidian tags sort=count counts -obsidian backlinks file="My Note" -``` - -Use `--copy` on any command to copy output to clipboard. Use `silent` to prevent files from opening. Use `total` on list commands to get a count. - -## Plugin development - -### Develop/test cycle - -After making code changes to a plugin or theme, follow this workflow: - -1. **Reload** the plugin to pick up changes: - ```bash - obsidian plugin:reload id=my-plugin - ``` -2. **Check for errors** — if errors appear, fix and repeat from step 1: - ```bash - obsidian dev:errors - ``` -3. **Verify visually** with a screenshot or DOM inspection: - ```bash - obsidian dev:screenshot path=screenshot.png - obsidian dev:dom selector=".workspace-leaf" text - ``` -4. 
**Check console output** for warnings or unexpected logs: - ```bash - obsidian dev:console level=error - ``` - -### Additional developer commands - -Run JavaScript in the app context: - -```bash -obsidian eval code="app.vault.getFiles().length" -``` - -Inspect CSS values: - -```bash -obsidian dev:css selector=".workspace-leaf" prop=background-color -``` - -Toggle mobile emulation: - -```bash -obsidian dev:mobile on -``` - -Run `obsidian help` to see additional developer commands including CDP and debugger controls. diff --git a/.agents/skills/viz/SKILL.md b/.agents/skills/viz/SKILL.md deleted file mode 100644 index abd3ca869..000000000 --- a/.agents/skills/viz/SKILL.md +++ /dev/null @@ -1,71 +0,0 @@ ---- -name: viz -description: 'Transforms content (URLs, uploaded documents, pasted text, meeting transcripts) into professional visualizations across four output modes. Accepts a mode argument or a keyword trigger in the user message. Mode "diagram" produces an Excalidraw diagram via Excalidraw:create_view. Mode "infographic" generates a Swiss Pulse PNG via the Gemini image-generation API. Mode "visualize" renders an inline Visualizer widget (SVG or HTML) via visualize:show_widget. Mode "publish" ships an interactive Swiss Pulse HTML visual to HeyGenverse via HeyGenverse:create_app and returns a shareable link. Keywords that activate the skill: "diagram it", "excalidraw this", "draw a diagram of this", "nano this", "vis it", "ver it", "hey it", "heygenverse this". Do not use for plain-text summaries, code explanations, prose responses, or generic chat visualizations without a chosen output format.' -argument-hint: '<diagram|infographic|visualize|publish> [content-or-url]' ---- - -# Viz Pack — Multi-Mode Visual Dispatcher - -Transform any content into a professional visualization via one of four output modes. This skill is a dispatcher: it resolves the active mode, loads the mode-specific workflow from `references/`, and executes it. 
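The dispatch order this skill describes (explicit mode argument first, then a keyword trigger, then asking the user) can be sketched as below. The function names and the plain substring matching are assumptions made for illustration; the keyword-to-mode pairs and the `references/mode-<mode>.md` path pattern are taken from this skill's own tables.

```python
from typing import Optional

MODES = ("diagram", "infographic", "visualize", "publish")

TRIGGERS = {
    "diagram it": "diagram",
    "excalidraw this": "diagram",
    "draw a diagram of this": "diagram",
    "nano this": "infographic",
    "vis it": "visualize",
    "ver it": "publish",
    "hey it": "publish",
    "heygenverse this": "publish",
    "put this on heygenverse as a visual": "publish",
}


def resolve_mode(first_arg: str, user_message: str) -> Optional[str]:
    """Return the active mode, or None when the user has to be asked."""
    # 1. An explicit positional mode argument wins.
    if first_arg in MODES:
        return first_arg
    # 2. Otherwise look for a trigger keyword in the latest user message.
    lowered = user_message.lower()
    for phrase, mode in TRIGGERS.items():
        if phrase in lowered:
            return mode
    # 3. No signal: the caller should ask the user rather than guess.
    return None


def workflow_path(mode: str) -> str:
    """Map a resolved mode to its workflow file under references/."""
    return f"references/mode-{mode}.md"
```

Substring matching is deliberately naive here: a short trigger such as `vis it` would also fire inside a longer word sequence, so a real dispatcher would likely want word-boundary matching.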
- -## Step 1: Resolve the active mode - -Inspect the invocation in this priority order: - -1. **Positional argument.** The first positional argument is `$0`. If `$0` matches `diagram`, `infographic`, `visualize`, or `publish`, use it as the mode. Remaining arguments (`$ARGUMENTS` after the mode token) are the optional content reference (URL, file path, or inline text). - -2. **Keyword trigger.** If `$0` is empty or not a valid mode, scan the most recent user message for a trigger keyword and map it to a mode: - - | Trigger keyword(s) | Mode | - | ----------------------------------------------------------------------------- | ------------- | - | "diagram it", "excalidraw this", "draw a diagram of this" | `diagram` | - | "nano this" | `infographic` | - | "vis it" | `visualize` | - | "ver it", "hey it", "heygenverse this", "put this on heygenverse as a visual" | `publish` | - -3. **Ask the user.** If neither signal yields a mode, stop and ask which output to produce — `diagram` (Excalidraw), `infographic` (Swiss Pulse PNG via Gemini), `visualize` (inline Visualizer widget), or `publish` (interactive Swiss Pulse app on HeyGenverse). Do not guess. - -Invocation context injected at render time: mode token is `$0`, content reference is `$ARGUMENTS`. - -## Step 2: Load the mode workflow - -Read the reference file for the resolved mode and follow its workflow end-to-end: - -| Mode | Workflow file | -| ------------- | -------------------------------- | -| `diagram` | `references/mode-diagram.md` | -| `infographic` | `references/mode-infographic.md` | -| `visualize` | `references/mode-visualize.md` | -| `publish` | `references/mode-publish.md` | - -Read the file on every invocation — do not execute from memory — so any updates to the workflow are applied. - -## Step 3: Acquire the content (shared across modes) - -All four modes accept content from the same sources. Resolve the content target in this priority order: - -1. 
If `$ARGUMENTS` contains a URL → retrieve via `web_fetch`. If blocked (429/403), fall back to `web_search` with key phrases to get the content from search snippets. -2. If `$ARGUMENTS` points to an uploaded file → read from `/mnt/user-data/uploads/` using `pdfplumber` for PDF, `python-docx` for DOCX, `pandas` for CSV/TSV, direct read for TXT/MD/HTML. -3. If `$ARGUMENTS` is inline text → use it directly. -4. Else scan the current conversation for a referenced source (uploaded file, URL previously shared, meeting transcript at `/mnt/transcripts/`, pasted text) and use that. - -Summarize to core structure if the content exceeds 3,000 words for modes that pipe text into an LLM (`infographic`) or 5,000 words for modes that assemble visuals client-side (`diagram`, `visualize`, `publish`). - -## Step 4: Execute the loaded workflow - -Follow the steps in the loaded workflow file exactly. Do not substitute tools, colors, typography, or output formats across modes. - -## Non-Negotiable Rules - -- Never respond with a plain-text summary instead of the requested visual output. -- Never swap output format across modes (e.g., do not emit Excalidraw when `mode=publish` was requested). -- Never skip reading the mode-specific workflow file — every mode has non-obvious tool contracts and design constraints. -- Never use colors outside each mode's palette (Excalidraw: 2–3 neutrals; Swiss Pulse modes: black/white + #0066FF only). - -## Error Handling - -- **Mode unresolved** → execute Step 1.3 and ask the user. -- **Missing workflow file** → stop and report the missing path (`references/mode-<mode>.md`). -- **Missing tool integration** (`Excalidraw:*`, `visualize:*`, `HeyGenverse:*`) → report the specific missing integration and ask whether to proceed with an alternative mode. -- **Missing `GEMINI_API_KEY`** (mode `infographic` only) → stop and ask for the key before continuing. 
-- **Content acquisition fails** (empty, unreadable, blocked URL) → report the specific failure and request a different source. diff --git a/.agents/skills/viz/assets/swiss-pulse-tokens.md b/.agents/skills/viz/assets/swiss-pulse-tokens.md deleted file mode 100644 index e0fc8bb54..000000000 --- a/.agents/skills/viz/assets/swiss-pulse-tokens.md +++ /dev/null @@ -1,65 +0,0 @@ -# Swiss Pulse Design Tokens - -Canonical design system shared by mode `infographic` and mode `publish`. Every Swiss Pulse visual must draw from these tokens exactly — no color, weight, or size outside this file is permitted. - -## Color Palette - -### Light mode - -| Role | Value | -|---|---| -| Background | `#ffffff` | -| Surface | `#f5f5f0` | -| Text primary | `#1a1a1a` | -| Text secondary | `#6b6b65` | -| Text tertiary | `#9c9c94` | -| Accent | `#0066FF` | -| Accent background | `#e8f0fe` | -| Borders | `rgba(0,0,0,0.1)` | - -### Dark mode - -| Role | Value | -|---|---| -| Background | `#1a1a1a` | -| Surface | `#252523` | -| Text primary | `#e8e8e0` | -| Text secondary | `#a0a098` | -| Text tertiary | `#73726c` | -| Accent | `#4d94ff` | -| Accent background | `#1a2a44` | -| Borders | `rgba(255,255,255,0.1)` | - -Exactly three color families are permitted: black/neutral, white/off-white, and electric blue (#0066FF light / #4d94ff dark). Any other hue is forbidden. - -## Typography - -- **Font stack:** `-apple-system, BlinkMacSystemFont, 'Segoe UI', 'Helvetica Neue', Arial, sans-serif` -- **Hero number:** 42–48px, weight 600 -- **Section labels:** 11px, uppercase, letter-spacing 0.08em, weight 600 -- **Card titles:** 13–15px, weight 600 -- **Body / descriptions:** 11–13px, weight 400 -- **Allowed weights:** 400 (regular) and 600 (bold) only. Never 700 or heavier. - -For the Gemini infographic prompt, map the font stack to "Helvetica or Swiss grotesque style only — clean, bold headings, light body text." 
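Because the palette and the allowed weights are closed sets, generated CSS can be scanned against them. The sketch below is one possible check, not part of the skill: the function name `token_violations` is hypothetical, only six-digit hex colors are inspected (the `rgba(...)` border values and three-digit shorthand are not), and an id selector made only of hex letters would be misread as a color.

```python
import re

ALLOWED_HEX = {
    # light mode
    "#ffffff", "#f5f5f0", "#1a1a1a", "#6b6b65", "#9c9c94", "#0066ff", "#e8f0fe",
    # dark mode
    "#252523", "#e8e8e0", "#a0a098", "#73726c", "#4d94ff", "#1a2a44",
}
ALLOWED_WEIGHTS = {"400", "600"}

HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}\b")
FONT_WEIGHT = re.compile(r"font-weight:\s*([^;}\s]+)")


def token_violations(css: str) -> list[str]:
    """List hex colors and font weights that fall outside the token set."""
    found = []
    for match in HEX_COLOR.finditer(css):
        color = match.group(0).lower()  # compare case-insensitively
        if color not in ALLOWED_HEX:
            found.append(f"color {color}")
    for match in FONT_WEIGHT.finditer(css):
        weight = match.group(1)
        if weight not in ALLOWED_WEIGHTS:
            found.append(f"font-weight {weight}")
    return found
```

Keyword weights such as `bold` are reported as violations too, since only the numeric values 400 and 600 are listed as allowed.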
- -## Layout - -- **Max width:** 720px, centered -- **Page padding:** 32px 24px -- **Grid:** CSS Grid with 10–12px gaps -- **Section whitespace:** 2–2.5rem between major sections -- **Border-radius:** 8px for elements, 12px for cards -- **Border width:** 0.5px (subtle separation) -- **Dark mode:** always support via `@media (prefers-color-scheme: dark)` -- **Mobile breakpoint:** `@media (max-width: 500px)` — stack grids into single columns - -## Forbidden - -- Any color outside the palette above (including red, green, yellow, purple, or custom brand hues) -- Gradients -- Box shadows / drop shadows -- Decorative borders, flourishes, icons-as-decoration -- Font weights above 600 -- Font sizes outside the scale above -- Decorative illustrations that do not encode data diff --git a/.agents/skills/viz/references/mode-diagram.md b/.agents/skills/viz/references/mode-diagram.md deleted file mode 100644 index ffc2f3dd4..000000000 --- a/.agents/skills/viz/references/mode-diagram.md +++ /dev/null @@ -1,70 +0,0 @@ -# Mode: `diagram` — Excalidraw Content Diagrammer - -Turn the acquired content into a clear, well-structured Excalidraw diagram rendered inline in chat. - -Prerequisite: the dispatcher (`SKILL.md` Step 3) has already acquired the content. This file covers diagram selection, the Excalidraw format contract, and layout rules. - -## Step 1: Analyze and choose diagram type - -Read the content and identify what kind of structure it conveys. 
Then pick the best diagram type: - -| Content pattern | Diagram type | -|---|---| -| Steps, process, workflow, sequence | **Flowchart** — boxes connected by arrows in a clear flow direction | -| System, architecture, components, layers | **Structural diagram** — nested containers with labeled regions | -| Timeline, phases, chronological progression | **Timeline** — horizontal or vertical sequence with milestones | -| Hierarchy, org chart, taxonomy | **Tree** — parent-child relationships branching outward | -| Comparison, tradeoffs, axes | **2×2 matrix** or **comparison grid** | -| Relationships, dependencies, network | **Network/graph** — nodes and edges showing connections | -| Mental model, framework, concept map | **Concept map** — central idea with branching related concepts | -| Decisions, branching logic | **Decision tree** — diamond decisions with yes/no paths | - -When in doubt, default to a **flowchart** for sequential content or a **structural diagram** for systems. - -## Step 2: Read the Excalidraw format reference - -Before creating the diagram, always call: - -``` -Excalidraw:read_me -``` - -This returns the element format, color palettes, and examples. Follow it exactly. - -## Step 3: Build the Excalidraw diagram - -Create the diagram using `Excalidraw:create_view` with these principles. - -### Layout rules - -- **Flow direction:** top-to-bottom for processes, left-to-right for timelines. -- **Spacing:** minimum 60px between elements, 80px+ between rows/columns. -- **Alignment:** keep elements on a grid — consistent x-coordinates for columns, consistent y-coordinates for rows. -- **Grouping:** cluster related elements visually with clear whitespace separating groups. - -### Content rules - -- **Extract the structure, not the prose.** A 2000-word article becomes 8–12 nodes, not 30. -- **Box labels:** 2–5 words max. If more is needed, split into two boxes. -- **One idea per box.** Do not merge distinct concepts. 
-- **Arrows have meaning.** Every arrow represents a relationship — flow, dependency, causation, sequence. Do not add arrows for decoration. -- **Use color sparingly.** 2–3 colors max to encode categories. Gray/neutral for structural elements. -- **Include key numbers.** If the content has important metrics, put them in the diagram (e.g., "92% adoption", "$159M ARR", "5 months"). - -### Element hierarchy - -1. **Primary nodes** — the main concepts/steps (larger boxes, bolder colors) -2. **Secondary nodes** — supporting details or sub-steps (smaller boxes, muted colors) -3. **Connectors** — arrows showing relationships between nodes -4. **Labels** — text annotations on arrows or near groups (use sparingly) -5. **Containers** — dashed rectangles grouping related nodes (for structural diagrams) - -## Step 4: Respond - -Output the Excalidraw diagram inline. Keep commentary minimal — the diagram speaks for itself. Add a one-line note on what the diagram shows if needed. - -## Mode-Specific Rules - -- Always call `Excalidraw:read_me` before creating the diagram. -- Do not overcrowd. If the content is very complex, focus on top-level structure and offer to drill into sub-sections. -- Do not add decorative elements — every element earns its place. diff --git a/.agents/skills/viz/references/mode-infographic.md b/.agents/skills/viz/references/mode-infographic.md deleted file mode 100644 index 430560997..000000000 --- a/.agents/skills/viz/references/mode-infographic.md +++ /dev/null @@ -1,91 +0,0 @@ -# Mode: `infographic` — Swiss Pulse PNG via Gemini - -Generate a professional infographic **image** from the acquired document using Gemini's image-generation API, following Swiss International Typographic Style. - -Prerequisite: the dispatcher (`SKILL.md` Step 3) has already extracted the document text. This file covers the Gemini API call and the Swiss Pulse design brief. 
- -## Step 1: Load design tokens - -Read `assets/swiss-pulse-tokens.md` to retrieve the canonical palette (black, white, electric blue #0066FF), typography (Helvetica / Swiss grotesque), and layout rules referenced by the Gemini prompt below. - -## Step 2: Verify environment - -Confirm the `GEMINI_API_KEY` environment variable is set. If missing, stop and request it from the user. Do not hardcode the key in the script. - -## Step 3: Call Gemini to generate the infographic - -Run this Python script, substituting the extracted document text into `document_text`: - -```python -import os -import requests -import json -import base64 - -GEMINI_API_KEY = os.environ["GEMINI_API_KEY"] -GEMINI_URL = f"https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp:generateContent?key={GEMINI_API_KEY}" - -document_text = """<PASTE EXTRACTED TEXT HERE>""" - -design_prompt = f"""Create a professional infographic image in the Swiss International Typographic Style (inspired by Josef Müller-Brockmann). - -DESIGN RULES — follow these strictly: -- Grid-locked layout: everything aligned to a strict modular grid -- Color palette: black, white, and ONE accent color: electric blue (#0066FF). No other colors. -- Typography: Helvetica or Swiss grotesque style only — clean, bold headings, light body text -- Lead with a large hero metric or key number displayed prominently at the top -- Use clean data visualizations: bar charts, line charts, or donut charts where appropriate -- Generous whitespace — never cramped -- No decorative elements — every element earns its place -- Subtle diagonal composition elements for dynamism -- Professional, clinical, precise aesthetic - -CONTENT TO VISUALIZE: -{document_text} - -Create a complete, polished infographic that communicates the key information from this content. Include a clear title, key metrics displayed prominently, supporting data points, and a bottom-line takeaway. 
Make it look like it was designed by a Swiss design studio.""" - -payload = { - "contents": [{"parts": [{"text": design_prompt}]}], - "generationConfig": { - "responseModalities": ["IMAGE", "TEXT"], - "temperature": 1.0, - "maxOutputTokens": 8192 - } -} - -response = requests.post(GEMINI_URL, json=payload, headers={"Content-Type": "application/json"}) -result = response.json() - -# Extract and save the image -for part in result["candidates"][0]["content"]["parts"]: - if "inlineData" in part: - image_data = part["inlineData"]["data"] - mime_type = part["inlineData"]["mimeType"] - ext = "png" if "png" in mime_type else "jpg" if "jpeg" in mime_type else "webp" - output_path = f"/mnt/user-data/outputs/infographic.{ext}" - with open(output_path, "wb") as f: - f.write(base64.b64decode(image_data)) - print(f"Saved to {output_path}") - break -else: - print("No image in response. Full response:") - print(json.dumps(result, indent=2)) -``` - -## Step 4: Present the output - -Use `present_files` to share the generated infographic. Keep the response minimal — the visual speaks for itself. - -## Error Handling - -- If Gemini returns no image → retry once with a simplified prompt. -- If the document is empty → report the empty source to the user. -- If the API call fails → print the error for debugging and report to the user. -- Always check that `result["candidates"]` exists before accessing it. - -## Mode-Specific Rules - -- Always produce an image file (PNG/JPG/WEBP). Never substitute HTML, Excalidraw, or a text summary. -- Never embed colors outside the Swiss Pulse palette defined in `assets/swiss-pulse-tokens.md`. -- Never add commentary beyond a single sentence when presenting the image. 
diff --git a/.agents/skills/viz/references/mode-publish.md b/.agents/skills/viz/references/mode-publish.md deleted file mode 100644 index 1a2aa437a..000000000 --- a/.agents/skills/viz/references/mode-publish.md +++ /dev/null @@ -1,154 +0,0 @@ -# Mode: `publish` — HeyGenverse Swiss Pulse Visual Publisher - -Turn the acquired content into an interactive, Swiss Pulse–styled visual explainer and publish it to HeyGenverse with a shareable link. - -Prerequisite: the dispatcher (`SKILL.md` Step 3) has already acquired the content. This file covers content structuring, HTML assembly, Chart.js integration, and the HeyGenverse publish contract. - -## Step 1: Load design tokens - -Read `assets/swiss-pulse-tokens.md` to retrieve the canonical palette (light + dark mode), typography scale, layout grid, and border/radius values used throughout the HTML below. - -## Step 2: Analyze and structure the content - -Extract the following elements from the content. Not all will be present — use what is available: - -1. **Hero metric** — the single most striking number or statistic (e.g., "2 → 500+", "$159M ARR", "73% non-activation") -2. **Supporting stats** — 3–4 secondary metrics that frame the story -3. **Timeline or sequence** — any chronological progression, phases, or steps -4. **Architecture or structure** — systems, frameworks, hierarchies, or relationships -5. **Key examples** — concrete instances that illustrate the main points -6. **Core insight or mental model** — the "so what" — the one-line takeaway -7. **Quotable moment** — a memorable quote from the source if available - -Organize these into a visual hierarchy: the hero metric leads, the structure follows, details support. - -## Step 3: Build the self-contained HTML - -Assemble a single self-contained HTML page using the Swiss Pulse tokens from Step 1. Every visual must include the required components below. - -### Required components (all mandatory) - -1. **Hero section** — large metric + one-line context -2. 
**Stats row** — 3–4 metric cards in a grid -3. **At least ONE chart or graph** — pick the most appropriate: - - **Bar chart** (Chart.js) — for comparisons, rankings, or distributions - - **Line chart** (Chart.js) — for trends over time or growth curves - - **Timeline** — for chronological progressions (interactive, clickable phases) - - **Funnel** — for conversion or drop-off sequences - - **Architecture diagram** — for systems, hierarchies, or relationships (use styled HTML divs, not SVG) - - **Progress/gauge** — for completion or achievement metrics -4. **Content cards** — key examples, insights, or categories in a grid -5. **Source attribution** — footer with source link - -### Chart implementation rules - -Load Chart.js from CDN: - -```html -<script src="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/4.4.1/chart.umd.js"></script> -``` - -Chart.js configuration: - -- Canvas **cannot** resolve CSS variables — use hardcoded hex values from `assets/swiss-pulse-tokens.md`. -- Detect dark mode at runtime: `const isDark = matchMedia('(prefers-color-scheme: dark)').matches;` -- Use accent blue (`#0066FF` light / `#4d94ff` dark) as the primary data color. -- Use gray (`#9c9c94`) for secondary data or grid lines. -- Transparent background, subtle grid lines. -- Always wrap the canvas in a div with explicit height and `position: relative`. -- Set `responsive: true, maintainAspectRatio: false`. -- Build a custom HTML legend (not the Chart.js default). -- Disable the default legend: `plugins: { legend: { display: false } }`. - -For timelines, use interactive HTML — not Chart.js: - -- Horizontal row of phase items with a left-border accent on the active phase. -- Click to swap detail panel content. -- Active state uses accent-blue background + border. 
- -### Interactive elements - -Add interactivity where the content supports it: - -- **Clickable timeline phases** that swap a detail panel -- **Hover states** on cards (background shift, border emphasis) -- **Responsive grid** that stacks on mobile (`@media max-width: 500px`) - -### Template skeleton - -```html -<!DOCTYPE html> -<html lang="en"> -<head> -<meta charset="UTF-8"> -<meta name="viewport" content="width=device-width, initial-scale=1.0"> -<title>[TITLE] - - - - - - - - - - - - - -``` - -## Step 4: Publish to HeyGenverse - -Call the `HeyGenverse:create_app` tool: - -``` -Tool: HeyGenverse:create_app -Parameters: - title: [Descriptive title derived from the content] - description: [1–2 sentence summary of what this visual explains] - html: [The complete HTML from Step 3] - tags: ["visual", "swiss-pulse", ...topic-specific tags] -``` - -## Step 5: Return the shareable link - -Format the link as: `https://www.heygenverse.com/a/{app-id}` - -Never use the `/api/apps/serve?id=` URL for sharing. Always use the `/a/` format. - -Keep the response minimal: - -- The HeyGenverse link -- One sentence on what the visual covers -- A note on any interactive elements to try - -## Quality Checklist - -Before publishing, verify: - -- [ ] Hero metric is prominent and impactful -- [ ] At least one chart/graph is present (bar, line, timeline, funnel, or architecture) -- [ ] Dark mode works (all colors use CSS variables or have dark-mode overrides) -- [ ] Responsive — stacks cleanly on mobile -- [ ] Swiss Pulse aesthetic — B&W + #0066FF only, no decoration, grid-locked -- [ ] Source attribution in footer -- [ ] Interactive elements have hover/active states -- [ ] Title is descriptive, not generic - -## Mode-Specific Rules - -- Always publish to HeyGenverse. Never create a local file without publishing. -- Never use colors outside the B&W + #0066FF palette defined in `assets/swiss-pulse-tokens.md`. -- Never use gradients, shadows, or decorative elements. 
-- Never skip the chart/graph — every visual needs at least one. -- Never use the Visualizer or Excalidraw — those belong to other modes. -- Never forget dark-mode support. diff --git a/.agents/skills/viz/references/mode-visualize.md b/.agents/skills/viz/references/mode-visualize.md deleted file mode 100644 index 86aa9a5d9..000000000 --- a/.agents/skills/viz/references/mode-visualize.md +++ /dev/null @@ -1,71 +0,0 @@ -# Mode: `visualize` — Inline Visualizer Widget - -Turn the acquired content into the best-fit visual for understanding it, rendered inline in chat via the Visualizer. - -Prerequisite: the dispatcher (`SKILL.md` Step 3) has already acquired the content. This file covers format selection, the Visualizer tool contract, and design rules. - -## Step 1: Analyze and choose the visual format - -The Visualizer supports SVG and HTML. Pick the format that fits the content: - -| Content pattern | Visual format | -|---|---| -| Process, workflow, sequence, steps | **Flowchart** (SVG) — boxes and arrows showing flow | -| System, architecture, components | **Structural diagram** (SVG) — nested containers with labeled regions | -| Timeline, phases, chronological growth | **Interactive timeline** (HTML) — clickable phases with detail panel | -| Comparison, options, tradeoffs | **Card grid** (HTML) — side-by-side cards with key differences highlighted | -| Data, metrics, trends | **Chart + stats** (HTML with Chart.js) — metric cards + bar/line chart | -| Pipeline, funnel, conversion | **Kanban/pipeline** (HTML) — multi-column cards with status badges | -| Concept, mental model, framework | **Explainer** (HTML) — hero insight + supporting structure + interactive elements | -| Mixed / complex | **Dashboard-style** (HTML) — hero metric + stats row + chart + content cards | - -When in doubt, default to the **dashboard-style** layout — it handles most content well. 
- -## Step 2: Load the Visualizer module - -Before creating any visual, always call: - -``` -visualize:read_me -``` - -Load the appropriate module(s): `diagram` for SVG flowcharts/structural, `interactive` for HTML explainers, `chart` for Chart.js data viz. - -## Step 3: Build the visual - -Create the visual using `visualize:show_widget` and follow every Visualizer design rule. - -### Core principles - -- **Seamless** — should feel like a natural extension of the chat -- **Flat** — no gradients, shadows, or decorative effects -- **Compact** — show the essential inline, explain the rest in text -- **Text in response, visuals in the tool** — all explanatory prose goes OUTSIDE the tool call - -### Content extraction rules - -- **Extract structure, not prose.** Distill to key metrics, relationships, sequences, and takeaways. -- **Hero metric first.** Lead with the single most striking number or insight. -- **3–4 supporting stats.** Frame the story with secondary metrics. -- **Include interactivity** where the content supports it — clickable elements, hover states, `sendPrompt()` for drill-downs. - -### Design tokens - -- Use CSS variables for all colors (auto dark mode) -- Font sizes: 11–16px range only -- Two weights: 400 regular, 500 bold -- Borders: `0.5px solid var(--color-border-tertiary)` -- Border-radius: `var(--border-radius-md)` for elements, `var(--border-radius-lg)` for cards -- Load Chart.js from CDN: `https://cdnjs.cloudflare.com/ajax/libs/Chart.js/4.4.1/chart.umd.js` - -## Step 4: Respond - -Output the visual inline via `show_widget`. Add brief commentary connecting the visual to the user's context. Keep it to 2–3 sentences max. - -## Mode-Specific Rules - -- Always call `visualize:read_me` before creating the widget. -- Never put explanatory prose inside the HTML — text goes outside the tool call. -- Never generate image files — that belongs to mode `infographic`. -- Never publish — that belongs to mode `publish`. 
-- Never use Excalidraw — that belongs to mode `diagram`. diff --git a/.audits/architectural-analysis-2026-04-18-agent-capabilities.md b/.audits/architectural-analysis-2026-04-18-agent-capabilities.md deleted file mode 100644 index 793e2ff89..000000000 --- a/.audits/architectural-analysis-2026-04-18-agent-capabilities.md +++ /dev/null @@ -1,187 +0,0 @@ -# Architectural Analysis Report - -**Date**: 2026-04-18 -**Scope**: Agent-declared capabilities across AGENT.md, config parsing, and AGH Network peer advertising - ---- - -## Executive Summary - -There is a real architectural gap between the network protocol, the agent-definition RFC, and the current runtime: - -- `docs/rfcs/003_agh-network-v0.md` requires `Peer Card.capabilities` and treats them as the discovery surface for `greet`/`whois`. -- `docs/rfcs/001_agent-md-with-skills-memory.md` says an `AGENT.md` definition could generate an Agent Card / discovery metadata, but it never defines the field shape for capabilities. -- `internal/config/agent.go` has no capabilities field in `AgentDef`. -- `internal/network` currently creates local peer cards with `Capabilities: []string{}` by default. - -The result is not just a documentation omission. It is a broken contract chain: - -1. the protocol expects advertised capabilities -2. the authoring format cannot declare them -3. the runtime has no canonical source to project into `PeerCard` - -This is a root-cause design issue, not a small config bug. - ---- - -## Findings - -### 1. Protocol requires capability advertisement, but the runtime has no source of truth - -**Evidence** - -- `docs/rfcs/003_agh-network-v0.md:277-285` defines `Peer Card.capabilities` as a required field. -- `docs/rfcs/003_agh-network-v0.md:456-475` makes `greet` and `whois` the capability-advertising and lookup path. -- `internal/network/peer.go:102-107` builds the default local card with empty `Capabilities`. 
-- `internal/network/manager.go:416-420` joins a local peer using only `DefaultPeerCard(request.peerID)`. - -**Impact** - -- Local peers can join the network without any declared capabilities. -- Discovery exists structurally, but carries no useful agent-specific capability data. -- Adding capability strings elsewhere later would create drift unless the projection path is defined. - -**Severity**: HIGH - -### 2. AGENT.md RFC implies derivation to discovery metadata, but never specifies the mapping - -**Evidence** - -- `docs/rfcs/001_agent-md-with-skills-memory.md:316-318` says Agent Cards could be generated from `AGENT.md`. -- `docs/rfcs/001_agent-md-with-skills-memory.md:439` explicitly raises the open question of whether provider-specific settings should evolve into a capabilities model. - -**Impact** - -- The intended architecture already points to `AGENT.md -> discovery card`, but the spec stops before defining how. -- That missing mapping is the reason `internal/config/agent.go` never got a capabilities shape. - -**Severity**: HIGH - -### 3. Agent definition parsing is shape-closed today - -**Evidence** - -- `internal/config/agent.go:17-26` and `internal/config/agent.go:29-37` define the full parsed/frontmatter shape. -- `internal/config/agent.go:210-218` copies only existing fields into `AgentDef`. -- Current fields cover provider/model/tools/permissions/mcp_servers/hooks/prompt only. - -**Impact** - -- There is no extension slot for network-facing capability metadata. -- Any capability proposal affects parsing, validation, cloning, resource codecs, and API payloads. - -**Severity**: HIGH - -### 4. Capability semantics are advisory in the protocol, but runtime capabilities elsewhere are often enforcement-oriented - -**Evidence** - -- `docs/rfcs/003_agh-network-v0.md:301-309` says capability strings are opaque and implementation-defined. -- `docs/rfcs/003_agh-network-v0.md:840` says advertised capabilities are advisory until behavior is verified. 
-- Extension/runtime capability systems in `internal/extension/capability.go` are enforcement-oriented and security-sensitive. - -**Impact** - -- Reusing one undifferentiated `capabilities` field across all AGH subsystems would mix two different semantics: - - advisory discovery claims for the network - - enforced authorization grants for runtime/host APIs -- That would be a modeling mistake and likely create future confusion. - -**Severity**: HIGH - -### 5. Existing cloned/serialized agent surfaces would drift if the field is added informally - -**Evidence** - -- `internal/workspace/clone.go:145-163` deep-copies `AgentDef` fields explicitly. -- `internal/config/agent_resource.go:25-44` normalizes `AgentDef` for resource sync. -- `internal/api/core/conversions.go:118-145` serializes `AgentDef` into API payloads. -- `internal/api/contract/contract.go:88-96` exposes an API `AgentPayload` that currently omits capability metadata. - -**Impact** - -- Any new field must be propagated deliberately across config, clone, resources, and API. -- Otherwise different runtime surfaces will disagree on the same agent. - -**Severity**: MEDIUM - ---- - -## Architectural Risks - -### Risk 1: Single flat `capabilities: []string` becomes overloaded - -If one flat list is used to mean: - -- network discovery claims -- tool/runtime authorization -- provider feature support -- hooks/resources availability - -then AGH will accumulate one ambiguous field with incompatible semantics. - -### Risk 2: Capabilities become config-only decoration - -If the field is added to `AGENT.md` without defining: - -- normalization rules -- validation rules -- projection into `PeerCard` -- projection or exclusion from API/resource surfaces - -then the system will still not solve the actual integration gap. 
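The normalization and projection rules that Risk 2 says are missing can be sketched as follows. This is a minimal illustration, not the AGH implementation: `PeerCard` here is a hypothetical stand-in that keeps only the `Capabilities` field named in the report, and `normalizeCapabilities` and `projectPeerCard` are assumed names. Because RFC 003 treats capability strings as opaque, the sketch only trims, de-duplicates, and sorts them; it does not change case or interpret them.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// PeerCard is a hypothetical stand-in for the network peer card.
// Only the Capabilities field is taken from the report; the rest is assumed.
type PeerCard struct {
	PeerID       string
	Capabilities []string
}

// normalizeCapabilities trims whitespace, drops empty entries, removes
// duplicates, and sorts the result so every surface sees the same list.
// The strings themselves are treated as opaque and are not rewritten.
func normalizeCapabilities(declared []string) []string {
	seen := make(map[string]struct{}, len(declared))
	out := make([]string, 0, len(declared))
	for _, raw := range declared {
		name := strings.TrimSpace(raw)
		if name == "" {
			continue
		}
		if _, dup := seen[name]; dup {
			continue
		}
		seen[name] = struct{}{}
		out = append(out, name)
	}
	sort.Strings(out)
	return out
}

// projectPeerCard builds a peer card from agent-declared capabilities.
// An agent that declares nothing still gets an empty, non-nil slice.
func projectPeerCard(peerID string, declared []string) PeerCard {
	return PeerCard{PeerID: peerID, Capabilities: normalizeCapabilities(declared)}
}

func main() {
	// Example capability strings; the peer ID is illustrative.
	card := projectPeerCard("peer-1", []string{
		" workspace.patch.apply ",
		"",
		"artifact.recipe.consume",
		"workspace.patch.apply",
	})
	fmt.Println(card.Capabilities)
}
```

Keeping normalization in one function means the config, clone, resource, and API surfaces could all call the same code path instead of each re-deriving the list, which is the drift the report warns about.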
- -### Risk 3: Provider-specific behavior gets hidden inside opaque capability strings - -If provider/runtime launch features like `permissions`, ACP modes, or model support are moved wholesale into generic capability strings, the spec may lose clarity rather than gain portability. - ---- - -## Recommended Direction - -### Recommendation 1: Separate declared agent capabilities from enforced runtime grants - -Use agent-declared capabilities as **authoring/discovery metadata**, not as the security boundary. - -That keeps alignment with RFC 003, which treats peer capabilities as advisory. - -### Recommendation 2: Define a projection contract, not just a frontmatter field - -The spec should answer: - -1. Where capabilities are declared in `AGENT.md` -2. How they are normalized and validated -3. Which subset is projected into `network.PeerCard` -4. Which parts, if any, appear in API/resource payloads -5. Which runtime layer owns the derivation - -### Recommendation 3: Prefer a structured capability block over a single overloaded list - -The analysis suggests a structure closer to: - -```yaml -capabilities: - declare: - - workspace.patch.apply - - artifact.recipe.consume - network: - advertise: - - workspace.patch.apply - - artifact.recipe.consume -``` - -or an equivalent minimal variant with an explicit projection rule. - -This keeps room for future non-network capability metadata without collapsing everything into one ambiguous flat list. - ---- - -## Conclusion - -The gap is architectural and spans three layers: - -- **RFC 003** already requires capability signaling. -- **RFC 001** already hints that AGENT.md should generate discovery metadata. -- **Current code** has no field and no projection path, so peer cards default to empty capabilities. - -The correct fix is to define a capability declaration model in the agent spec together with a daemon-owned derivation path into `PeerCard`. 
The wrong fix would be to add an unscoped `[]string` field with no semantics beyond "maybe used later". diff --git a/docs/plans/2026-04-06-workspace-entity-design.md b/.codex/plans/2026-04-06-workspace-entity-design.md similarity index 100% rename from docs/plans/2026-04-06-workspace-entity-design.md rename to .codex/plans/2026-04-06-workspace-entity-design.md diff --git a/docs/plans/2026-04-08-agh-network-design.md b/.codex/plans/2026-04-08-agh-network-design.md similarity index 100% rename from docs/plans/2026-04-08-agh-network-design.md rename to .codex/plans/2026-04-08-agh-network-design.md diff --git a/docs/plans/2026-04-08-rfc-examples-design.md b/.codex/plans/2026-04-08-rfc-examples-design.md similarity index 100% rename from docs/plans/2026-04-08-rfc-examples-design.md rename to .codex/plans/2026-04-08-rfc-examples-design.md diff --git a/docs/plans/2026-04-10-automation-techspec-design.md b/.codex/plans/2026-04-10-automation-techspec-design.md similarity index 100% rename from docs/plans/2026-04-10-automation-techspec-design.md rename to .codex/plans/2026-04-10-automation-techspec-design.md diff --git a/docs/plans/2026-04-15-bridge-adapters-design.md b/.codex/plans/2026-04-15-bridge-adapters-design.md similarity index 100% rename from docs/plans/2026-04-15-bridge-adapters-design.md rename to .codex/plans/2026-04-15-bridge-adapters-design.md diff --git a/.codex/qa/release-hermes/qa/evidence/commands.md b/.codex/qa/release-hermes/qa/evidence/commands.md deleted file mode 100644 index af45fb8c4..000000000 --- a/.codex/qa/release-hermes/qa/evidence/commands.md +++ /dev/null @@ -1,26 +0,0 @@ -# Release QA Command Evidence - -**Date:** 2026-04-24 - -| Command | Status | Notes | -| ---------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------- | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------- | ---------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------- | ---- | ----------------------------------------------------------------------------------------------------------------------------------------------------- | -| `go test ./internal/network` | Pass | Validated network unit suite after retry-backoff fix | -| `make deps` | Pass | `go mod tidy` normalized existing direct/indirect dependency classification and removed stale sums | -| Initial `make verify` | Pass | Web format/lint/typecheck/test/build passed; Go lint reported 0 issues; Go race suite completed 5704 tests; boundary check passed | -| Initial `make test-integration` | Fail | Exposed stale integration contracts and runtime failures; retained here as evidence that QA found real stale contracts before remediation | -| `go test -race -tags integration ./internal/api/httpapi -run 'TestHTTPPromptPersistsTerminalEventsAfterClientDisconnect | TestHTTPTransportSessionProviderLifecycle | TestHTTPFullRoundTripWithRealSessionManager | TestHTTPSessionStreamReconnectsWithLastEventID | TestHTTPSessionStopReasonPropagatesToGlobalDBAndAPI | TestHTTPSessionChannelRoundTrip' -count=1 -v` | Pass | Validated HTTP prompt disconnect persistence, stop route parity, session streaming, stop reason persistence, and channel round trip after remediation | -| `go test -race -tags integration ./internal/api/udsapi -run 'TestUDSTransportResumeMissingProviderReturnsExplicitBadRequest | 
TestUDSFullRoundTripWithRealSessionManager | TestUDSSessionStreamReconnectsWithLastEventID | TestUDSSessionChannelRoundTrip' -count=1 -v` | Pass | Validated UDS stop route parity, provider resume error, session streaming, and channel round trip after remediation | -| `go test ./internal/config -run TestResolveSessionAgent -count=1 -v` | Pass | Validated provider resolution preserves custom agent command/model when persisted provider matches effective provider and still switches runtime fields for a real override | -| `go test ./internal/session -run 'TestCreateWithProviderOverridePropagatesToSessionRuntime | TestResumeMissingACPStateFallbackPreservesRecoveredCrashClassification | TestResumeFallsBackToFreshStartWhenStoredACPSessionIsMissing' -count=1 -v` | Pass | Validated provider override behavior and ACP missing-state fallback after crash recovery | -| `go test -race -tags integration ./internal/session -run TestManagerIntegrationResumeClassifiesCrashAndActivates -count=1 -v` | Pass | Validated real ACP resume preserves recovered `agent_crashed` classification | -| `go test -race -tags integration ./internal/daemon -run TestDaemonNightlyE2EAutomationTaskResumesIntoNetworkChannel -count=1 -v` | Pass | Validated automation task resume into network channel, helper command preservation on resume, audited send, transcript visibility, and toolhost side effect | -| `go test -race -tags integration ./internal/testutil/e2e -run TestStartRuntimeHarnessRetriesHTTPPortConflicts -count=1 -v` | Pass | Validated runtime harness reports process-exit bind failures accurately for retry/reseed | -| `make test-integration` | Pass | Full integration suite passed: 6186 tests, 3 skipped in 68.903s; skips require `DAYTONA_API_KEY` | -| `make test-e2e-runtime` | Pass | Runtime E2E passed across daemon, HTTP API, UDS API, and shared e2e harness packages | -| `bun run test:e2e:daemon-served:raw e2e/session-onboarding.spec.ts` | Pass | Targeted browser session onboarding, prompt, approval 
API, reload continuity, and stop/resume controls | -| `bun run test:e2e:daemon-served:raw e2e/tasks.spec.ts` | Pass | Targeted browser Tasks flow after stabilizing active-run route navigation | -| `make test-e2e-web` | Pass | Full daemon-served Playwright suite passed: 15/15 specs in 46.6s, including Network browser route | -| Final `make verify` | Pass | Web format/lint/typecheck/unit/build passed; web unit suite 189 files / 1401 tests; Go lint 0 issues; Go race suite DONE 5707 tests in 33.904s; boundary check passed | -| LLM capability check | Pass | `OPENAI_API_KEY=present`; `codex` CLI 0.124.0 available; `claude` CLI 2.1.111 available; Gemini unavailable | -| `codex exec --ephemeral --skip-git-repo-check --sandbox read-only -C /tmp --json 'Do not run shell commands. Reply exactly: AGH-LLM-SMOKE-OK'` | Pass | Real OpenAI-backed Codex smoke returned `AGH-LLM-SMOKE-OK` | -| Real AGH + Codex ACP + Network smoke | Pass with caveat | Isolated `AGH_HOME` daemon started with network running; Codex ACP session created; normal prompt returned `AGH_REAL_NETWORK_OK`; network `direct` to isolated Codex session reached `messages_delivered=1`. Caveat: real agent followed network safety guidance/agentic behavior instead of returning only the token over network; daemon stopped cleanly. | diff --git a/.codex/qa/release-hermes/qa/evidence/hermes-comparison.md b/.codex/qa/release-hermes/qa/evidence/hermes-comparison.md deleted file mode 100644 index c8744e864..000000000 --- a/.codex/qa/release-hermes/qa/evidence/hermes-comparison.md +++ /dev/null @@ -1,58 +0,0 @@ -# Hermes vs AGH Production-Grade Comparison - -**Date:** 2026-04-24 -**Purpose:** Identify release-critical AGH gaps by comparing with `.resources/hermes`. - -## Production-grade traits found in Hermes - -1. **Persistent background work registry** - - Hermes tracks long-running/background processes with checkpoint recovery, output buffers, completion queues, watcher metadata, and crash recovery. 
- - Relevant source: `.resources/hermes/tools/process_registry.py`, `.resources/hermes/gateway/run.py`. - -2. **Inactivity-aware runtime supervision** - - Hermes distinguishes active work from idle agents, drains running agents during shutdown/restart, and queues follow-up input instead of interrupting active work. - - Relevant source: `.resources/hermes/gateway/run.py`, release notes for inactivity timeouts and shutdown drain. - -3. **Retry/backoff hardening** - - Hermes avoids tight retry loops in gateway reconnects/API operations and uses retry windows/backoff for transient failures. - - Relevant source: `.resources/hermes/gateway/run.py`, `.resources/hermes/hermes_state.py`, release notes. - -4. **Durable state and observability** - - Hermes uses SQLite WAL, write retry, checkpoints, message/session persistence, safe query fallback, and operator-visible status. - - Relevant source: `.resources/hermes/hermes_state.py`, `.resources/hermes/gateway/status.py`. - -5. **Message/session safety** - - Hermes has deterministic session keys, reset safeguards, platform-specific redaction, duplicate-delivery prevention and bounded caches. - - Relevant source: `.resources/hermes/gateway/session.py`, `.resources/hermes/gateway/run.py`. - -## AGH coverage and actions - -| Production need | AGH coverage | Release disposition | -| --------------------------------------- | ------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------- | -| Persistent async work and owner reentry | Detached harness metadata, task runtime state, reentry bridge, synthetic prompts. | Covered by existing runtime design; validate through integration/e2e. | -| Network routing and delivery | `internal/network` router, transport, peer registry, delivery coordinator, audit writer. | Covered; expanded delivery retry hardening. 
| -| Retry/backoff for transient failures | NATS reconnects, bridge retry policies, extension restart backoff existed; inbound network delivery was immediate retry. | Fixed in this session: scheduled exponential capped retry after failed `PromptNetwork`. | -| Durable audit/timeline | Global DB network audit and timeline stores, JSONL audit sink, duplicate message-id ignore. | Covered; validate through store/API/e2e. | -| Web/operator visibility | Network status/channel/peer/timeline UI and e2e selectors/artifacts exist. | Covered; validate through web e2e/browser. | -| Real provider confidence | ACP-compatible provider execution depends on local binaries/credentials. | Validate if available; otherwise record blocker and rely on runtime/e2e harness. | - -## Code change made - -- `internal/network/delivery.go` - - Added retry attempt tracking per queued envelope. - - Added exponential retry delay capped at 5 seconds. - - Added scheduled retry after worker exit, instead of immediate worker restart. - - Added retry attempt logging. - -- `internal/network/delivery_test.go` - - Changed prompt-failure regression to prove retry is scheduled and not executed until the scheduler fires. - - Added retry delay cap coverage. - -## Current result - -- `go test ./internal/network` passes. -- Full `make test-integration` passes. -- Full runtime E2E passes. -- Full daemon-served web/browser E2E passes, including Network route. -- Final `make verify` passes. -- Real Codex/OpenAI smoke passes, including AGH daemon + Codex ACP + network direct delivery evidence. 
diff --git a/.codex/qa/release-hermes/qa/reports/release-qa-report.md b/.codex/qa/release-hermes/qa/reports/release-qa-report.md deleted file mode 100644 index ee328cff5..000000000 --- a/.codex/qa/release-hermes/qa/reports/release-qa-report.md +++ /dev/null @@ -1,56 +0,0 @@ -# AGH Release QA Report - -**Date:** 2026-04-24 -**Scope:** Hermes production-grade comparison, AGH release hardening, full verification, network-focused QA, browser E2E, and real LLM smoke. -**Result:** Pass with one documented LLM-behavior caveat. - -## Executive Summary - -AGH is release-ready from the tested surfaces. The Hermes comparison identified one release-critical reliability gap in AGH: inbound network delivery retried immediately after `PromptNetwork` failure, which could create a tight loop and hide transient delivery failures. The implementation now schedules exponential capped retry after worker exit, preserves queued delivery state, and has deterministic regression tests. - -QA also found and fixed stale integration contracts and runtime harness weaknesses that would have reduced release confidence: prompt disconnect draining, stop route parity, provider override resume semantics, recovered crash classification, runtime harness readiness, UDS socket collisions, and stale Playwright selectors/routes. - -## Production Hardening Applied - -- Added scheduled exponential retry/backoff for failed inbound network delivery. -- Preserved network queued-envelope attempt state and retry logging. -- Fixed HTTP/UDS prompt disconnect handling so client disconnects return immediately while agent turns drain to terminal persistence. -- Fixed HTTP/UDS transport parity tests to use the current stop route. -- Preserved recovered `agent_crashed` classification through missing ACP-state fallback. -- Fixed provider resolution so persisted provider values matching the effective provider preserve custom agent command/model on resume. 
-- Hardened runtime harness process-exit readiness and UDS socket collision recovery. -- Updated web E2E contracts for current automation, bridge, task, settings, Storybook, session, and network UI surfaces. - -## Verification Evidence - -| Area | Result | Evidence | -| ------------------------------ | ---------------: | --------------------------------------------------------------------------------------------------------------------------------------- | -| Network unit regressions | Pass | `go test ./internal/network` | -| Full integration | Pass | `make test-integration`: 6186 tests, 3 skipped, 68.903s | -| Runtime E2E | Pass | `make test-e2e-runtime` | -| Browser/Web E2E | Pass | `make test-e2e-web`: 15/15 specs, including Network route | -| Final release gate | Pass | `make verify`: web 189 files / 1401 tests; Go race DONE 5707 tests in 33.904s; lint 0; boundary check OK | -| Real LLM smoke | Pass | `codex exec` returned `AGH-LLM-SMOKE-OK` | -| Real AGH + Codex ACP + Network | Pass with caveat | Isolated daemon created Codex ACP sessions, normal prompt returned `AGH_REAL_NETWORK_OK`, network direct reached `messages_delivered=1` | - -## Network-Specific Result - -The network stack was validated at four levels: - -1. Unit/regression: delivery queue, prompt rendering, retry scheduling, and retry cap. -2. Integration: router, lifecycle, audit/timeline, manager, daemon network collaboration, and network-origin task reentry. -3. Browser E2E: operator creates/inspects network channel, peers, timeline state, and reload continuity. -4. Real provider smoke: isolated AGH daemon with Codex ACP joined peers to `release-smoke`, sent `direct` envelopes, and recorded delivery. - -## LLM Caveat - -The real AGH network LLM smoke proved transport and delivery to a live Codex ACP agent. 
The exact network token-response assertion is intentionally marked caveated because the real Codex agent treated network content as untrusted and followed AGH/network safety guidance plus agentic behavior, rather than simply echoing the token over the network message. This is acceptable for release confidence on AGH transport/delivery; deterministic token behavior remains covered by normal AGH prompt smoke and mock ACP E2E. - -## Residual Risk - -- Daytona integration skips remain environmental: 3 skips require `DAYTONA_API_KEY`. -- Real provider behavior is non-deterministic by design. The release suite now separates deterministic AGH correctness from live LLM/provider smoke. - -## Release Recommendation - -Proceed with release candidate. All blocking local verification gates pass, network-specific coverage is green, and the only caveat is provider behavior outside AGH's deterministic control. diff --git a/.codex/qa/release-hermes/qa/test-cases/TC-LLM-001-real-agent-network-smoke.md b/.codex/qa/release-hermes/qa/test-cases/TC-LLM-001-real-agent-network-smoke.md deleted file mode 100644 index 6fe4738c5..000000000 --- a/.codex/qa/release-hermes/qa/test-cases/TC-LLM-001-real-agent-network-smoke.md +++ /dev/null @@ -1,36 +0,0 @@ -# TC-LLM-001: Real LLM Network Smoke - -**Priority:** P1 -**Type:** E2E / Production-like -**Status:** Pass with Caveat -**Created:** 2026-04-24 - -## Objective - -Run one real provider-backed AGH agent turn, when local credentials and provider binaries are available, and verify it can receive a network-formatted prompt without crashing or losing audit records. - -## Preconditions - -- A supported ACP-compatible agent binary is installed. -- Required provider credentials exist in the local environment or provider config. -- Credentials are detected by name only; values are never printed. - -## Test Steps - -1. Detect installed provider binaries and credential variable names. 
- **Expected:** at least one supported real provider is available, or the test is marked blocked with exact missing prerequisite. - -2. Start AGH with an isolated temp home/workspace and real provider config. - **Expected:** daemon starts, session can be created, network status is enabled. - -3. Send a small direct network message to the real agent session. - **Expected:** the agent receives the `` wrapper and returns a normal turn completion. - -4. Inspect audit/timeline. - **Expected:** message has received/delivered records and no retry/backpressure errors. - -## Execution History - -| Date | Tester | Build | Result | Notes | -| ---------- | ------ | ----- | ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| 2026-04-24 | Codex | local | Pass | `OPENAI_API_KEY` and `codex` CLI were available. `codex exec` returned `AGH-LLM-SMOKE-OK`. An isolated AGH daemon using Codex ACP created real sessions, returned `AGH_REAL_NETWORK_OK` on a normal prompt, accepted a `direct` network envelope, and reached `messages_delivered=1`. Caveat: the real Codex agent followed AGH/network safety guidance and agentic behavior rather than returning only the token over the network message. 
| diff --git a/.codex/qa/release-hermes/qa/test-cases/TC-NET-001-network-routing-audit.md b/.codex/qa/release-hermes/qa/test-cases/TC-NET-001-network-routing-audit.md deleted file mode 100644 index d2a99cd4a..000000000 --- a/.codex/qa/release-hermes/qa/test-cases/TC-NET-001-network-routing-audit.md +++ /dev/null @@ -1,39 +0,0 @@ -# TC-NET-001: Network Routing and Audit - -**Priority:** P0 -**Type:** Integration / Regression -**Status:** Pass -**Created:** 2026-04-24 - -## Objective - -Verify that local and remote network envelopes are validated, routed to the correct AGH sessions, and recorded in both audit and timeline persistence. - -## Preconditions - -- AGH network is enabled in the test configuration. -- At least two sessions have joined the same channel. -- Global DB and audit sink are available. - -## Test Steps - -1. Send a valid directed `direct` envelope to a local peer. - **Expected:** exactly one delivery is produced for the target session. - -2. Send a valid broadcast `say` envelope on the channel. - **Expected:** all eligible local peers receive it; sender metadata is preserved. - -3. Send `whois` and capability messages. - **Expected:** peer registry and capability catalog are updated/responded to according to protocol. - -4. Query audit and timeline records. - **Expected:** accepted messages have `received`, completed prompts have `delivered`, outbound generated messages have `sent`. - -5. Send duplicate, expired, unsupported, and invalid target variants. - **Expected:** no local prompt delivery; rejection reason is auditable. 
- -## Execution History - -| Date | Tester | Build | Result | Notes | -| ---------- | ------ | ----- | ------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| 2026-04-24 | Codex | local | Pass | Covered by full `make test-integration`, daemon network collaboration integration, network manager/router/audit tests, full runtime E2E, full web E2E Network route, and real AGH Codex ACP network smoke with `messages_delivered=1`. | diff --git a/.codex/qa/release-hermes/qa/test-cases/TC-NET-002-delivery-retry-backoff.md b/.codex/qa/release-hermes/qa/test-cases/TC-NET-002-delivery-retry-backoff.md deleted file mode 100644 index 49b3aecaa..000000000 --- a/.codex/qa/release-hermes/qa/test-cases/TC-NET-002-delivery-retry-backoff.md +++ /dev/null @@ -1,39 +0,0 @@ -# TC-NET-002: Network Delivery Retry Backoff - -**Priority:** P0 -**Type:** Regression / Reliability -**Status:** Pass -**Created:** 2026-04-24 - -## Objective - -Verify that a temporary `PromptNetwork` failure does not drop the inbound message and does not immediately spin a new worker in a retry loop. - -## Preconditions - -- `internal/network` delivery coordinator constructed with a fake prompter. -- First prompt attempt returns an error. -- Retry scheduler is instrumented by the test. - -## Test Steps - -1. Accept a directed delivery for an idle session. - **Expected:** one prompt attempt is made. - -2. Make the first prompt attempt fail. - **Expected:** message is requeued at the front, retry attempt increments, and a retry is scheduled with the base backoff delay. - -3. Before running the scheduled retry callback, inspect call count and queue depth. - **Expected:** call count is still one and queue depth is one. - -4. Run the scheduled retry callback. 
- **Expected:** second prompt attempt receives the same network message and can complete normally. - -5. Validate retry delay function. - **Expected:** retry delays grow exponentially and cap at the configured maximum. - -## Current Evidence - -- `go test ./internal/network` passed on 2026-04-24. -- Covered by `TestDeliveryCoordinatorRetriesPromptFailuresAfterWorkerExit`. -- Covered by `TestDeliveryCoordinatorRetryDelayUsesExponentialCap`. diff --git a/.codex/qa/release-hermes/qa/test-cases/TC-NET-003-network-task-reentry.md b/.codex/qa/release-hermes/qa/test-cases/TC-NET-003-network-task-reentry.md deleted file mode 100644 index ba862b77c..000000000 --- a/.codex/qa/release-hermes/qa/test-cases/TC-NET-003-network-task-reentry.md +++ /dev/null @@ -1,36 +0,0 @@ -# TC-NET-003: Network-Origin Task Reentry - -**Priority:** P0 -**Type:** Integration / E2E -**Status:** Pass -**Created:** 2026-04-24 - -## Objective - -Verify that network-originated detached work preserves ownership, channel metadata, task/run state, and re-enters the owning session after completion. - -## Preconditions - -- Daemon booted with task runtime and network runtime. -- A session is joined to a network channel. -- Detached task run uses network turn source metadata. - -## Test Steps - -1. Enqueue or trigger a network-owned task run. - **Expected:** task origin identifies a network peer/channel and run is queued. - -2. Claim and complete the detached run through the harness/runtime path. - **Expected:** status transitions are persisted without duplicate claims. - -3. Restart or recover the runtime where supported by existing e2e lane. - **Expected:** unbound/running work is requeued or recovered according to task runtime rules. - -4. Observe the owning session. - **Expected:** synthetic network prompt is injected once with correct run/task/channel metadata. 
- -## Execution History - -| Date | Tester | Build | Result | Notes | -| ---------- | ------ | ----- | ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -| 2026-04-24 | Codex | local | Pass | `go test -race -tags integration ./internal/daemon -run TestDaemonNightlyE2EAutomationTaskResumesIntoNetworkChannel -count=1 -v` passed, and full `make test-integration` passed after provider-resolution/resume fixes. | diff --git a/.codex/qa/release-hermes/qa/test-cases/TC-WEB-001-network-operator-ui.md b/.codex/qa/release-hermes/qa/test-cases/TC-WEB-001-network-operator-ui.md deleted file mode 100644 index 966cd5cc4..000000000 --- a/.codex/qa/release-hermes/qa/test-cases/TC-WEB-001-network-operator-ui.md +++ /dev/null @@ -1,39 +0,0 @@ -# TC-WEB-001: Network Operator UI - -**Priority:** P0 -**Type:** UI / E2E -**Status:** Pass -**Created:** 2026-04-24 - -## Objective - -Verify that the web UI exposes the operator-critical network state: channels, local peers, remote peers, timeline events, reload continuity, and status/error states. - -## Preconditions - -- Web app and daemon test server can run locally. -- Browser automation is available. -- Network route test data is seeded by the e2e harness. - -## Test Steps - -1. Open the web application network route. - **Expected:** route loads with no console/runtime failures. - -2. Create or inspect a channel and peers. - **Expected:** channel name, peer identity and capability information are visible. - -3. Send or replay network timeline events. - **Expected:** timeline shows sent/received/delivered/rejected rows with stable ordering. - -4. Reload the page. - **Expected:** channel and timeline state remain visible from persisted API state. - -5. Capture desktop and mobile screenshots if the route is manually exercised. 
- **Expected:** no overlapping text, broken controls, or inaccessible critical actions. - -## Execution History - -| Date | Tester | Build | Result | Notes | -| ---------- | ------ | ----- | ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| 2026-04-24 | Codex | local | Pass | Full daemon-served Playwright suite passed 15/15 specs. Network spec passed: create channel, inspect peers, observe timeline state, and reload without losing visibility. | diff --git a/.codex/qa/release-hermes/qa/test-plans/release-network-regression.md b/.codex/qa/release-hermes/qa/test-plans/release-network-regression.md deleted file mode 100644 index aee16dc18..000000000 --- a/.codex/qa/release-hermes/qa/test-plans/release-network-regression.md +++ /dev/null @@ -1,40 +0,0 @@ -# AGH Release Network Regression Suite - -**Date:** 2026-04-24 -**Suite type:** Smoke + targeted + full release regression. -**Execution status:** Not fully run yet. 
- -## Smoke Suite - -| ID | Priority | Scenario | Command / method | Expected | -| ------------- | -------: | -------------------------- | ---------------------------- | ----------------------------------------------------- | -| SMOKE-NET-001 | P0 | Network package unit suite | `go test ./internal/network` | Passes without race or goroutine retry churn symptoms | -| SMOKE-REL-001 | P0 | Full repository gate | `make verify` | fmt, lint, tests, builds all pass | -| SMOKE-WEB-001 | P0 | Web network surface builds | included in `make verify` | lint/typecheck/test/build pass | - -## Targeted Network Regression - -| ID | Priority | Scenario | Evidence | -| ---------- | -------: | ------------------------------------------------------------------------------------------ | ------------------------------------------------------------- | -| TC-NET-001 | P0 | Direct and broadcast network messages route only to valid peers and preserve metadata | router/manager integration tests; CLI/API e2e | -| TC-NET-002 | P0 | PromptNetwork failure requeues message and retries with capped backoff, not immediate loop | `TestDeliveryCoordinatorRetriesPromptFailuresAfterWorkerExit` | -| TC-NET-003 | P0 | Busy sessions receive one queued network message per turn end | delivery integration test | -| TC-NET-004 | P0 | Audit records accepted, rejected, sent and delivered network messages | globaldb audit/timeline tests; API timeline | -| TC-NET-005 | P0 | Network-origin task ingress preserves origin/channel and resumes owner | daemon task runtime/integration tests | -| TC-NET-006 | P1 | Web network channels/peers/timeline survive reload | Playwright/browser evidence | -| TC-NET-007 | P1 | Invalid payloads/expired/duplicate messages are rejected without local delivery | router tests and audit entries | - -## Full Regression Commands - -1. `make deps` -2. `make verify` -3. `make test-integration` -4. `make test-e2e-runtime` -5. `make test-e2e-web` -6. Real LLM smoke if environment supports it. 
- -## Known Execution Notes - -- `make verify` is the required release gate. -- Integration/e2e lanes may take longer and may require local binaries/browsers. Any blocked lane must include exact blocker output. -- Real LLM tests must not print API keys or credential values. diff --git a/.codex/qa/release-hermes/qa/test-plans/release-network-test-plan.md b/.codex/qa/release-hermes/qa/test-plans/release-network-test-plan.md deleted file mode 100644 index a477221d8..000000000 --- a/.codex/qa/release-hermes/qa/test-plans/release-network-test-plan.md +++ /dev/null @@ -1,45 +0,0 @@ -# AGH First Release Network QA Plan - -**Date:** 2026-04-24 -**Scope:** AGH daemon, CLI/API network workflows, web network surface, runtime/harness reentry, release verification. -**Status:** Drafted before execution. - -## Objective - -Validate that AGH is operationally release-ready as a local-first Agent OS, with special emphasis on the network feature. The plan compares production-grade behavior observed in `.resources/hermes` against current AGH and turns release risks into executable checks. - -## Release-Critical Areas - -| Area | Why it matters | Primary evidence | -| ------------------------------ | --------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------ | -| Network message routing | Agents must exchange directed/broadcast/control messages safely and deterministically. | `internal/network` unit/integration tests, daemon e2e lanes, live CLI/API flow | -| Network delivery resilience | Failed agent prompts must not drop messages or spin retry loops. | `internal/network/delivery_test.go`, runtime stats/audit | -| Audit and timeline persistence | Operators need durable visibility into sent/received/rejected/delivered network events. 
| global DB tests, API timeline checks, UI reload checks | -| Task ingress and reentry | Network-originated detached work must reconnect to the owning session after async completion. | daemon task runtime tests and e2e harness | -| Web network operations | The operator UI must reflect channels, peers, timeline and reload continuity. | Playwright/browser test with screenshots | -| Full release gate | Formatting, lint, race tests, web tests/build and Go build must all pass. | `make verify` | - -## Hermes Comparison Summary - -| Hermes production-grade behavior | AGH equivalent | Release action | -| ------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------- | -| Background work/process registry with persisted recovery checkpoint and completion notifications. | Detached harness run metadata plus reentry bridge and synthetic prompts. | Validate with daemon task/runtime e2e and restart/reentry scenarios. | -| Inactivity-aware runtime timeouts and graceful drain instead of wall-clock cancellation. | Session/daemon shutdown and tracked agent processes; network delivery drains through turn-end notifier. | Validate no network prompt interrupts active turn; inspect shutdown logs for pending messages. | -| Jitter/backoff for retries and reconnects to avoid hot loops. | NATS reconnect handlers exist; inbound delivery retry previously restarted immediately. | Fixed: delivery failures now schedule exponential capped retry; unit regression added. | -| Durable platform message logs and dedupe safeguards. | Network audit log and timeline DB with duplicate message-id ignore and router replay checks. | Validate audit/timeline after direct, whois, rejected and delivered flows. | -| Operator diagnostics for background status and platform health. 
| `NetworkStatus`, queued/inflight/worker metrics, API/CLI/web surfaces. | Validate status includes queue/worker counters and channel details. | - -## Execution Order - -1. Smoke gate: `go test ./internal/network`, then `make verify`. -2. Integration gate: `make test-integration` for tagged daemon/network/store scenarios. -3. E2E runtime gate: `make test-e2e-runtime` for real daemon/harness flows. -4. E2E web gate: `make test-e2e-web` plus browser inspection of network pages where available. -5. Real LLM gate: detect configured providers without printing secrets; run a small network-capable real-agent flow if credentials and agent binaries are available. -6. Evidence report: record commands, results, issues, screenshots/log paths, and remaining risk. - -## Pass/Fail Criteria - -- P0 tests must pass: network routing/delivery, audit/timeline, task reentry, release verify. -- Any data loss, unbounded retry, message misdelivery, unaudited accepted message, or UI inability to operate network channels blocks release. -- Real LLM validation may be marked blocked only if local credentials or provider binaries are unavailable; the mocked/runtime/e2e gates still must pass. diff --git a/.codex/release-qa/qa/test-cases/SMOKE-001.md b/.codex/release-qa/qa/test-cases/SMOKE-001.md deleted file mode 100644 index bad01226e..000000000 --- a/.codex/release-qa/qa/test-cases/SMOKE-001.md +++ /dev/null @@ -1,32 +0,0 @@ -## SMOKE-001: Repository Verification Gate - -**Priority:** P0 -**Type:** Smoke -**Status:** Not Run -**Estimated Time:** 20 minutes -**Created:** 2026-04-24 -**Last Updated:** 2026-04-24 - -### Objective - -Verify the canonical AGH release gate passes from the current workspace. - -### Preconditions - -- `go`, `bun`, and repository dependencies are available. -- Worktree state has been reviewed. - -### Test Steps - -1. Run `make verify`. 
- **Expected:** Formatting, codegen check, web lint/typecheck/test/build, Go lint, race unit tests, build, and boundary checks complete with exit code 0. - -2. Inspect output for warnings and failures. - **Expected:** No warnings, errors, or failed tests remain. - -### Edge Cases & Variations - -| Variation | Input | Expected Result | -| ------------------- | ---------------------------- | -------------------------------------------- | -| Missing web bundle | `web/dist/index.html` absent | Gate rebuilds or reports actionable failure. | -| Generated API drift | Stale OpenAPI artifacts | `CodegenCheck` fails before build claim. | diff --git a/.codex/release-qa/qa/test-cases/TC-INT-001.md b/.codex/release-qa/qa/test-cases/TC-INT-001.md deleted file mode 100644 index a5d566a22..000000000 --- a/.codex/release-qa/qa/test-cases/TC-INT-001.md +++ /dev/null @@ -1,38 +0,0 @@ -## TC-INT-001: Network Backpressure Is Audited - -**Priority:** P0 -**Type:** Integration -**Status:** Not Run -**Estimated Time:** 10 minutes -**Created:** 2026-04-24 -**Last Updated:** 2026-04-24 - -### Objective - -Verify that inbound network queue overflow is operationally visible instead of silently dropping messages. - -### Preconditions - -- Network manager can run with an audit writer. -- A test session can join a channel and be held busy to force queueing. - -### Test Steps - -1. Start a network manager with `max_queue_depth=1` and an audit sink. - **Expected:** Manager starts and reports network status. - -2. Mark the target session as prompting/busy and accept two inbound messages. - **Expected:** The first queued message is evicted when the second arrives. - -3. Query status and audit output. - **Expected:** `messages_rejected` increments and audit contains a rejected entry for the evicted message with reason `queue_overflow`. - -4. Release the prompt and drain the remaining message. - **Expected:** The remaining message is delivered and the delivered counter increments. 
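
The overflow behavior exercised by the steps above can be sketched as a small bounded queue. This is an illustration under stated assumptions, not the AGH network manager: `boundedQueue`, `auditEntry`, and the `audit` callback are hypothetical names, and only the `queue_overflow` reason, the rejected counter, and the "log and continue" handling of audit failures come from the test case.

```go
package main

import (
	"fmt"
	"log/slog"
)

// auditEntry is a hypothetical record shape for a rejected message.
type auditEntry struct {
	MessageID string
	Outcome   string
	Reason    string
}

// boundedQueue is a hypothetical stand-in for a per-session inbound queue.
type boundedQueue struct {
	maxDepth int                    // values <= 0 are treated as unbounded here
	items    []string               // queued message IDs, oldest first
	rejected int                    // mirrors a messages_rejected style counter
	audit    func(auditEntry) error // assumed audit sink
}

// push enqueues id. When the queue is full, the oldest message is evicted,
// counted as rejected, and audited with reason "queue_overflow".
func (q *boundedQueue) push(id string) {
	for q.maxDepth > 0 && len(q.items) >= q.maxDepth {
		evicted := q.items[0]
		q.items = q.items[1:]
		q.rejected++
		entry := auditEntry{MessageID: evicted, Outcome: "rejected", Reason: "queue_overflow"}
		if err := q.audit(entry); err != nil {
			// An audit failure is logged; delivery continues without panicking.
			slog.Warn("audit write failed", "message_id", evicted, "error", err)
		}
	}
	q.items = append(q.items, id)
}

func main() {
	var entries []auditEntry
	q := &boundedQueue{
		maxDepth: 1,
		audit: func(e auditEntry) error {
			entries = append(entries, e)
			return nil
		},
	}
	q.push("m1")
	q.push("m2") // evicts m1
	fmt.Println("queued:", q.items, "rejected:", q.rejected, "audited:", len(entries))
}
```

Evicting the oldest entry matches step 2 of the test case; a drop-newest policy would need a different audit target, so the choice is worth pinning down in the real test.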
- -### Edge Cases & Variations - -| Variation | Input | Expected Result | -| ------------------------ | --------------------------- | ---------------------------------------------------- | -| Multiple overflow events | Three messages with depth 1 | Each evicted message is audited once. | -| Audit sink failure | Store write fails | Drop is logged and delivery continues without panic. | diff --git a/.codex/release-qa/qa/test-cases/TC-INT-002.md b/.codex/release-qa/qa/test-cases/TC-INT-002.md deleted file mode 100644 index 8171e1b1e..000000000 --- a/.codex/release-qa/qa/test-cases/TC-INT-002.md +++ /dev/null @@ -1,38 +0,0 @@ -## TC-INT-002: Network Direct Reply Lifecycle - -**Priority:** P0 -**Type:** Integration -**Status:** Not Run -**Estimated Time:** 15 minutes -**Created:** 2026-04-24 -**Last Updated:** 2026-04-24 - -### Objective - -Verify the core AGH Network direct-message lifecycle through public daemon/runtime surfaces. - -### Preconditions - -- Daemon/runtime e2e harness is available. -- Mock ACP agents or real ACP-compatible agents can join a network channel. - -### Test Steps - -1. Start an isolated AGH daemon with network enabled. - **Expected:** Daemon reports network status `running`. - -2. Create two agent sessions and join them to a shared channel. - **Expected:** Both peers appear in network peer listings. - -3. Send a `direct` message from one peer to the other. - **Expected:** Target receives a network prompt with message ID, channel, kind, sender, and reply guidance. - -4. Send `receipt` and `trace` lifecycle messages. - **Expected:** Lifecycle messages are accepted, correlated, audited, and visible in network timeline/status. - -### Edge Cases & Variations - -| Variation | Input | Expected Result | -| -------------------- | -------------------- | ------------------------------------------------------- | -| Missing target peer | Unknown `--to` | Send fails before publish with target-not-found error. 
| -| Duplicate message ID | Same envelope replay | Duplicate is rejected or ignored with audit visibility. | diff --git a/.codex/release-qa/qa/test-cases/TC-INT-003.md b/.codex/release-qa/qa/test-cases/TC-INT-003.md deleted file mode 100644 index 2f0b53c55..000000000 --- a/.codex/release-qa/qa/test-cases/TC-INT-003.md +++ /dev/null @@ -1,35 +0,0 @@ -## TC-INT-003: Whois And Capability Exchange - -**Priority:** P0 -**Type:** Integration -**Status:** Not Run -**Estimated Time:** 15 minutes -**Created:** 2026-04-24 -**Last Updated:** 2026-04-24 - -### Objective - -Verify AGH Network peer discovery and capability exchange, the primary coordination primitive for an Agent OS. - -### Preconditions - -- Network is enabled. -- At least two sessions expose network capability catalogs. - -### Test Steps - -1. Join sessions with distinct capability catalogs. - **Expected:** Peer cards expose capability briefs without mutating original catalog data. - -2. Send a `whois` request for a capability. - **Expected:** Matching peers respond with `whois` responses and capability catalog metadata. - -3. Verify persisted audit/timeline data. - **Expected:** Request and response messages are queryable through network APIs/UI surfaces. - -### Edge Cases & Variations - -| Variation | Input | Expected Result | -| ------------------- | --------------- | --------------------------------------- | -| No capability match | Unknown query | No false positive responder is emitted. | -| Directed whois | `--to` one peer | Only the target peer responds. 
| diff --git a/.codex/release-qa/qa/test-cases/TC-LIVE-001.md b/.codex/release-qa/qa/test-cases/TC-LIVE-001.md deleted file mode 100644 index d3dc574f4..000000000 --- a/.codex/release-qa/qa/test-cases/TC-LIVE-001.md +++ /dev/null @@ -1,39 +0,0 @@ -## TC-LIVE-001: Real LLM And AGH Network Smoke - -**Priority:** P0 -**Type:** Integration -**Status:** Not Run -**Estimated Time:** 20 minutes -**Created:** 2026-04-24 -**Last Updated:** 2026-04-24 - -### Objective - -Verify AGH can use an installed real LLM-capable agent and route a network message in an isolated local runtime. - -### Preconditions - -- At least one supported CLI agent is installed (`codex`, `claude`, or equivalent). -- Required provider credentials are available in the environment or already authenticated. -- Isolated `AGH_HOME` can be created under `/tmp`. - -### Test Steps - -1. Run a direct real LLM smoke command. - **Expected:** The agent returns the requested token or a deterministic success response. - -2. Start AGH daemon with isolated `AGH_HOME` and network enabled. - **Expected:** Daemon starts and `agh daemon status -o json` reports network `running`. - -3. Create an ACP session using the real agent and send a normal prompt. - **Expected:** Prompt completes and transcript/events are persisted. - -4. Send a network `direct` message to the session. - **Expected:** Network status/audit reports the direct message delivered, or the exact live-agent behavior is captured if the agent takes tool actions instead of returning a token. - -### Edge Cases & Variations - -| Variation | Input | Expected Result | -| ------------------------ | ------------------------ | ------------------------------------------------ | -| Missing credentials | No provider auth | Test is blocked with exact missing prerequisite. | -| Agent streams tool calls | Real agent invokes tools | AGH preserves events and does not crash. 
| diff --git a/.codex/release-qa/qa/test-cases/TC-UI-001.md b/.codex/release-qa/qa/test-cases/TC-UI-001.md deleted file mode 100644 index 309fd68da..000000000 --- a/.codex/release-qa/qa/test-cases/TC-UI-001.md +++ /dev/null @@ -1,35 +0,0 @@ -## TC-UI-001: Network Web UI Smoke - -**Priority:** P1 -**Type:** UI/Visual -**Status:** Not Run -**Estimated Time:** 10 minutes -**Created:** 2026-04-24 -**Last Updated:** 2026-04-24 - -### Objective - -Verify the browser-visible network surface loads and reflects live daemon data. - -### Preconditions - -- AGH web dev server or e2e web lane can start. -- Network fixture data is available through the daemon. - -### Test Steps - -1. Open the AGH web app network route. - **Expected:** Network workspace is visible without console errors. - -2. Select a channel and a peer. - **Expected:** Header, peer count, message count, and timeline update for the selected room. - -3. Trigger or inspect a send/composer flow if available. - **Expected:** UI uses typed protocol kinds and does not obscure errors. - -### Edge Cases & Variations - -| Variation | Input | Expected Result | -| ------------- | ------------------------------ | ---------------------------------------- | -| Empty network | No channels | Empty state is visible and non-crashing. | -| API failure | Network endpoint returns error | Error state is visible and actionable. | diff --git a/.codex/release-qa/qa/test-plans/agh-release-openclaw-regression.md b/.codex/release-qa/qa/test-plans/agh-release-openclaw-regression.md deleted file mode 100644 index e17f87318..000000000 --- a/.codex/release-qa/qa/test-plans/agh-release-openclaw-regression.md +++ /dev/null @@ -1,29 +0,0 @@ -# AGH Release Regression Suite - -## Execution Order - -1. Smoke suite: build contract, CLI status, daemon start/stop, network status. -2. P0 network suite: join channel, direct delivery, whois/capability exchange, backpressure audit, persisted timeline. -3. 
P1 release suite: integration lane, runtime e2e, web e2e, browser network view. -4. Live smoke: real LLM prompt and real AGH network direct delivery when local credentials/tools exist. -5. Exploratory comparison: inspect OpenClaw-derived production patterns for uncovered gaps. - -## Pass/Fail Criteria - -PASS: - -- All P0 cases pass. -- No critical bugs remain open. -- `make verify` passes after final code changes. -- Any blocked live scenarios list exact missing credentials or runtime prerequisites. - -FAIL: - -- Any P0 network delivery or audit case fails. -- `make verify` fails after final code changes. -- Network messages can be dropped without audit/status visibility. -- Daemon cannot start, stop, or report status in an isolated home. - -CONDITIONAL: - -- Credentialed live third-party channel flows are blocked, but all local network, e2e, and LLM smoke boundaries pass. diff --git a/.codex/release-qa/qa/test-plans/agh-release-openclaw-test-plan.md b/.codex/release-qa/qa/test-plans/agh-release-openclaw-test-plan.md deleted file mode 100644 index a49478dc5..000000000 --- a/.codex/release-qa/qa/test-plans/agh-release-openclaw-test-plan.md +++ /dev/null @@ -1,85 +0,0 @@ -# AGH First Release QA Plan - OpenClaw Comparison - -## Executive Summary - -This plan validates AGH release readiness after comparing the local AGH implementation against `.resources/openclaw` production-grade operational patterns. The highest-risk surface is AGH Network because it coordinates live agent sessions, persisted audit/timeline state, NATS transport, and CLI/API/Web user workflows. - -Objectives: - -- Verify the repository contract and release gates from the current workspace. -- Compare AGH Network and runtime behavior against OpenClaw patterns for delivery, recovery, live testing, boundary checks, and operational diagnostics. -- Add or run automated coverage for any critical production-readiness gaps discovered. 
-- Exercise the network feature through public interfaces, including real LLM smoke where credentials and local tools allow. - -Key risks: - -- Network messages can be accepted but not delivered, then become invisible to operators. -- Live LLM flows can differ from mock ACP flows because real agents may call tools, stream unexpectedly, or obey safety guidance. -- Browser UI and API surfaces can drift from backend contracts. -- Release gates can pass while credentialed or e2e lanes fail. - -## Scope - -In scope: - -- Go backend verification through `make verify`, `make test-integration`, and `make test-e2e`. -- Network message delivery, backpressure, audit, status counters, and public CLI/API flows. -- Web UI smoke flows for the network surface if the dev/e2e server can be started. -- Real local LLM smoke using installed ACP-compatible tools when credentials exist. -- QA artifacts under `.codex/release-qa/qa/`. - -Out of scope: - -- Publishing a release artifact. -- Modifying `.resources/openclaw`. -- Credentialed third-party channels without local secrets. -- Legacy compatibility with old AGH state. - -## Test Strategy - -1. Discovery: read Makefile, CI, release workflow, docs, and network implementation. -2. Baseline: run the canonical verification gate and focused network tests before final claims. -3. Comparison-driven hardening: use OpenClaw evidence to find production-readiness gaps; add targeted regression tests before production fixes. -4. Public-surface validation: prefer CLI, HTTP/UDS, e2e harness, and browser flows over internal helpers. -5. Live integration: run real LLM smoke if local credentials/tools are available, and document blockers exactly. -6. Final gate: rerun full verification after the last code change. - -## Environment Requirements - -- macOS local workspace at `/Users/pedronauck/Dev/compozy/agh`. -- Go and Bun versions compatible with CI (`GO_VERSION=1.25.4`, `BUN_VERSION=1.3.4`). 
-- Local `codex`, `claude`, or another ACP-compatible agent for live LLM smoke. -- Optional browser validation through the Codex in-app browser or repo Playwright lane. -- Optional live provider credentials such as `OPENAI_API_KEY`. - -## Entry Criteria - -- Worktree state reviewed with `git status --short`. -- Root instructions, Makefile, CI, release workflow, and relevant network docs read. -- QA artifact directory exists. -- No destructive git commands are used. - -## Exit Criteria - -- All P0 test cases pass or have a documented blocking prerequisite. -- `make verify` passes after the final code change. -- Network-focused tests pass after the fix. -- Integration/e2e/live validations are run where locally possible and documented. -- Verification report exists at `.codex/release-qa/qa/verification-report.md`. - -## Risk Assessment - -| Risk | Probability | Impact | Mitigation | -| ---------------------------------------------- | ----------: | -------: | ------------------------------------------------------------------------------------------- | -| Silent network message loss under backpressure | Medium | Critical | Add audit/status coverage and production hook for queue drops. | -| Real LLM behavior diverges from mocks | Medium | High | Run a real LLM smoke and capture exact command/output summary. | -| E2E lane flakes due to browser/runtime timing | Medium | High | Use existing harness lanes and retry only after root-cause inspection. | -| Credentialed live scenarios unavailable | High | Medium | Validate local boundaries and report blocked credentialed cases explicitly. | -| Release workflow misses heavy lanes | Low | High | Run `make test-integration` and `make test-e2e` in addition to `make verify` when feasible. | - -## Timeline and Deliverables - -- Test plan and cases: `.codex/release-qa/qa/test-plans/`, `.codex/release-qa/qa/test-cases/`. -- Bug reports if found: `.codex/release-qa/qa/issues/`. 
-- Screenshots/browser evidence: `.codex/release-qa/qa/screenshots/`. -- Final verification report: `.codex/release-qa/qa/verification-report.md`. diff --git a/.codex/release-qa/qa/verification-report.md b/.codex/release-qa/qa/verification-report.md deleted file mode 100644 index 1a66e0fd9..000000000 --- a/.codex/release-qa/qa/verification-report.md +++ /dev/null @@ -1,58 +0,0 @@ -# AGH Release Verification Report - OpenClaw Comparison - -Date: 2026-04-24 - -## Scope - -Validate AGH for first-release readiness by comparing `.resources/openclaw` production-grade behavior against AGH, applying critical fixes, and running release-grade QA with special focus on the agent network. - -## OpenClaw comparison result - -OpenClaw treats delivery/backpressure as an auditable production concern: outbound delivery state is persisted, failures are classified, and recovery is observable. The critical AGH gap found during comparison was that inbound network queue overflow dropped the oldest envelope with log-only visibility. For an Agent OS, invisible message loss is a P0 operational risk because operators cannot distinguish "no work happened" from "work was dropped under load". - -## Fixes applied - -- Network delivery overflow is now surfaced through the manager audit path as a rejected delivery with reason `queue_overflow`. -- A regression test now proves overflowed inbound network envelopes create rejected audit records instead of disappearing silently. -- Daemon restart readiness handling now preserves replacement-process exit evidence when readiness timeout and process exit race at the boundary. -- Bridge route details now expose route session IDs in the UI so operators can trace bridge-created routes into sessions. -- Teams provider integration now waits for the specific managed instances to report `ready`, eliminating the repeated false failure caused by counting any two state records. 
- -## Verification matrix - -| Area | Command / evidence | Result | -| ----------------------------- | ----------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------- | -| Network overflow regression | `go test ./internal/network -run TestManagerAuditsBusyQueueOverflowAsRejected -count=1` | Pass | -| Network package | `go test ./internal/network -count=1` | Pass | -| Restart race regression | `go test -race ./internal/daemon -run TestRunRelaunchHelperWrapperUsesDefaultLauncherAndPersistsFailure -count=10` | Pass | -| Daemon package | `go test -race ./internal/daemon -count=1` | Pass | -| Teams integration flake | `go test -race -tags integration ./internal/extension -run TestTeamsProviderLaunchNegotiatesBridgeRuntime -count=10 -v` | Pass | -| Extension integration package | `go test -race -tags integration ./internal/extension -count=1` | Pass | -| Web bridge route regression | `bun run --cwd web test:raw src/systems/bridges/components/bridge-detail-panel.test.tsx` | Pass, 7 tests | -| Web nightly route flow | `bun run --cwd web test:e2e:nightly` | Pass, 1 spec | -| Full integration | `make test-integration` | Pass, 6187 tests, 3 skipped | -| Blocking repo gate | `make verify` | Pass: web format/lint/typecheck/unit/build, Go lint/race tests/build, package boundaries | -| Full e2e | `make test-e2e` | Pass: daemon/API/testutil lanes and 15 daemon-served Playwright specs | -| Nightly e2e | `make test-e2e-nightly` | Pass: runtime/nightly lanes, daemon-served Playwright 15 specs, nightly Playwright 1 spec | -| Patch hygiene | `git diff --check` | Pass | - -## Live LLM and network validation - -- Direct Codex CLI LLM smoke passed: `codex exec --ephemeral --skip-git-repo-check --sandbox read-only -C /tmp --json "Reply with exactly AGH-OPENCLAW-LLM-SMOKE-OK and nothing else."` returned exactly `AGH-OPENCLAW-LLM-SMOKE-OK`. 
-- Live AGH/Codex ACP prompt smoke passed with a short deterministic token: `prompt_text=OK`. -- Live AGH network smoke passed with two Codex ACP sessions joined to the `release` channel: - - Direct message from sender to receiver was audited as `sent` and `received`. - - Receiver replied directly to sender. - - Reply was audited as `sent` and `received`. - - Final status had `messages_rejected=0` and direct-kind sent/received metrics for both directions. - -## Skips and caveats - -- Daytona credentialed integration/nightly tests were skipped by their own guard because `DAYTONA_API_KEY` is not present in the environment. -- This report therefore validates all available local, integration, e2e, browser, and live Codex/LLM lanes. Credentialed Daytona validation still requires providing `DAYTONA_API_KEY` and rerunning the Daytona lane. - -## Release assessment - -Release QA status: PASS for all available gates. - -The OpenClaw comparison produced one critical network production-readiness fix and the verification cycle exposed and fixed two additional release blockers: a daemon restart race and a nondeterministic Teams integration wait. The agent network has unit, integration, e2e browser, and live LLM-backed validation evidence after the fixes. diff --git a/.codex/tmp/harness-spec-review-prompt.md b/.codex/tmp/harness-spec-review-prompt.md deleted file mode 100644 index c93145f9f..000000000 --- a/.codex/tmp/harness-spec-review-prompt.md +++ /dev/null @@ -1,306 +0,0 @@ -# Review Request: AGH Harness TechSpec Draft - -You are critically reviewing a TechSpec draft for AGH. - -Use `claude`/`opus` with high reasoning and, importantly, **use specialized subagents** to explore and review the spec before concluding. 
I want at least these angles: - -- architecture/runtime -- prompt/harness policy -- storage/observability -- test strategy / rollout risk - -## Review objective - -Assess whether the TechSpec below is solid enough for approval, considering: - -- adherence to the current AGH code -- architectural risks or significant gaps -- inconsistencies between the ADRs and the spec -- realistic v1 scope -- design of internal APIs/interfaces -- risks in background reentry, augmenters, and profile resolution -- points that are too vague and need to become more concrete before approval - -## Output instructions - -Respond in Brazilian Portuguese. - -Structure the response like this: - -1. `Veredito` (verdict) - - `Aprovável como está` (approvable as is) - - `Aprovável com ajustes` (approvable with adjustments) - - `Precisa de revisão antes de aprovar` (needs revision before approval) -2. `Achados` (findings) - - list findings by severity, from most to least important - - cite spec sections and code paths where relevant - - be specific and adversarial -3. `Mudanças recomendadas na spec` (recommended spec changes) - - state exactly what to change in the text/structure of the TechSpec -4. `Riscos residuais` (residual risks) - - what would still remain for follow-up even after the adjustments - -If you find no relevant problems, state explicitly why the spec is coherent. 
- -## Context and artifacts to inspect - -Read and compare with the draft: - -- `.compozy/tasks/harness/adrs/adr-001.md` -- `.compozy/tasks/harness/adrs/adr-002.md` -- `.compozy/tasks/harness/adrs/adr-003.md` -- `.compozy/tasks/harness/adrs/adr-004.md` -- `internal/session/manager_prompt.go` -- `internal/session/interfaces.go` -- `internal/session/manager_network_skill.go` -- `internal/daemon/composed_assembler.go` -- `internal/daemon/boot.go` -- `internal/daemon/daemon.go` -- `internal/memory/recall.go` -- `docs/ideas/orchestration/multi-agent-patterns-analysis.md` -- `docs/ideas/from-claude-code/filtered_recommendations.md` -- `docs/ideas/market-pair/gap-analysis.md` - -## TechSpec Draft - -# TechSpec: Harness Runtime v1 - -## Executive Summary - -This initiative defines the first internal-only harness foundation for AGH. There is no `_prd.md` for `harness`; this document uses the current runtime architecture, recent competitor analysis, and local orchestration research as the authoritative input. The implementation focuses on four runtime capabilities: explicit internal `HarnessProfile` selection, structured startup prompt layering, ordered turn-time prompt augmentation, and first-class background completion reentry. - -The implementation strategy is to extend existing seams rather than introduce a parallel architecture. Startup prompt behavior remains on the existing assembler/provider chain in `internal/daemon` and `internal/session`. Turn-time context remains on the existing prompt augmentation seam in `session.Manager`. Background completion becomes a first-class daemon runtime concept backed by global storage and policy-based synthetic reentry. The primary trade-off is deliberate scope restraint: v1 creates a strong harness foundation but explicitly defers richer coordinator/planner/reviewer orchestration to follow-up work. 
- -## System Architecture - -### Component Overview - -The implementation consists of these main components: - -- `internal/session`: owns session-level harness state, prompt dispatch, turn-source handling, and ordered turn augmentation before `driver.Prompt`. -- `internal/daemon`: owns startup prompt assembly, provider registration, profile-aware section selection, and background runtime wiring during boot. -- `internal/store/globaldb`: persists `BackgroundRun` records and related lifecycle metadata in daemon-global storage. -- `internal/observe` plus existing event summary surfaces: records harness lifecycle signals for operator visibility. -- `internal/memory`: remains one prompt augmenter implementation, but no longer defines the harness pattern by itself; it becomes one participant in an ordered augmenter pipeline. - -Data flow is intentionally split: - -- Session startup resolves a base `HarnessProfile`, stores it in session metadata, and assembles the startup prompt through the existing assembler chain. -- Prompt dispatch records the original user input, then applies ordered augmenters based on the session profile and turn-level signals. -- Background work records a `BackgroundRun` in global storage, updates lifecycle state as it progresses, and emits completion observability. -- If the run policy requires reentry, the daemon synthesizes an internal prompt/event back to the owning session instead of requiring explicit polling. 
- -## Implementation Design - -### Core Interfaces - -The harness foundation needs a narrow internal persistence surface for background work: - -```go -type BackgroundRunStore interface { - Create(ctx context.Context, run BackgroundRunRecord) error - Get(ctx context.Context, id string) (*BackgroundRunRecord, error) - ListByOwnerSession(ctx context.Context, sessionID string) ([]BackgroundRunRecord, error) - MarkCompleted(ctx context.Context, id string, result BackgroundRunResult) error -} -``` - -Turn augmentation should stay explicit and ordered rather than hidden behind one opaque callback: - -```go -type TurnAugmenter interface { - Name() string - Augment(ctx context.Context, session *Session, input PromptInput) (PromptInput, error) -} -``` - -### Data Models - -Core runtime additions: - -- `HarnessProfile`: internal enum with `interactive`, `network`, `background`, and `worker`. -- `BackgroundRunState`: internal enum such as `queued`, `running`, `completed`, `failed`, `canceled`. -- `BackgroundRunPolicy`: internal policy describing whether completion is silent or triggers session reentry. - -Core persistent models: - -- `BackgroundRunRecord` - - `id` - - `owner_session_id` - - `owner_workspace_id` - - `profile` - - `policy` - - `state` - - `source` - - `summary` - - `error` - - `created_at` - - `started_at` - - `completed_at` -- Session metadata extension - - add `harness_profile` as the persisted base profile for resume and observability - -Ephemeral prompt-dispatch models: - -- `PromptInput` - - original message text - - normalized turn source - - optional background reentry payload - - optional network metadata -- `PromptSectionDescriptor` - - section name - - section category - - order - - budget - - profile eligibility - -No new public OpenAPI or CLI contract is required in v1. The first slice is runtime-internal and should expose observability through existing event surfaces rather than a new public harness CRUD API. 
- -### API Endpoints - -No new public HTTP or UDS endpoints are required in v1. - -Internal daemon/runtime surfaces change as follows: - -- session startup uses the existing `PromptAssembler` path with profile-aware section selection -- prompt dispatch uses the existing `PromptInputAugmenter` seam, evolved into ordered augmenters -- daemon runtime uses a new internal `BackgroundRunStore` backed by `internal/store/globaldb` -- observability uses existing event summary and lifecycle emission paths - -If operator-facing read APIs become necessary later, they should be added as follow-up work after the runtime contract stabilizes. - -## Integration Points - -No external service integration is required in v1. The implementation stays within the existing AGH daemon, session manager, global store, and observer boundaries. - -## Impact Analysis - -| Component | Impact Type | Description and Risk | Required Action | -| ------------------------------------ | ---------------- | ----------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------- | -| `internal/session` | modified | Adds base profile handling, ordered turn augmentation, and synthetic background reentry hooks. Medium risk because prompt flow is load-bearing. | Extend prompt lifecycle types and tests carefully. | -| `internal/daemon` | modified | Extends startup assembler/provider chain with profile-aware section selection and background runtime wiring. Medium risk. | Keep boot ownership clear and avoid moving policy into session code. | -| `internal/store/globaldb` | modified | Adds `BackgroundRun` persistence and read/write helpers. Medium risk due to schema evolution. | Add tables/indexes and lifecycle queries with tests. | -| `internal/observe` / event summaries | modified | Adds harness lifecycle visibility for profile resolution and background run completion. 
Low to medium risk. | Emit structured events and keep ordering stable. | -| `internal/memory` | modified | Conforms to ordered augmenter pipeline as one augmenter implementation. Low risk. | Preserve existing recall behavior and persistence semantics. | -| `internal/api/*` | unchanged for v1 | No public API contract change is required in the first slice. | Defer public harness endpoints unless runtime evidence justifies them. | - -## Testing Approach - -### Unit Tests - -Required unit coverage: - -- profile resolution from session-level inputs -- turn-time signal projection for `TurnSource`, network context, and background reentry context -- startup section selection and section ordering by `HarnessProfile` -- section budget behavior and truncation or omission policy -- ordered augmenter execution and failure handling -- guarantee that stored user input remains the original message while the driver sees the augmented message -- `BackgroundRun` state transitions and policy evaluation - -### Integration Tests - -Required integration coverage: - -- session create and resume preserve base `harness_profile` -- startup prompt assembly changes correctly by profile -- network-originated turns activate the correct turn-time behavior without mutating stored input -- a completed `BackgroundRun` updates global storage and emits observability -- reentering background completion generates a synthetic internal prompt/event for the owning session -- silent background completion records lifecycle data without waking the session - -Required verification gates before completion: - -- `make verify` - -## Development Sequencing - -### Build Order - -1. Add internal harness types and session metadata support for `HarnessProfile`. No dependencies. -2. Extend the existing assembler/provider chain with profile-aware section descriptors and budgets. Depends on step 1. -3. 
Replace the single prompt augmenter callback with an ordered augmenter pipeline while preserving current memory recall behavior. Depends on step 1. -4. Add `BackgroundRun` persistence to global daemon storage and schema helpers. Depends on step 1. -5. Wire background completion lifecycle and policy-based synthetic reentry into daemon and session flow. Depends on steps 3 and 4. -6. Add harness observability events and integration coverage for profile, augmentation, and background completion. Depends on steps 2, 3, 4, and 5. -7. Run full verification and tighten failure-path behavior. Depends on step 6. - -### Technical Dependencies - -- Session metadata persistence must be stable before profile-aware resume can be tested. -- The assembler extension must preserve existing prompt provider ordering semantics. -- The augmenter pipeline must preserve the existing “store original input, dispatch augmented input” invariant. -- Global daemon storage schema changes for `BackgroundRun` must land before background reentry wiring is implemented. - -## Monitoring and Observability - -Operational visibility should include: - -- log events for `harness_profile_resolved` -- log events for `prompt_augmenter_applied` and `prompt_augmenter_failed` -- log events for `background_run_created`, `background_run_completed`, `background_run_failed`, and `background_run_reentered` -- structured log fields: - - `session_id` - - `workspace_id` - - `harness_profile` - - `turn_source` - - `background_run_id` - - `reentry_policy` - - `reentered` -- event summary visibility for background completion and harness lifecycle transitions -- metrics or counters for: - - background runs created - - background runs completed - - background runs reentered - - augmenter failures - - profile distribution by session - -## Technical Considerations - -### Key Decisions - -- Decision: keep the v1 harness internal-only. - Rationale: the runtime needs stable semantics before exposing configuration. 
- Trade-off: less operator control in the first slice. - Alternatives rejected: user-declared profiles from day one. - -- Decision: use a session-level base profile plus turn-level signals. - Rationale: startup prompt needs stability, but turns still need contextual behavior. - Trade-off: two layers of policy instead of one. - Alternatives rejected: fully dynamic per-turn resolution. - -- Decision: extend the current assembler/provider and augmenter seams. - Rationale: current seams already map well to stable startup context versus volatile turn-time context. - Trade-off: richer behavior inside existing abstractions rather than a clean-slate redesign. - Alternatives rejected: new top-level harness policy component. - -- Decision: model background completion as `BackgroundRun` in global storage with policy-based reentry. - Rationale: background work needs durable runtime identity and inspectable lifecycle. - Trade-off: extra runtime entity and storage surface. - Alternatives rejected: task runtime reuse and event-only modeling. - -- Decision: defer coordinator/planner/reviewer orchestration contracts from v1. - Rationale: that work belongs to a richer orchestration layer, not the harness foundation slice. - Trade-off: v1 stops at `worker`-grade behavior. - Alternatives rejected: pulling full coordinator-grade orchestration into the same first spec. - -### Known Risks - -- Prompt policy may spread across too many files if profile logic is not centralized. -- Background reentry may over-notify sessions if internal policy defaults are too permissive. -- Existing augmenters may start depending on ordering accidentally if contracts are not explicit. -- Future orchestration work may need additional metadata beyond v1 `BackgroundRun` and `worker` semantics. 
- -Mitigations: - -- centralize profile resolution in daemon-owned runtime policy -- keep reentry policy narrow and explicit in v1 -- name and order augmenters explicitly -- document follow-up orchestration work clearly instead of implying it is solved by this slice - -## Architecture Decision Records - -- [ADR-001: Use Internal Harness Profiles with Hybrid Resolution](adrs/adr-001.md) — Introduces internal `HarnessProfile` selection with a session base plus turn-level signals. -- [ADR-002: Extend Existing Prompt Assembly and Turn Augmentation Seams](adrs/adr-002.md) — Reuses the current assembler/provider and augmentation seams instead of creating a parallel prompt policy stack. -- [ADR-003: Model Background Completion as a Global BackgroundRun with Policy-Based Reentry](adrs/adr-003.md) — Adds a global daemon runtime entity for detached work and optional synthetic session reentry. -- [ADR-004: Defer Coordinator-Grade Orchestration Contracts from Harness v1](adrs/adr-004.md) — Keeps richer multi-agent orchestration out of the first harness slice and points to explicit follow-up work. diff --git a/.compozy/tasks/_archived/1776789714756-4ddeaeed-agent-capabilities/memory/MEMORY.md b/.compozy/tasks/_archived/1776789714756-4ddeaeed-agent-capabilities/memory/MEMORY.md index d2cf14a25..f215493f9 100644 --- a/.compozy/tasks/_archived/1776789714756-4ddeaeed-agent-capabilities/memory/MEMORY.md +++ b/.compozy/tasks/_archived/1776789714756-4ddeaeed-agent-capabilities/memory/MEMORY.md @@ -3,7 +3,7 @@ Keep only durable, cross-task context here. Do not duplicate facts that are obvious from the repository, PRD documents, or git history. ## Current State -- Tasks 01-05 are implemented and verified. 
Capability catalogs now load during agent-directory discovery in `internal/config`, flow through the session-owned network join payload, project brief discovery into local `PeerCard` state, power explicit rich `whois` discovery through envelope `ext`, and now have a runtime-facing authoring guide in `docs/agents/capabilities.md`. +- Tasks 01-05 are implemented and verified. Capability catalogs now load during agent-directory discovery in `internal/config`, flow through the session-owned network join payload, project brief discovery into local `PeerCard` state, power explicit rich `whois` discovery through envelope `ext`, and now have a runtime-facing authoring guide in `docs/rfcs/005_capability-catalogs-agent-directories.md`. ## Shared Decisions - Downstream runtime and network tasks should consume `AgentDef.Capabilities` rather than rereading capability files; task 01 also updated workspace/daemon/extension clone paths so the loaded catalog survives those hops. @@ -22,4 +22,4 @@ Keep only durable, cross-task context here. Do not duplicate facts that are obvi - None currently. ## Handoffs -- Task 06 should treat `docs/agents/capabilities.md` as the author-facing source for local layouts and validation rules, and `docs/rfcs/003_agh-network-v0.md` as the wire-facing source for brief and rich capability discovery keys. +- Task 06 should treat `docs/rfcs/005_capability-catalogs-agent-directories.md` as the author-facing source for local layouts and validation rules, and `docs/rfcs/003_agh-network-v0.md` as the wire-facing source for brief and rich capability discovery keys. 
diff --git a/.compozy/tasks/_archived/1776789714756-4ddeaeed-agent-capabilities/memory/task_05.md b/.compozy/tasks/_archived/1776789714756-4ddeaeed-agent-capabilities/memory/task_05.md index ef313b326..e0c3aaba9 100644 --- a/.compozy/tasks/_archived/1776789714756-4ddeaeed-agent-capabilities/memory/task_05.md +++ b/.compozy/tasks/_archived/1776789714756-4ddeaeed-agent-capabilities/memory/task_05.md @@ -4,7 +4,7 @@ Keep only task-local execution context here. Do not duplicate facts that are obv ## Objective Snapshot -- Add a runtime-facing capability authoring guide under `docs/agents/capabilities.md` that documents supported layouts, invalid layouts, required and optional fields, no-catalog behavior, and the projection split between brief and rich discovery. +- Add a runtime-facing capability authoring guide under `docs/rfcs/005_capability-catalogs-agent-directories.md` that documents supported layouts, invalid layouts, required and optional fields, no-catalog behavior, and the projection split between brief and rich discovery. ## Important Decisions @@ -21,7 +21,7 @@ Keep only task-local execution context here. 
Do not duplicate facts that are obv ## Files / Surfaces -- `docs/agents/capabilities.md` +- `docs/rfcs/005_capability-catalogs-agent-directories.md` - `docs/rfcs/001_agent-md-with-skills-memory.md` - `docs/rfcs/003_agh-network-v0.md` - `internal/config/capabilities.go` diff --git a/.compozy/tasks/_archived/1776789714756-4ddeaeed-agent-capabilities/qa/test-cases/TC-FUNC-014.md b/.compozy/tasks/_archived/1776789714756-4ddeaeed-agent-capabilities/qa/test-cases/TC-FUNC-014.md index e021aca7c..ddd8989bc 100644 --- a/.compozy/tasks/_archived/1776789714756-4ddeaeed-agent-capabilities/qa/test-cases/TC-FUNC-014.md +++ b/.compozy/tasks/_archived/1776789714756-4ddeaeed-agent-capabilities/qa/test-cases/TC-FUNC-014.md @@ -6,7 +6,7 @@ **Estimated Time:** 10 minutes **Created:** 2026-04-19 **Last Updated:** 2026-04-19 -**Module:** `docs/agents/capabilities.md`, `docs/rfcs/003_agh-network-v0.md` +**Module:** `docs/rfcs/005_capability-catalogs-agent-directories.md`, `docs/rfcs/003_agh-network-v0.md` **Traceability:** Task 05; RFC 003 capability discovery sections; TechSpec local layout and projection rules. **Execution Surfaces:** Documentation review, exact key-string comparison, example validation. **Durable Regression Anchors:** Runtime guide and RFC 003 text; package tests that assert `agh.capabilities_brief` survives payload conversion. @@ -22,7 +22,7 @@ Verify the user-visible runtime guide and RFC text still describe the shipped la ### Test Steps -1. Compare `docs/agents/capabilities.md` against the TechSpec and tasks 01-04. +1. Compare `docs/rfcs/005_capability-catalogs-agent-directories.md` against the TechSpec and tasks 01-04. - **Expected:** The guide lists all four supported local layouts, invalid mixed layouts, required fields, optional fields, basename rules, and no-catalog behavior. 2. Compare RFC 003 capability sections against the shipped wire behavior. 
- **Expected:** Keys and semantics exactly match `agh.capabilities_brief`, `agh.include`, `agh.capability_ids`, and `agh.capability_catalog`. @@ -33,7 +33,7 @@ Verify the user-visible runtime guide and RFC text still describe the shipped la | Field | Value | Notes | | --- | --- | --- | -| Runtime guide | `docs/agents/capabilities.md` | Local authoring source | +| Runtime guide | `docs/rfcs/005_capability-catalogs-agent-directories.md` | Local authoring source | | RFC | `docs/rfcs/003_agh-network-v0.md` | Wire contract source | | Wire keys | `agh.capabilities_brief`, `agh.include`, `agh.capability_ids`, `agh.capability_catalog` | Exact-string match required | diff --git a/.compozy/tasks/_archived/1776789714756-4ddeaeed-agent-capabilities/qa/test-plans/agent-capabilities-test-plan.md b/.compozy/tasks/_archived/1776789714756-4ddeaeed-agent-capabilities/qa/test-plans/agent-capabilities-test-plan.md index 0d543d1c6..621db7635 100644 --- a/.compozy/tasks/_archived/1776789714756-4ddeaeed-agent-capabilities/qa/test-plans/agent-capabilities-test-plan.md +++ b/.compozy/tasks/_archived/1776789714756-4ddeaeed-agent-capabilities/qa/test-plans/agent-capabilities-test-plan.md @@ -35,7 +35,7 @@ Key risks: - Session-to-network join plumbing for capability-aware local peer registration. - Brief discovery through `greet`, peer registry/listing, and API payload conversion. - Explicit rich `whois` discovery for full-catalog, filtered-catalog, no-catalog, unknown-ID, and oversized-response scenarios. -- Documentation consistency across `docs/agents/capabilities.md`, RFC 003, and the shipped runtime behavior from tasks 01-05. +- Documentation consistency across `docs/rfcs/005_capability-catalogs-agent-directories.md`, RFC 003, and the shipped runtime behavior from tasks 01-05. 
### Out of Scope @@ -65,7 +65,7 @@ Key risks: | Session/runtime join | Join payload or manager evidence proving capability-aware local peer registration on create/resume, including deterministic empty slices for no-catalog peers. | | Router-level envelopes | Captured `greet` and `whois` request/response behavior showing brief metadata, explicit rich discovery, filtering, no-catalog/unknown-ID empty catalogs, and oversized-response rejection. | | API payload visibility | `internal/api/core` payloads or equivalent handler evidence showing the same brief metadata visible after runtime/router flows. | -| Documentation consistency | Fresh comparison against `docs/agents/capabilities.md` and RFC 003 so user-visible layout rules and wire keys match implementation exactly. | +| Documentation consistency | Fresh comparison against `docs/rfcs/005_capability-catalogs-agent-directories.md` and RFC 003 so user-visible layout rules and wire keys match implementation exactly. | ## Environment Requirements diff --git a/.compozy/tasks/_archived/1776789714756-4ddeaeed-agent-capabilities/qa/verification-report.md b/.compozy/tasks/_archived/1776789714756-4ddeaeed-agent-capabilities/qa/verification-report.md index f9dbb698e..34a496e3c 100644 --- a/.compozy/tasks/_archived/1776789714756-4ddeaeed-agent-capabilities/qa/verification-report.md +++ b/.compozy/tasks/_archived/1776789714756-4ddeaeed-agent-capabilities/qa/verification-report.md @@ -25,7 +25,7 @@ ADDITIONAL EVIDENCE - API payload visibility evidence: - `go test ./internal/api/core -count=1 -v -run 'TestNetworkConversionHelpersPreserveMetadata'` - Documentation consistency evidence: - - `rg -n 'capabilities\\.toml|capabilities\\.json|capabilities/|agh\\.capabilities_brief|agh\\.include|agh\\.capability_ids|agh\\.capability_catalog|peer_card\\.ext' docs/agents/capabilities.md docs/rfcs/003_agh-network-v0.md internal/network internal/api/core internal/config` + - `rg -n 
'capabilities\\.toml|capabilities\\.json|capabilities/|agh\\.capabilities_brief|agh\\.include|agh\\.capability_ids|agh\\.capability_catalog|peer_card\\.ext' docs/rfcs/005_capability-catalogs-agent-directories.md docs/rfcs/003_agh-network-v0.md internal/network internal/api/core internal/config` - Post-gate rerun after final `make verify`: - `go test ./internal/config -count=1 -v -run 'TestLoadAgentDefFileLoadsCapabilityCatalogAndMCPSidecar|TestLoadWorkspaceAgentDefsLoadsAgentsWithoutCapabilityCatalog'` - `go test -tags integration ./internal/session ./internal/network -count=1 -v -run 'TestManagerIntegrationCapabilityAwareJoinCarriesCatalogAcrossCreateResumeAndStop|TestManagerIntegrationCapabilityAwareJoinKeepsMissingCatalogProjectionEmpty|TestManagerJoinPublishesProjectedCapabilityBriefInInitialAndReconnectGreets|TestDirectedWhoisRichDiscoveryDeliversPeerCardAndCapabilityCatalog|TestDirectedWhoisRichDiscoveryFilteringRefreshesRemotePresence'` diff --git a/.compozy/tasks/_archived/1776789714756-4ddeaeed-agent-capabilities/reviews-004/issue_002.md b/.compozy/tasks/_archived/1776789714756-4ddeaeed-agent-capabilities/reviews-004/issue_002.md index 32c3c6d72..43a0247aa 100644 --- a/.compozy/tasks/_archived/1776789714756-4ddeaeed-agent-capabilities/reviews-004/issue_002.md +++ b/.compozy/tasks/_archived/1776789714756-4ddeaeed-agent-capabilities/reviews-004/issue_002.md @@ -19,7 +19,7 @@ source_review_submitted_at: "2026-04-20T15:11:23Z" - Decision: `invalid` - Root cause analysis: directory-mode loading intentionally accepts only regular files. `loadCapabilityCatalogDirectory(...)` checks `info.Mode().IsRegular()` before selecting `*.toml` or `*.json` entries. 
-- Why this is invalid: that behavior is already documented in [docs/agents/capabilities.md](/Users/pedronauck/dev/compozy/agh2/docs/agents/capabilities.md:135) as "Only regular files with the selected extension are loaded," and the existing regression test `TestLoadAgentCapabilitiesDirectoryModeLoadsSelectedRegularFilesOnly` codifies the same contract. +- Why this is invalid: that behavior is already documented in [docs/rfcs/005_capability-catalogs-agent-directories.md](/Users/pedronauck/Dev/compozy/agh/docs/rfcs/005_capability-catalogs-agent-directories.md#L145) as "Only regular files with the selected extension are loaded," and the existing regression test `TestLoadAgentCapabilitiesDirectoryModeLoadsSelectedRegularFilesOnly` codifies the same contract. - Additional reasoning: following symlinks here would broaden the filesystem trust boundary for agent catalogs rather than fixing a documented bug. The current behavior is a deliberate fail-closed loader rule, not an accidental omission. 
## Resolution diff --git a/.compozy/tasks/_archived/1776789714756-4ddeaeed-agent-capabilities/task_05.md b/.compozy/tasks/_archived/1776789714756-4ddeaeed-agent-capabilities/task_05.md index 399fa98c2..470475762 100644 --- a/.compozy/tasks/_archived/1776789714756-4ddeaeed-agent-capabilities/task_05.md +++ b/.compozy/tasks/_archived/1776789714756-4ddeaeed-agent-capabilities/task_05.md @@ -47,7 +47,7 @@ See TechSpec "Data Models", "Projection rules", and "Technical Considerations", - `docs/rfcs/003_agh-network-v0.md` - wire-level brief and rich capability discovery contract ### Dependent Files -- `docs/agents/capabilities.md` - proposed runtime-facing authoring guide for local capability catalogs +- `docs/rfcs/005_capability-catalogs-agent-directories.md` - proposed runtime-facing authoring guide for local capability catalogs - `docs/rfcs/001_agent-md-with-skills-memory.md` - may need small clarifications or cross-links about capability sidecars as part of a self-contained agent directory - `.compozy/tasks/agent-capabilities/task_01.md` - loader and validation rules documented here must stay consistent with the implementation task - `.compozy/tasks/agent-capabilities/task_04.md` - rich discovery docs must match the explicit `whois` behavior defined there @@ -58,7 +58,7 @@ See TechSpec "Data Models", "Projection rules", and "Technical Considerations", - [ADR-003: Soft Outcome-Oriented Capability Model](adrs/adr-003.md) - document the required/optional capability fields and semantics ## Deliverables -- Runtime-facing capability authoring guide under `docs/agents/capabilities.md` +- Runtime-facing capability authoring guide under `docs/rfcs/005_capability-catalogs-agent-directories.md` - Valid and invalid local catalog examples aligned with the shipped loader behavior - Cross-links or clarifications in existing RFC docs where needed to keep runtime and network boundaries explicit - Documentation review checklist proving field names, examples, and wire keys match the 
implementation **(REQUIRED)** diff --git a/.compozy/tasks/_archived/20260417-021722-site/_techspec.md b/.compozy/tasks/_archived/20260417-021722-site/_techspec.md index 19be67fc3..875b7f5ac 100644 --- a/.compozy/tasks/_archived/20260417-021722-site/_techspec.md +++ b/.compozy/tasks/_archived/20260417-021722-site/_techspec.md @@ -2,7 +2,7 @@ ## Executive Summary -Build a documentation website for AGH at `agh.compozy.com` using Fumadocs (Next.js) as the framework. The site serves two linked audiences through two content collections: **AGH Runtime** for operators and **AGH Network Protocol** for implementers. The landing page is a custom Next.js page that presents a balanced runtime + network story, but keeps **Runtime / Get Started** as the primary conversion path for cold visitors. `AGH Network` is used as the public-facing marketing label; `AGH Network Protocol` is reserved for spec and reference contexts. The site lives in `packages/site/` alongside the existing `web/` (unchanged for alpha), sharing design tokens from `packages/ui`. CLI reference is auto-generated from Cobra; API reference is deferred to Wave 2. +Build a documentation website for AGH at `agh.network` using Fumadocs (Next.js) as the framework. The site serves two linked audiences through two content collections: **AGH Runtime** for operators and **AGH Network Protocol** for implementers. The landing page is a custom Next.js page that presents a balanced runtime + network story, but keeps **Runtime / Get Started** as the primary conversion path for cold visitors. `AGH Network` is used as the public-facing marketing label; `AGH Network Protocol` is reserved for spec and reference contexts. The site lives in `packages/site/` alongside the existing `web/` (unchanged for alpha), sharing design tokens from `packages/ui`. CLI reference is auto-generated from Cobra; API reference is deferred to Wave 2. 
**Primary trade-off**: Fumadocs is younger than Astro Starlight or Docusaurus, with a smaller community and fewer examples. We accept this because it runs on Next.js/React — enabling component sharing from `packages/ui` and consistent DESIGN.md theming. Note: Astro Starlight can also render React components via its React integration, but the styling/theming layer still requires Astro-specific adaptation rather than native Tailwind CSS v4 preset sharing. @@ -137,7 +137,7 @@ Not applicable — the site is a static/SSG site with no API endpoints. The site ### Vercel — Deployment -- **Purpose**: Host the site at `agh.compozy.com` +- **Purpose**: Host the site at `agh.network` - **Method**: Vercel Git integration, auto-deploy on push - **Preview**: PR preview deployments for doc changes @@ -182,7 +182,7 @@ Not applicable — the site is a static/SSG site with no API endpoints. The site 3. **Implement CLI codegen** — add `doc` subcommand to cmd/agh, post-processing script, `make cli-docs` target. Depends on step 2 (output goes to site content dir). 4. **Build landing page** — implement an outcome-led hero, balanced Runtime + AGH Network split, runtime proof, network proof, named comparison, architecture proof, and runtime-first final CTA. Depends on step 2. 5. **Write Wave 1 content** — ~20 MDX pages: Overview (3), Getting Started (3), Core Concepts (3), CLI Reference (4 groups, from step 3 output), Config Reference (2), Protocol Overview + Spec v0 (adapted from RFCs). Depends on steps 2 and 3. -6. **Deploy to Vercel** — configure agh.compozy.com, Vercel project, preview deployments, static export via `next build`. Depends on steps 4 and 5. +6. **Deploy to Vercel** — configure agh.network, Vercel project, preview deployments, static export via `next build`. Depends on steps 4 and 5. 7. **(Wave 2) API reference codegen** — add `api-spec` command or hand-maintain openapi.json, configure fumadocs-openapi. Deferred — not alpha-critical. Depends on step 2. 8. 
**(Future) Monorepo standardization** — rename web/ → packages/app/ when the whole repo benefits. Not gated by docs launch. diff --git a/.compozy/tasks/_archived/20260417-021722-site/architecture.html b/.compozy/tasks/_archived/20260417-021722-site/architecture.html index 31c7590cc..5f4596e8e 100644 --- a/.compozy/tasks/_archived/20260417-021722-site/architecture.html +++ b/.compozy/tasks/_archived/20260417-021722-site/architecture.html @@ -675,7 +675,7 @@

AGH Website & Documentation Architecture

packages/site - Fumadocs Documentation Site · agh.compozy.com + Fumadocs Documentation Site · agh.network @@ -990,7 +990,7 @@

AGH Website & Documentation Architecture

stroke-width="1.5" /> - Vercel · agh.compozy.com + Vercel · agh.network Static hosting from out/ · PR preview deployments · Web Analytics @@ -1063,7 +1063,7 @@

Deployment & Build

 • Static export (output: 'export') → out/
-• Vercel at agh.compozy.com + previews
+• Vercel at agh.network + previews
 • turbo caches out/** per workspace
 • Makefile: site-dev, site-build, cli-docs
 • Redirects array from day one
  • diff --git a/.compozy/tasks/autonomous/reviews-002/_meta.md b/.compozy/tasks/autonomous/reviews-002/_meta.md new file mode 100644 index 000000000..9ef1de4c7 --- /dev/null +++ b/.compozy/tasks/autonomous/reviews-002/_meta.md @@ -0,0 +1,11 @@ +--- +provider: coderabbit +pr: "75" +round: 2 +created_at: 2026-04-26T19:40:13.045587Z +--- + +## Summary +- Total: 12 +- Resolved: 0 +- Unresolved: 12 diff --git a/.compozy/tasks/autonomous/reviews-002/issue_001.md b/.compozy/tasks/autonomous/reviews-002/issue_001.md new file mode 100644 index 000000000..c225be23e --- /dev/null +++ b/.compozy/tasks/autonomous/reviews-002/issue_001.md @@ -0,0 +1,75 @@ +--- +status: resolved +file: internal/agentidentity/identity.go +line: 206 +author: coderabbitai[bot] +provider_ref: thread:PRRT_kwDOR5y4QM59q6tY,comment:PRRC_kwDOR5y4QM67Yhp8 +--- + +# Issue 001: _⚠️ Potential issue_ | _🟠 Major_ +## Review Comment + +_⚠️ Potential issue_ | _🟠 Major_ + +**Preserve “lookup unavailable” failures instead of always returning stale identity.** + +Line 199 currently rewrites every `lookup` error to `ErrIdentityStale`. That turns daemon/storage outages and canceled contexts into a 401/`ExitIdentityInvalid`, even though the downstream status/exit-code mapping already has a dedicated `ErrIdentityLookupUnavailable` path. Only true not-found/inactive cases should become stale; infrastructure failures should stay unavailable. + +
    +Suggested direction + +```diff + func lookupSessionSnapshot(ctx context.Context, lookup SessionLookup, creds Credentials) (SessionSnapshot, error) { + snapshot, err := lookup(ctx, creds.SessionID) + if err != nil { ++ if errors.Is(err, ErrIdentityLookupUnavailable) || ++ errors.Is(err, context.Canceled) || ++ errors.Is(err, context.DeadlineExceeded) { ++ return SessionSnapshot{}, identityError( ++ ErrIdentityLookupUnavailable, ++ "identity_lookup_unavailable", ++ "agent identity cannot be validated", ++ "retry after the daemon is reachable", ++ ) ++ } + return SessionSnapshot{}, identityError( + ErrIdentityStale, + "identity_stale", + "agent session identity is not known to the daemon", + "start or resume the AGH session, then retry", + ) + } +``` +
    + +
    +🤖 Prompt for AI Agents + +``` +Verify each finding against the current code and only fix it if needed. + +In `@internal/agentidentity/identity.go` around lines 197 - 206, When +lookupSessionSnapshot returns an error, don't always map it to ErrIdentityStale; +instead detect and preserve infrastructure/context failures by returning +ErrIdentityLookupUnavailable (or the original unavailable reason). In +lookupSessionSnapshot (which calls the SessionLookup function), check the +returned err with errors.Is for ErrIdentityLookupUnavailable and for +context.Canceled/context.DeadlineExceeded (or other transient errors) and return +identityError(ErrIdentityLookupUnavailable, ...) in those cases; only translate +genuine not-found/inactive responses to identityError(ErrIdentityStale, ...). +Use the existing symbols lookupSessionSnapshot, SessionLookup, Credentials, +ErrIdentityStale and ErrIdentityLookupUnavailable to locate and implement these +conditional error branches. +``` + +
    + + + + + +## Triage + +- Decision: `VALID` +- Notes: `lookupSessionSnapshot` currently maps every `SessionLookup` error to `ErrIdentityStale`, so daemon/storage lookup failures and canceled/deadline contexts are incorrectly reported as invalid identity (`401`/`ExitIdentityInvalid`). The fix is to preserve infrastructure/context lookup failures as `ErrIdentityLookupUnavailable` while keeping unknown/inactive sessions mapped to `ErrIdentityStale`. +- Resolution: Implemented unavailable-error preservation for `ErrIdentityLookupUnavailable`, `context.Canceled`, and `context.DeadlineExceeded`; verified by focused tests and full `make verify`. diff --git a/.compozy/tasks/autonomous/reviews-002/issue_002.md b/.compozy/tasks/autonomous/reviews-002/issue_002.md new file mode 100644 index 000000000..1766eab22 --- /dev/null +++ b/.compozy/tasks/autonomous/reviews-002/issue_002.md @@ -0,0 +1,61 @@ +--- +status: resolved +file: internal/agentidentity/identity.go +line: 285 +author: coderabbitai[bot] +provider_ref: thread:PRRT_kwDOR5y4QM59q6tZ,comment:PRRC_kwDOR5y4QM67Yhp9 +--- + +# Issue 002: _⚠️ Potential issue_ | _🟠 Major_ +## Review Comment + +_⚠️ Potential issue_ | _🟠 Major_ + +**Populate `Model` when converting `session.Info` to `SessionSnapshot`.** + +`SessionSnapshotFromInfo` never copies `info.Model`, but `internal/api/core/agent_identity.go` builds `/api/agent/me` from `caller.Session.Model`. Any caller resolved through this helper will therefore report an empty model even when the backing session has one. + +
    +Suggested fix + +```diff + return SessionSnapshot{ + ID: info.ID, + Name: info.Name, + AgentName: info.AgentName, + Provider: info.Provider, ++ Model: info.Model, + WorkspaceID: info.WorkspaceID, + WorkspacePath: info.Workspace, + Channel: info.Channel, + Type: info.Type, +``` +
    + +
    +🤖 Prompt for AI Agents + +``` +Verify each finding against the current code and only fix it if needed. + +In `@internal/agentidentity/identity.go` around lines 272 - 285, +SessionSnapshotFromInfo currently omits copying info.Model into the returned +SessionSnapshot, causing callers (e.g., internal/api/core/agent_identity.go that +reads caller.Session.Model) to see an empty model; update the constructor to set +the SessionSnapshot's Model field from info.Model (or clone it if a deep copy is +required) so the returned SessionSnapshot includes the session's model +information (reference: SessionSnapshot, session.Info, info.Model, +SessionSnapshotFromInfo). +``` + +
    + + + + + +## Triage + +- Decision: `VALID` +- Notes: `SessionSnapshotFromInfo` omits `info.Model` even though `/api/agent/me` surfaces `caller.Session.Model`. Sessions resolved through this helper can therefore return an empty model despite a populated daemon session record. Focused verification also showed the backing `session.Info`/store metadata path did not carry model data, so the root fix must propagate `Model` through the session/store read model as well as copy it into `SessionSnapshot.Model`. +- Resolution: Added model propagation through session/store metadata and identity snapshot conversion, then covered it through the validated identity response test and full `make verify`. diff --git a/.compozy/tasks/autonomous/reviews-002/issue_003.md b/.compozy/tasks/autonomous/reviews-002/issue_003.md new file mode 100644 index 000000000..3aa46da72 --- /dev/null +++ b/.compozy/tasks/autonomous/reviews-002/issue_003.md @@ -0,0 +1,45 @@ +--- +status: resolved +file: internal/api/core/agent_channels.go +line: 103 +author: coderabbitai[bot] +provider_ref: thread:PRRT_kwDOR5y4QM59q6ta,comment:PRRC_kwDOR5y4QM67Yhp- +--- + +# Issue 003: _⚠️ Potential issue_ | _🟠 Major_ +## Review Comment + +_⚠️ Potential issue_ | _🟠 Major_ + +**Validate the `channel` path before the non-wait inbox path.** + +`AgentChannelRecv` only rejects an empty string. With `wait=false`, `agentChannelInbox` reads the caller's whole inbox and filters locally, so an invalid channel identifier can come back as `200` with an empty message list instead of a validation error. That makes malformed channel IDs indistinguishable from real empty channels. + +
    +🤖 Prompt for AI Agents + +``` +Verify each finding against the current code and only fix it if needed. + +In `@internal/api/core/agent_channels.go` around lines 85 - 103, AgentChannelRecv +currently only checks for empty channel and calls agentChannelInbox even when +wait=false, allowing malformed channel IDs to return 200 with an empty list; +validate the channel path/format (not just non-empty) before calling +agentChannelInbox by adding a proper channel format validation on the trimmed +channel variable and returning a 400 (using h.respondError) for invalid +identifiers, so agentChannelInbox, agentChannelMessagesFromEnvelopes, +parseBoolQuery and parsePositiveIntQuery are only invoked when the channel has +passed validation. +``` + +
    + + + + + +## Triage + +- Decision: `VALID` +- Notes: `AgentChannelRecv` only rejects an empty channel path. In non-wait mode, `agentChannelInbox` fetches the whole caller inbox and filters by channel locally, so malformed channel identifiers such as `bad.channel` can return `200` with an empty message list. The fix is to validate the trimmed path with the shared network channel grammar before reading the inbox. +- Resolution: Added `network.ValidateChannel` validation before inbox reads and covered invalid channel rejection in UDS receive tests; verified by focused tests and full `make verify`. diff --git a/.compozy/tasks/autonomous/reviews-002/issue_004.md b/.compozy/tasks/autonomous/reviews-002/issue_004.md new file mode 100644 index 000000000..30e22e0e7 --- /dev/null +++ b/.compozy/tasks/autonomous/reviews-002/issue_004.md @@ -0,0 +1,51 @@ +--- +status: resolved +file: internal/api/core/agent_channels.go +line: 103 +author: coderabbitai[bot] +provider_ref: thread:PRRT_kwDOR5y4QM59q6tb,comment:PRRC_kwDOR5y4QM67Yhp_ +--- + +# Issue 004: _⚠️ Potential issue_ | _🟠 Major_ +## Review Comment + +_⚠️ Potential issue_ | _🟠 Major_ + +**Reject malformed `wait`/`limit` values instead of defaulting them.** + +`parseBoolQuery` and `parsePositiveIntQuery` silently turn bad input into `false` and `0`. For `/api/agent/channels/{channel}/recv`, that means `?wait=maybe` disables long-polling and `?limit=abc` removes the cap instead of returning the documented invalid-request response. + + + +Also applies to: 796-816 + +
    +🤖 Prompt for AI Agents + +``` +Verify each finding against the current code and only fix it if needed. + +In `@internal/api/core/agent_channels.go` around lines 91 - 103, The handler is +currently calling parseBoolQuery and parsePositiveIntQuery which silently coerce +invalid inputs to false/0; update the recv handler that calls agentChannelInbox +(and similarly the other handler range referenced) to validate query parsing and +reject malformed values: call parseBoolQuery and parsePositiveIntQuery (or their +underlying parsing logic) in a way that returns an error on invalid input, and +if parsing fails respond with h.respondError(c, http.StatusBadRequest, +ErrInvalidRequest) (or the project’s documented invalid-request response) +instead of proceeding to agentChannelInbox; reference the handler that invokes +agentChannelInbox and the functions parseBoolQuery/parsePositiveIntQuery to +locate and fix the logic. +``` + +
    + + + + + +## Triage + +- Decision: `VALID` +- Notes: `parseBoolQuery` and `parsePositiveIntQuery` silently coerce malformed values to default values. `AgentChannelRecv` therefore treats `?wait=maybe` as `wait=false` and `?limit=abc` as unlimited instead of rejecting the request. The fix is to make query parsing return validation errors and have the receive handler respond with the documented bad-request path. +- Resolution: Changed query parsing to return errors for malformed `wait`/`limit` values and respond with bad-request validation errors before service access; verified by receive validation tests and full `make verify`. diff --git a/.compozy/tasks/autonomous/reviews-002/issue_005.md b/.compozy/tasks/autonomous/reviews-002/issue_005.md new file mode 100644 index 000000000..2fcc55809 --- /dev/null +++ b/.compozy/tasks/autonomous/reviews-002/issue_005.md @@ -0,0 +1,88 @@ +--- +status: resolved +file: internal/api/core/conversions_parsers_test.go +line: 327 +author: coderabbitai[bot] +provider_ref: thread:PRRT_kwDOR5y4QM59q6tc,comment:PRRC_kwDOR5y4QM67YhqA +--- + +# Issue 005: _⚠️ Potential issue_ | _🟡 Minor_ +## Review Comment + +_⚠️ Potential issue_ | _🟡 Minor_ + +**Assert schedule deep-copy semantics explicitly.** + +Line 325 verifies task/owner pointers are not reused, but schedule pointer reuse is not asserted. A regression in `Schedule` copy behavior would currently pass. + + +
    +Suggested test hardening + +```diff + if payload.Schedule == nil || payload.Schedule.Interval != "10m" { + t.Fatalf("schedule payload = %#v", payload.Schedule) + } ++ if payload.Schedule == &schedule { ++ t.Fatal("JobPayloadFromJob reused schedule input pointer") ++ } + if payload.Task == nil || payload.Task.Owner == nil || payload.Task.Owner.Ref != "triage" { + t.Fatalf("task payload = %#v", payload.Task) + } +``` +
    + + + +
    +📝 Committable suggestion + +> ‼️ **IMPORTANT** +> Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements. + +```suggestion + if payload.Schedule == nil || payload.Schedule.Interval != "10m" { + t.Fatalf("schedule payload = %#v", payload.Schedule) + } + if payload.Schedule == &schedule { + t.Fatal("JobPayloadFromJob reused schedule input pointer") + } + if payload.Task == nil || payload.Task.Owner == nil || payload.Task.Owner.Ref != "triage" { + t.Fatalf("task payload = %#v", payload.Task) + } + if payload.Task == &jobTask || payload.Task.Owner == &owner { + t.Fatal("JobPayloadFromJob reused nested input pointers") + } +``` + +
    + + + +
    +🤖 Prompt for AI Agents + +``` +Verify each finding against the current code and only fix it if needed. + +In `@internal/api/core/conversions_parsers_test.go` around lines 319 - 327, The +test fails to assert that Schedule was deep-copied; add an explicit +pointer-inequality check for payload.Schedule to ensure JobPayloadFromJob +doesn't reuse the input Schedule pointer (e.g., assert payload.Schedule != +&job.Schedule or compare to the original schedule variable), and also optionally +verify key fields (like Interval) still equal to confirm content was copied; +update the test near the existing task/owner pointer checks (which reference +payload.Task, jobTask, and owner) to include this Schedule pointer check. +``` + +
    + + + + + +## Triage + +- Decision: `VALID` +- Notes: `TestJobPayloadFromJobCopiesNestedOptionalFields` verifies the copied schedule content but not pointer independence. A regression that reuses the input `Schedule` pointer would pass while task/owner pointer reuse is already covered. The fix is to add the missing schedule pointer-inequality assertion. +- Resolution: Added the missing schedule pointer-inequality assertion and verified via focused tests plus full `make verify`. diff --git a/.compozy/tasks/autonomous/reviews-002/issue_006.md b/.compozy/tasks/autonomous/reviews-002/issue_006.md new file mode 100644 index 000000000..e45d10b74 --- /dev/null +++ b/.compozy/tasks/autonomous/reviews-002/issue_006.md @@ -0,0 +1,140 @@ +--- +status: resolved +file: internal/api/core/tasks_surface_integration_test.go +line: 404 +author: coderabbitai[bot] +provider_ref: thread:PRRT_kwDOR5y4QM59q6tj,comment:PRRC_kwDOR5y4QM67YhqJ +--- + +# Issue 006: _⚠️ Potential issue_ | _🟡 Minor_ +## Review Comment + +_⚠️ Potential issue_ | _🟡 Minor_ + +
    +🧩 Analysis chain + +🏁 Script executed: + +```shell +#!/bin/bash +set -euo pipefail + +if [ -f go.mod ]; then + echo "== go.mod ==" + sed -n '1,20p' go.mod | sed -n '/^go /p' +else + fd '^go\.mod$' -a -x sh -c ' + echo "== $1 ==" + sed -n "1,20p" "$1" | sed -n "/^go /p" + ' sh {} +fi +``` + +Repository: compozy/agh + +Length of output: 78 + +--- + + + +
    + +**Add assertions for HTTP response body payloads in execution routes.** + +The test asserts HTTP status codes and forwarded `taskpkg.ExecutionRequest` properties, but does not validate the response bodies returned by handlers for publish/start/approve routes. Routes like reject/read/archive/dismiss are tested only for status code with no payload verification. A handler regression that returns unexpected response body content would pass undetected. + +
    +🤖 Prompt for AI Agents + +``` +Verify each finding against the current code and only fix it if needed. + +In `@internal/api/core/tasks_surface_integration_test.go` around lines 335 - 404, +The test currently only checks status codes and the recorded executionRequests +map (via performRequest and executionRequests[tc.call]) but does not assert the +HTTP response body for publish/start/approve ("/tasks/task-1/publish", +"/tasks/task-1/start", "/tasks/task-1/approve") nor for the triage/reject +routes; update the table-driven test in Test (the loop that uses performRequest, +resp and tc) to decode/inspect resp.Body and add assertions: for calls +"publish","start","approve" assert the returned JSON matches the expected +ExecutionRequest fields (idempotency_key, network_channel, metadata) or the +handler's documented response schema, and for +"reject","read","archive","dismiss" assert either the expected empty body or +specific JSON message; use tc.call and tc.wantMetadata/tc.wantKey/tc.wantChannel +to drive the expected body checks so the assertions fail if handlers return +unexpected payloads. +``` + +
    + + + +--- + +_⚠️ Potential issue_ | _🟡 Minor_ + +**Assert the execution response payload, not just the status code.** + +`publish`, `start`, and `approve` now return a `TaskExecutionResponse`, but these subtests only validate status and the forwarded request. A handler regression that drops the `run` payload, returns the wrong shape, or points the run at the wrong task would still pass here. As per coding guidelines, `MUST test meaningful business logic, not trivial operations` and `Ensure tests verify behavior outcomes, not just function calls`. + + +
    +Suggested test hardening + +```diff + t.Run(tc.name, func(t *testing.T) { + resp := performRequest(t, fixture.Engine, http.MethodPost, tc.path, tc.body) + if resp.Code != tc.want { + t.Fatalf("%s status = %d, want %d; body=%s", tc.path, resp.Code, tc.want, resp.Body.String()) + } ++ if tc.wantKey != "" { ++ var execution contract.TaskExecutionResponse ++ testutil.DecodeJSONResponse(t, resp, &execution) ++ if execution.Task.ID != "task-1" || ++ execution.Run.TaskID != "task-1" || ++ execution.Run.ID == "" || ++ execution.Run.Status != taskpkg.TaskRunStatusQueued { ++ t.Fatalf("%s execution response = %#v", tc.path, execution) ++ } ++ } + if tc.wantKey == "" { + return + } + got, ok := executionRequests[tc.call] +``` +
    + +
    +🤖 Prompt for AI Agents + +``` +Verify each finding against the current code and only fix it if needed. + +In `@internal/api/core/tasks_surface_integration_test.go` around lines 335 - 404, +The tests currently only assert status codes and the forwarded +executionRequests; update each subtest (inside the loop that uses performRequest +and checks executionRequests[tc.call]) to also unmarshal resp.Body into a +TaskExecutionResponse and assert its fields match the expected values +(idempotency key == tc.wantKey, network_channel == tc.wantChannel, metadata == +tc.wantMetadata, and that the response's run references the correct task ID +"task-1" and/or run/task identifier expected for publish/start/approve). Keep +the existing checks against executionRequests[tc.call] but add these +response-body assertions so handlers returning wrong payload shapes or wrong run +targets will fail; reference the performRequest call, the executionRequests map +lookup, and the TaskExecutionResponse type when locating where to add the +additional assertions. +``` + +
    + + + + + +## Triage + +- Decision: `VALID` +- Notes: The task mutation integration test asserts status codes and forwarded execution requests, but it does not decode the HTTP response payload for publish/start/approve or the task/triage responses for reject/read/archive/dismiss. Handler regressions that return an empty or wrong JSON body would pass. The fix is to decode and assert response bodies in each subtest. +- Resolution: Added response-body decoding/assertions for execution, task, and triage route responses; verified with the focused integration test and full `make verify`. diff --git a/.compozy/tasks/autonomous/reviews-002/issue_007.md b/.compozy/tasks/autonomous/reviews-002/issue_007.md new file mode 100644 index 000000000..04b309569 --- /dev/null +++ b/.compozy/tasks/autonomous/reviews-002/issue_007.md @@ -0,0 +1,48 @@ +--- +status: resolved +file: internal/api/spec/spec.go +line: 1438 +author: coderabbitai[bot] +provider_ref: thread:PRRT_kwDOR5y4QM59q6tk,comment:PRRC_kwDOR5y4QM67YhqL +--- + +# Issue 007: _⚠️ Potential issue_ | _🟠 Major_ +## Review Comment + +_⚠️ Potential issue_ | _🟠 Major_ + +**Document the real 403/503 failure modes on the new agent routes.** + +The new `/api/agent/*` operations mostly advertise only `401`/`404`/`500`, but the current implementation already emits other statuses. `/api/agent/me` returns `403 Forbidden` on workspace mismatch and `503 Service Unavailable` when the session service is missing in `internal/api/udsapi/agent_identity_test.go`, and the handlers in `internal/api/core/agent_channels.go` return `503` when `AgentContextService` or the network service is unavailable. Generated clients will miss supported failure modes if the spec stays narrower than the handlers. + +
    +🤖 Prompt for AI Agents + +``` +Verify each finding against the current code and only fix it if needed. + +In `@internal/api/spec/spec.go` around lines 1212 - 1438, The OpenAPI specs for +the agent endpoints (OperationIDs like getAgentMe, getAgentContext, +listAgentChannels, receiveAgentChannelMessages, sendAgentChannelMessage, +replyAgentChannelMessage, claimNextAgentTask, heartbeatAgentTaskRun, +completeAgentTaskRun, failAgentTaskRun, releaseAgentTaskRun, spawnAgentSession, +getAgentCoordinatorConfig) omit real failure modes; update each ResponseSpec to +include 403 (e.g., "Forbidden — workspace or permission mismatch") and 503 +(e.g., "Service unavailable — dependent service missing") entries with Body: +contract.ErrorPayload{} so generated clients reflect handlers that return 403 +and 503. Ensure /api/agent/me definitely includes 403 and 503 and mirror the +same additions on channel/task/spawn/coordinator endpoints that can return +service-unavailable or permission-denied errors. +``` + +
    + + + + + +## Triage + +- Decision: `VALID` +- Notes: The OpenAPI response specs for the new agent routes omit status codes the handlers already emit. Identity resolution can return `403` for workspace mismatch and `503` for unavailable session lookup; channel/context routes can return `503` when dependent services are missing; task/spawn/coordinator routes can return permission or service-unavailable failures through shared status mapping. The fix is to document `403` and `503` error payloads on the agent operations and regenerate the derived OpenAPI/client artifacts because this touches `internal/api/spec`. +- Resolution: Added the missing agent-route error responses, regenerated derived OpenAPI/TypeScript contracts, and verified with `make codegen-check`, web checks, and full `make verify`. diff --git a/.compozy/tasks/autonomous/reviews-002/issue_008.md b/.compozy/tasks/autonomous/reviews-002/issue_008.md new file mode 100644 index 000000000..4830e2e91 --- /dev/null +++ b/.compozy/tasks/autonomous/reviews-002/issue_008.md @@ -0,0 +1,51 @@ +--- +status: resolved +file: internal/api/udsapi/agent_identity_test.go +line: 173 +author: coderabbitai[bot] +provider_ref: thread:PRRT_kwDOR5y4QM59q6tm,comment:PRRC_kwDOR5y4QM67YhqN +--- + +# Issue 008: _🛠️ Refactor suggestion_ | _🟠 Major_ +## Review Comment + +_🛠️ Refactor suggestion_ | _🟠 Major_ + +**Use the required `t.Run("Should...")` pattern consistently.** + +The first table uses free-form subtest names, and `TestAgentMeReturnsValidatedCallerIdentity` is still a bare top-level case. Please wrap/rename these so every case follows the repository's required `Should...` subtest pattern. + + + +As per coding guidelines, `**/*_test.go`: MUST use `t.Run("Should...")` pattern for ALL test cases and `Table-driven tests with subtests (t.Run) as default.` + +
    +🤖 Prompt for AI Agents + +``` +Verify each finding against the current code and only fix it if needed. + +In `@internal/api/udsapi/agent_identity_test.go` around lines 16 - 173, The tests +violate the required "Should..." t.Run pattern: update the table-driven names in +TestAgentMeRejectsInvalidCallerIdentity so each test case's name begins with +"Should ..." and is invoked via t.Run(tt.name, func(t *testing.T) { ... }) +(preserve t.Parallel() inside each subtest), and wrap the standalone +TestAgentMeReturnsValidatedCallerIdentity body inside a t.Run("Should return +validated caller identity", func(t *testing.T) { ... }) (keeping the existing +t.Parallel and assertions); reference the test functions +TestAgentMeRejectsInvalidCallerIdentity and +TestAgentMeReturnsValidatedCallerIdentity and the table loop that calls t.Run +for the fix. +``` + +
    + + + + + +## Triage + +- Decision: `VALID` +- Notes: `TestAgentMeRejectsInvalidCallerIdentity` uses free-form table names and `TestAgentMeReturnsValidatedCallerIdentity` has assertions directly in the top-level test body. Both violate the AGH test convention that every case is a `t.Run("Should ...")` subtest. The fix is to rename table cases and wrap the standalone test body in a `Should...` subtest. +- Resolution: Renamed/wrapped the UDS agent identity tests with `Should...` subtests and verified with focused tests plus full `make verify`. diff --git a/.compozy/tasks/autonomous/reviews-002/issue_009.md b/.compozy/tasks/autonomous/reviews-002/issue_009.md new file mode 100644 index 000000000..6f8ecae7d --- /dev/null +++ b/.compozy/tasks/autonomous/reviews-002/issue_009.md @@ -0,0 +1,24 @@ +--- +status: resolved +file: internal/api/udsapi/handlers_test.go +line: 501 +severity: nitpick +author: coderabbitai[bot] +provider_ref: review:4177143389,nitpick_hash:c1e21af4470a +review_hash: c1e21af4470a +source_review_id: "4177143389" +source_review_submitted_at: "2026-04-26T16:15:24Z" +--- + +# Issue 009: Extend the handler-binding map to cover the rest of the new agent routes. +## Review Comment + +This map only asserts bindings for `reply`, `claim-next`, and `complete`. The newly added `send`, `spawn`, `heartbeat`, `fail`, and `release` routes can still be wired to the wrong handler while `TestRegisterRoutesCoversTechSpecEndpoints` passes, because that test only checks method/path registration. + +As per coding guidelines, `Focus on critical paths: workflow execution, state management, error handling`. + +## Triage + +- Decision: `VALID` +- Notes: `TestRegisterTaskRoutesUseSharedHandlerBindings` only asserts handler binding substrings for `reply`, `claim-next`, and `complete` among the new agent routes. `send`, `spawn`, `heartbeat`, `fail`, and `release` can be registered to the wrong handler while method/path registration still passes. 
The fix is to extend the expected binding map for the missing agent route handlers. +- Resolution: Extended UDS route binding assertions across the new agent channel/task/spawn routes and verified with focused tests plus full `make verify`. diff --git a/.compozy/tasks/autonomous/reviews-002/issue_010.md b/.compozy/tasks/autonomous/reviews-002/issue_010.md new file mode 100644 index 000000000..d5667392c --- /dev/null +++ b/.compozy/tasks/autonomous/reviews-002/issue_010.md @@ -0,0 +1,57 @@ +--- +status: resolved +file: internal/cli/agent_kernel_test.go +line: 720 +author: coderabbitai[bot] +provider_ref: thread:PRRT_kwDOR5y4QM59q6tn,comment:PRRC_kwDOR5y4QM67YhqO +--- + +# Issue 010: _⚠️ Potential issue_ | _🟡 Minor_ +## Review Comment + +_⚠️ Potential issue_ | _🟡 Minor_ + +**Use the repo’s `Should...` subtest pattern throughout this new test file.** + +Several new tests here run assertions directly in the top-level test function (`TestMeCommandJSONReturnsValidatedIdentity`, `TestMeContextCommandJSONKeepsStableSectionOrder`, `TestSpawnCommandMapsBoundedChildRequest`, etc.) instead of wrapping the case in `t.Run("Should...")`. Please move each standalone case into a named subtest and keep `t.Parallel()` inside the subtest body. + + +As per coding guidelines, `MUST use t.Run("Should...") pattern for ALL test cases`. + +
    +🤖 Prompt for AI Agents + +``` +Verify each finding against the current code and only fix it if needed. + +In `@internal/cli/agent_kernel_test.go` around lines 16 - 720, Several top-level +test functions (e.g., TestMeCommandJSONReturnsValidatedIdentity, +TestMeContextCommandJSONKeepsStableSectionOrder, +TestSpawnCommandMapsBoundedChildRequest, +TestChannelSendRejectsMissingInputsAndInvalidIdentity, +TestAgentCommandsRejectMissingIdentityBeforeAgentCalls, +TestChannelListCommandJSONReturnsVisibleChannels, +TestChannelSendPreservesCoordinationMetadataAndRejectsClaimToken, +TestChannelReplySendsOnlyMessageIDAndBodyWhenMetadataIsResolvedServerSide, +TestChannelRecvJSONLOutputEmitsOneObjectPerMessage, +TestAgentCommandsRenderHumanAndToonOutputs) contain assertions directly in the +top-level test; wrap each of these logical cases in t.Run("Should ...") subtests +(use descriptive "Should..." titles) and move any t.Parallel() calls from the +top-level function into the body of each subtest so each subtest calls +t.Parallel() at its start; ensure any per-case setup (stubClient, deps, and +client.fn assignments) is inside the subtest body so tests remain isolated and +still call the same functions (e.g., agentMeFn, agentContextFn, agentSpawnFn, +agentChannelSendFn, agentChannelRecvFn, agentChannelReplyFn) as before. +``` + +
    + + + + + +## Triage + +- Decision: `VALID` +- Notes: Several new tests in `internal/cli/agent_kernel_test.go` execute assertions directly in top-level test bodies, and some table cases use names that do not start with `Should`. This violates the repository test-shape convention. The fix is to wrap each logical case in `t.Run("Should ...")` and move `t.Parallel()` into the subtest body while preserving isolated setup. +- Resolution: Wrapped/renamed the affected CLI agent kernel tests with `Should...` subtests and verified with focused tests plus full `make verify`. diff --git a/.compozy/tasks/autonomous/reviews-002/issue_011.md b/.compozy/tasks/autonomous/reviews-002/issue_011.md new file mode 100644 index 000000000..e4dd0f6b5 --- /dev/null +++ b/.compozy/tasks/autonomous/reviews-002/issue_011.md @@ -0,0 +1,22 @@ +--- +status: resolved +file: internal/cli/cli_integration_test.go +line: 1667 +severity: nitpick +author: coderabbitai[bot] +provider_ref: review:4177143389,nitpick_hash:264964bbec52 +review_hash: 264964bbec52 +source_review_id: "4177143389" +source_review_submitted_at: "2026-04-26T16:15:24Z" +--- + +# Issue 011: Remove unnecessary loop variable capture tt := tt (Go 1.25.4 supports per-iteration scoping). +## Review Comment + +The `tt := tt` pattern is unnecessary in Go 1.22+ since loop variables are scoped per iteration. This codebase targets Go 1.25.4, so this line can be removed. + +## Triage + +- Decision: `VALID` +- Notes: The integration test still contains the pre-Go-1.22 `tt := tt` loop capture pattern. The module targets a Go version with per-iteration loop variable scoping, so the assignment is unnecessary. The fix is to remove that redundant capture without changing behavior. +- Resolution: Removed the redundant loop-variable capture and verified with focused tests plus full `make verify`. 
diff --git a/.compozy/tasks/autonomous/reviews-002/issue_012.md b/.compozy/tasks/autonomous/reviews-002/issue_012.md new file mode 100644 index 000000000..933d7ef6e --- /dev/null +++ b/.compozy/tasks/autonomous/reviews-002/issue_012.md @@ -0,0 +1,48 @@ +--- +status: resolved +file: internal/cli/client_test.go +line: 428 +author: coderabbitai[bot] +provider_ref: thread:PRRT_kwDOR5y4QM59q6tp,comment:PRRC_kwDOR5y4QM67YhqQ +--- + +# Issue 012: _⚠️ Potential issue_ | _🟡 Minor_ +## Review Comment + +_⚠️ Potential issue_ | _🟡 Minor_ + +**Wrap these new test bodies in `t.Run("Should...")` subtests.** + +This change adds several standalone tests with direct assertions (`TestUnixSocketClientAgentMeSendsIdentityHeaders`, `TestUnixSocketClientAgentTaskErrorsRedactClaimTokens`, etc.) instead of the repo’s required `Should...` subtest pattern. Please nest each case in a named subtest and keep `t.Parallel()` inside those blocks. + + +As per coding guidelines, `MUST use t.Run("Should...") pattern for ALL test cases`. + +
    +🤖 Prompt for AI Agents + +``` +Verify each finding against the current code and only fix it if needed. + +In `@internal/cli/client_test.go` around lines 29 - 428, The top-level test +functions (e.g., TestUnixSocketClientAgentMeSendsIdentityHeaders and +TestUnixSocketClientAgentTaskErrorsRedactClaimTokens) must be converted to use +the repository's required subtest pattern: wrap each test body in a +t.Run("Should ...", func(t *testing.T) { t.Parallel(); ... }) block so the +assertions run as a named "Should..." subtest, keeping t.Parallel() inside the +subtest closure and preserving existing logic (use the same client setup and +assertions inside the new subtest); update any other standalone tests in this +file to the same pattern so all cases follow t.Run("Should..."). +``` + +
    + + + + + +## Triage + +- Decision: `VALID` +- Notes: `TestUnixSocketClientAgentMeSendsIdentityHeaders` and `TestUnixSocketClientAgentTaskErrorsRedactClaimTokens` contain direct assertions in the top-level test body. They need `t.Run("Should ...")` wrappers to match AGH test conventions; the existing client logic and assertions can remain inside those subtests. +- Resolution: Wrapped the affected CLI client tests with `Should...` subtests and verified with focused tests plus full `make verify`. diff --git a/.compozy/tasks/qa-review/reviews-001/_meta.md b/.compozy/tasks/qa-review/reviews-001/_meta.md deleted file mode 100644 index 3fac9d3f7..000000000 --- a/.compozy/tasks/qa-review/reviews-001/_meta.md +++ /dev/null @@ -1,11 +0,0 @@ ---- -provider: coderabbit -pr: "73" -round: 1 -created_at: 2026-04-26T03:50:11.811146Z ---- - -## Summary -- Total: 25 -- Resolved: 0 -- Unresolved: 25 diff --git a/.compozy/tasks/qa-review/reviews-001/issue_001.md b/.compozy/tasks/qa-review/reviews-001/issue_001.md deleted file mode 100644 index e2a15d4f7..000000000 --- a/.compozy/tasks/qa-review/reviews-001/issue_001.md +++ /dev/null @@ -1,26 +0,0 @@ ---- -status: resolved -file: internal/api/core/network_test.go -line: 930 -severity: nitpick -author: coderabbitai[bot] -provider_ref: review:4176489704,nitpick_hash:196605105379 -review_hash: "196605105379" -source_review_id: "4176489704" -source_review_submitted_at: "2026-04-26T03:49:14Z" ---- - -# Issue 001: Wrap these new cases in t.Run("Should...") subtests. -## Review Comment - -These scenarios are added as top-level tests, which breaks the test shape the rest of this file already follows. As per coding guidelines, `**/*_test.go`: MUST use `t.Run("Should...")` pattern for ALL test cases. 
- -Also applies to: 1040-1184, 1186-1285 - -## Triage - -- Decision: `valid` -- Notes: - - The three network route regressions added at lines 930, 1040, and 1186 are still standalone top-level tests in a file that otherwise uses the repo's required `t.Run("Should...")` structure for scenario cases. - - Root cause: the new coverage was added directly as new test functions instead of folding the assertions into named subtests. - - Fix plan: wrap each scenario body in a `t.Run("Should...")` subtest, move `t.Parallel()` into the subtest closure, and keep the existing assertions unchanged. diff --git a/.compozy/tasks/qa-review/reviews-001/issue_002.md b/.compozy/tasks/qa-review/reviews-001/issue_002.md deleted file mode 100644 index 3f11a6752..000000000 --- a/.compozy/tasks/qa-review/reviews-001/issue_002.md +++ /dev/null @@ -1,53 +0,0 @@ ---- -status: resolved -file: internal/automation/dispatch.go -line: 529 -author: coderabbitai[bot] -provider_ref: thread:PRRT_kwDOR5y4QM59oaQg,comment:PRRC_kwDOR5y4QM67VX6- ---- - -# Issue 002: _⚠️ Potential issue_ | _🔴 Critical_ -## Review Comment - -_⚠️ Potential issue_ | _🔴 Critical_ - -**Don't let fire-limit rejections count against the next fire-limit window.** - -`reserveExistingRun()` now persists scheduled fire-limit hits as `RunCancelled`, but `evaluateFireLimit()` counts every run returned by `ListRuns()` regardless of status. That means the rejection itself consumes a slot, so a deferred scheduler can keep deferring forever once the window is saturated. The fire-limit query needs to exclude these cancellations or otherwise count only runs that actually entered execution. - - - -Also applies to: 539-597 - -
    -🤖 Prompt for AI Agents - -``` -Verify each finding against the current code and only fix it if needed. - -In `@internal/automation/dispatch.go` around lines 521 - 529, evaluateFireLimit() -is currently counting runs returned by ListRuns() regardless of status, so when -reserveExistingRun() persists a fire-limit hit as RunCancelled it still consumes -a slot; update evaluateFireLimit() (and any other fire-limit counting logic) to -filter out cancelled runs (e.g., exclude status RunCancelled or any -cancel-by-fire-limit marker) or only count runs that have entered execution -(started/running/completed states that should count toward the window) when -computing the fire-limit; ensure ListRuns() is called with the adjusted status -filter and that logic in reserveExistingRun()/finishRun()/fireLimitRunStatus() -remains consistent with the new exclusion so cancellations no longer decrement -available slots. -``` - -
    - - - - - -## Triage - -- Decision: `valid` -- Notes: - - `evaluateFireLimit()` currently counts every run returned by `ListRuns()` and does not exclude fire-limit cancellations recorded as `RunCancelled` for deferred scheduled fires. - - Root cause: scheduled fire-limit rejections are persisted through `finishRun(..., RunCancelled, ...)`, but the fire-limit window logic uses raw run count instead of filtering to statuses that should consume the limit. - - Fix plan: exclude canceled fire-limit reservation runs from the window count and retry-at calculation, then add a dispatch regression showing that a canceled reserved run does not block the next window. diff --git a/.compozy/tasks/qa-review/reviews-001/issue_003.md b/.compozy/tasks/qa-review/reviews-001/issue_003.md deleted file mode 100644 index ab46c9fe7..000000000 --- a/.compozy/tasks/qa-review/reviews-001/issue_003.md +++ /dev/null @@ -1,26 +0,0 @@ ---- -status: resolved -file: internal/automation/dispatch_test.go -line: 563 -severity: nitpick -author: coderabbitai[bot] -provider_ref: review:4176489704,nitpick_hash:891a23f7f73b -review_hash: 891a23f7f73b -source_review_id: "4176489704" -source_review_submitted_at: "2026-04-26T03:49:14Z" ---- - -# Issue 003: Use the required subtest shape for this new fire-limit case. -## Review Comment - -This new scenario is another standalone top-level test. Please move it under a table-driven parent with `t.Run("Should...")` to stay consistent with the repo’s Go test pattern. - -As per coding guidelines, "`**/*_test.go`: Table-driven tests with subtests (t.Run) as default." and "MUST use t.Run(\"Should...\") pattern for ALL test cases". - -## Triage - -- Decision: `valid` -- Notes: - - `TestDispatchScheduledReservedRunCancelsOnFireLimit` is currently a standalone top-level test rather than a named subtest scenario. - - Root cause: the new fire-limit regression was added directly to the file instead of using the repo's default subtest pattern. 
- - Fix plan: wrap the scenario in a `t.Run("Should...")` block while preserving the fire-limit assertions and any added regression coverage tied to issue 002. diff --git a/.compozy/tasks/qa-review/reviews-001/issue_004.md b/.compozy/tasks/qa-review/reviews-001/issue_004.md deleted file mode 100644 index 6a9e409bf..000000000 --- a/.compozy/tasks/qa-review/reviews-001/issue_004.md +++ /dev/null @@ -1,26 +0,0 @@ ---- -status: resolved -file: internal/automation/schedule_test.go -line: 183 -severity: nitpick -author: coderabbitai[bot] -provider_ref: review:4176489704,nitpick_hash:bee851f74c9c -review_hash: bee851f74c9c -source_review_id: "4176489704" -source_review_submitted_at: "2026-04-26T03:49:14Z" ---- - -# Issue 004: Use the required subtest pattern for this new scheduler case. -## Review Comment - -This coverage was added as a standalone top-level test. Please fold it under a table-driven parent with `t.Run("Should...")` so new fire-limit scheduler scenarios can extend the same harness cleanly. - -As per coding guidelines, "`**/*_test.go`: Table-driven tests with subtests (t.Run) as default." and "MUST use t.Run(\"Should...\") pattern for ALL test cases". - -## Triage - -- Decision: `valid` -- Notes: - - `TestSchedulerDefersNextRunAfterFireLimit` is still a standalone top-level test case. - - Root cause: the new scheduler scenario was added outside the file's expected `t.Run("Should...")` structure. - - Fix plan: fold the existing assertions into a named subtest and keep the scheduler harness unchanged. 
diff --git a/.compozy/tasks/qa-review/reviews-001/issue_005.md b/.compozy/tasks/qa-review/reviews-001/issue_005.md deleted file mode 100644 index a7045e4e1..000000000 --- a/.compozy/tasks/qa-review/reviews-001/issue_005.md +++ /dev/null @@ -1,145 +0,0 @@ ---- -status: resolved -file: internal/network/audit.go -line: 279 -author: coderabbitai[bot] -provider_ref: thread:PRRT_kwDOR5y4QM59oaQk,comment:PRRC_kwDOR5y4QM67VX7C ---- - -# Issue 005: _⚠️ Potential issue_ | _🟠 Major_ -## Review Comment - -_⚠️ Potential issue_ | _🟠 Major_ - -**Prune stale presence keys or this cache will grow forever.** - -`presence.lastSeen` only ever inserts/updates entries. In a long-lived daemon with peer/channel churn, every distinct greet tuple stays resident indefinitely even after it has aged past `presence.duration`, so this becomes an unbounded in-memory index. - - - -
    -Possible direction - -```diff - func (w *FileAuditWriter) shouldWriteTimelineMessage(entry store.NetworkMessageEntry) bool { - if strings.TrimSpace(entry.Kind) != string(KindGreet) { - return true - } - if w == nil || w.presence.duration <= 0 { - return true - } - - key := strings.Join([]string{ - strings.TrimSpace(entry.Direction), - strings.TrimSpace(entry.Channel), - strings.TrimSpace(entry.PeerFrom), - strings.TrimSpace(entry.PeerTo), - }, "\x00") - - at := entry.Timestamp.UTC() - w.presence.mu.Lock() - defer w.presence.mu.Unlock() - - if w.presence.lastSeen == nil { - w.presence.lastSeen = make(map[string]time.Time) - } -+ cutoff := at.Add(-w.presence.duration) -+ for existingKey, seenAt := range w.presence.lastSeen { -+ if seenAt.Before(cutoff) { -+ delete(w.presence.lastSeen, existingKey) -+ } -+ } - - lastSeen, ok := w.presence.lastSeen[key] - w.presence.lastSeen[key] = at - if !ok { - return true - } - return at.Sub(lastSeen) > w.presence.duration - } -``` -
    - - - -
    -📝 Committable suggestion - -> ‼️ **IMPORTANT** -> Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements. - -```suggestion -func (w *FileAuditWriter) shouldWriteTimelineMessage(entry store.NetworkMessageEntry) bool { - if strings.TrimSpace(entry.Kind) != string(KindGreet) { - return true - } - if w == nil || w.presence.duration <= 0 { - return true - } - - key := strings.Join([]string{ - strings.TrimSpace(entry.Direction), - strings.TrimSpace(entry.Channel), - strings.TrimSpace(entry.PeerFrom), - strings.TrimSpace(entry.PeerTo), - }, "\x00") - - at := entry.Timestamp.UTC() - w.presence.mu.Lock() - defer w.presence.mu.Unlock() - - if w.presence.lastSeen == nil { - w.presence.lastSeen = make(map[string]time.Time) - } - cutoff := at.Add(-w.presence.duration) - for existingKey, seenAt := range w.presence.lastSeen { - if seenAt.Before(cutoff) { - delete(w.presence.lastSeen, existingKey) - } - } - - lastSeen, ok := w.presence.lastSeen[key] - w.presence.lastSeen[key] = at - if !ok { - return true - } - return at.Sub(lastSeen) > w.presence.duration -} -``` - -
    - - - -
    -🤖 Prompt for AI Agents - -``` -Verify each finding against the current code and only fix it if needed. - -In `@internal/network/audit.go` around lines 251 - 279, shouldWriteTimelineMessage -currently only inserts/updates w.presence.lastSeen and never removes old keys, -causing unbounded map growth; modify -FileAuditWriter.shouldWriteTimelineMessage (and/or the presence struct) to prune -stale entries from w.presence.lastSeen when holding the lock: iterate the map -and delete any entries where at.Sub(t) > w.presence.duration (or -t.Add(w.presence.duration).Before(at)), and perform this pruning -opportunistically (e.g., each call that updates lastSeen or when the map exceeds -a threshold) to avoid locking cost spikes while ensuring entries older than -presence.duration are removed. Ensure you keep the existing logic that updates -lastSeen[key] = at and that pruning is protected by the same presence.mu lock. -``` - -
    - - - - - -## Triage - -- Decision: `valid` -- Notes: - - `shouldWriteTimelineMessage()` updates `presence.lastSeen` but never removes keys that have already aged out of the configured presence window. - - Root cause: the presence suppression cache is append/update only, so long-lived daemons accumulate stale `(direction, channel, peerFrom, peerTo)` entries forever. - - Fix plan: opportunistically prune stale entries while holding `presence.mu` before updating the current key, then add a regression that asserts expired presence keys are evicted. diff --git a/.compozy/tasks/qa-review/reviews-001/issue_006.md b/.compozy/tasks/qa-review/reviews-001/issue_006.md deleted file mode 100644 index a210c56b7..000000000 --- a/.compozy/tasks/qa-review/reviews-001/issue_006.md +++ /dev/null @@ -1,50 +0,0 @@ ---- -status: resolved -file: internal/network/audit_test.go -line: 385 -author: coderabbitai[bot] -provider_ref: thread:PRRT_kwDOR5y4QM59oaQj,comment:PRRC_kwDOR5y4QM67VX7B ---- - -# Issue 006: _⚠️ Potential issue_ | _🟡 Minor_ -## Review Comment - -_⚠️ Potential issue_ | _🟡 Minor_ - -**Wrap these new test cases in `t.Run("Should...")` subtests.** - -These additions bypass the required subtest pattern used elsewhere in the suite, so they drift from the repo’s enforced test structure. As per coding guidelines, `**/*_test.go`: MUST use t.Run("Should...") pattern for ALL test cases. - - - -Also applies to: 387-414 - -
    -🤖 Prompt for AI Agents - -``` -Verify each finding against the current code and only fix it if needed. - -In `@internal/network/audit_test.go` around lines 334 - 385, The test functions -(e.g., TestAuditWriterCoalescesRepeatedGreetHeartbeatsInTimeline and the other -test at 387-414) are not using the required t.Run("Should...") subtest pattern; -wrap each test body in a t.Run call with a descriptive "Should..." name, move -t.Parallel() inside the subtest (call t.Parallel() at the start of the t.Run -closure), and keep the existing assertions and helper calls (e.g., -NewAuditWriter, writer.now override, RecordSent, and checks against storeSink) -unchanged inside the subtest closure so the tests conform to the suite pattern. -``` - -
    - - - - - -## Triage - -- Decision: `valid` -- Notes: - - The greet coalescing regressions in `internal/network/audit_test.go` were added as top-level tests instead of named subtests. - - Root cause: the new scenarios bypass the file's subtest convention even though they are discrete behavior cases. - - Fix plan: wrap the affected test bodies in `t.Run("Should...")` blocks and keep their existing helper setup/assertions intact. diff --git a/.compozy/tasks/qa-review/reviews-001/issue_007.md b/.compozy/tasks/qa-review/reviews-001/issue_007.md deleted file mode 100644 index b9cd94621..000000000 --- a/.compozy/tasks/qa-review/reviews-001/issue_007.md +++ /dev/null @@ -1,92 +0,0 @@ ---- -status: resolved -file: internal/network/greet_summary.go -line: 55 -author: coderabbitai[bot] -provider_ref: thread:PRRT_kwDOR5y4QM59oaQq,comment:PRRC_kwDOR5y4QM67VX7H ---- - -# Issue 007: _⚠️ Potential issue_ | _🟡 Minor_ -## Review Comment - -_⚠️ Potential issue_ | _🟡 Minor_ - -**`+N more` can be overstated when capability entries contain blanks.** - -The current count uses raw slice length after the first non-empty capability, so empty strings still increase `N`. - - -
    -Suggested fix - -```diff - for idx, capability := range card.Capabilities { - if trimmed := strings.TrimSpace(capability); trimmed != "" { -- return trimmed, len(card.Capabilities) - idx - 1 -+ extra := 0 -+ for _, candidate := range card.Capabilities[idx+1:] { -+ if strings.TrimSpace(candidate) != "" { -+ extra++ -+ } -+ } -+ return trimmed, extra - } - } -``` -
    - - - -
    -📝 Committable suggestion - -> ‼️ **IMPORTANT** -> Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements. - -```suggestion - for idx, capability := range card.Capabilities { - if trimmed := strings.TrimSpace(capability); trimmed != "" { - extra := 0 - for _, candidate := range card.Capabilities[idx+1:] { - if strings.TrimSpace(candidate) != "" { - extra++ - } - } - return trimmed, extra - } - } -``` - -
    - - - -
    -🤖 Prompt for AI Agents - -``` -Verify each finding against the current code and only fix it if needed. - -In `@internal/network/greet_summary.go` around lines 52 - 55, The code in the loop -over card.Capabilities returns the first non-empty trimmed capability but -computes the remaining count using the raw slice length, which overstates N when -there are blank entries; update the return to compute remaining as the number of -non-empty (after TrimSpace) entries after the found index by iterating the tail -of card.Capabilities (or using a helper that filters empties) and counting only -trimmed != "" so the returned tuple gives the correct first non-empty capability -and the correct remaining non-empty count. -``` - -
    - - - - - -## Triage - -- Decision: `valid` -- Notes: - - `greetCapabilityLabel()` returns the first non-empty capability label but computes the remaining count from the raw slice length, so blank capability slots still inflate the `+N more` suffix. - - Root cause: the fallback path counts tail entries structurally instead of counting only trimmed non-empty capability labels. - - Fix plan: count only non-empty trailing capabilities after the first selected label and extend greet summary coverage to include blank capability entries. diff --git a/.compozy/tasks/qa-review/reviews-001/issue_008.md b/.compozy/tasks/qa-review/reviews-001/issue_008.md deleted file mode 100644 index b056dfdd0..000000000 --- a/.compozy/tasks/qa-review/reviews-001/issue_008.md +++ /dev/null @@ -1,685 +0,0 @@ ---- -status: resolved -file: internal/session/manager_hooks.go -line: 222 -author: coderabbitai[bot] -provider_ref: thread:PRRT_kwDOR5y4QM59oaQr,comment:PRRC_kwDOR5y4QM67VX7K ---- - -# Issue 008: _⚠️ Potential issue_ | _🟠 Major_ -## Review Comment - -_⚠️ Potential issue_ | _🟠 Major_ - -
    -🧩 Analysis chain - -🏁 Script executed: - -```shell -#!/bin/bash -# Verify where root contexts are created in internal Go runtime code. -rg -n --type=go 'context\.Background\(' internal -``` - -Repository: compozy/agh - -Length of output: 50368 - ---- - -🏁 Script executed: - -```shell -# Read the manager_hooks.go file to verify the exact code at lines 215-222 -cat -n internal/session/manager_hooks.go | sed -n '200,230p' -``` - -Repository: compozy/agh - -Length of output: 1074 - ---- - -🏁 Script executed: - -```shell -# Find all callers of postLifecycleHookContext -rg -n 'postLifecycleHookContext' internal -``` - -Repository: compozy/agh - -Length of output: 264 - ---- - -🏁 Script executed: - -```shell -# Check if manager_hooks.go is in a test context or production code -file internal/session/manager_hooks.go -head -20 internal/session/manager_hooks.go -``` - -Repository: compozy/agh - -Length of output: 503 - ---- - -🏁 Script executed: - -```shell -# Check the context around line 195 where postLifecycleHookContext is called -cat -n internal/session/manager_hooks.go | sed -n '185,210p' -``` - -Repository: compozy/agh - -Length of output: 1102 - ---- - -🏁 Script executed: - -```shell -# Check the function that calls postLifecycleHookContext to understand the complete context -cat -n internal/session/manager_hooks.go | sed -n '175,215p' -``` - -Repository: compozy/agh - -Length of output: 1697 - ---- - -🏁 Script executed: - -```shell -# Count uses of context.Background() specifically in internal/session/manager_hooks.go -rg -n 'context\.Background' internal/session/manager_hooks.go -``` - -Repository: compozy/agh - -Length of output: 121 - ---- - -🏁 Script executed: - -```shell -# Check what's at line 800 -cat -n internal/session/manager_hooks.go | sed -n '790,810p' -``` - -Repository: compozy/agh - -Length of output: 563 - ---- - -🏁 Script executed: - -```shell -# Find callers of dispatchSessionPostCreate, dispatchSessionPostStop, dispatchSessionPostResume -rg 
-n 'dispatchSessionPost(Create|Resume|Stop)' internal/session/manager_hooks.go | head -20 -``` - -Repository: compozy/agh - -Length of output: 320 - ---- - -🏁 Script executed: - -```shell -# Check where dispatchSessionLifecycleObservation is called from -rg -n 'dispatchSessionLifecycleObservation\|dispatchSessionPostCreate\|dispatchSessionPostStop\|dispatchSessionPostResume' internal/session/ | grep -v '\.go:.*func ' | head -30 -``` - -Repository: compozy/agh - -Length of output: 37 - ---- - -🏁 Script executed: - -```shell -# Find all callers of dispatchSessionLifecycleObservation -rg -n 'dispatchSessionLifecycleObservation' internal/session/ -``` - -Repository: compozy/agh - -Length of output: 581 - ---- - -🏁 Script executed: - -```shell -# Find all callers of dispatchSessionPostCreate, dispatchSessionPostStop, dispatchSessionPostResume -rg -n '\.dispatchSessionPost(Create|Resume|Stop)\(' internal/session/ -``` - -Repository: compozy/agh - -Length of output: 306 - ---- - -🏁 Script executed: - -```shell -# Check the main manager functions that initiate these dispatch calls -rg -B 5 'dispatchSessionPostCreate\|dispatchSessionPostStop\|dispatchSessionPostResume' internal/session/manager.go | head -40 -``` - -Repository: compozy/agh - -Length of output: 37 - ---- - -🏁 Script executed: - -```shell -# Check the context in manager_lifecycle.go where dispatchSessionPostStop is called -cat -n internal/session/manager_lifecycle.go | sed -n '165,185p' -``` - -Repository: compozy/agh - -Length of output: 967 - ---- - -🏁 Script executed: - -```shell -# Check the context in manager_helpers.go where dispatchSessionPostCreate and PostResume are called -cat -n internal/session/manager_helpers.go | sed -n '110,130p' -``` - -Repository: compozy/agh - -Length of output: 811 - ---- - -🏁 Script executed: - -```shell -# Check what function calls these and what context they receive -cat -n internal/session/manager_lifecycle.go | sed -n '155,180p' -``` - -Repository: compozy/agh - -Length 
of output: 1283 - ---- - -🏁 Script executed: - -```shell -# Look for the broader context of manager_helpers.go -cat -n internal/session/manager_helpers.go | sed -n '100,135p' -``` - -Repository: compozy/agh - -Length of output: 1421 - ---- - -🏁 Script executed: - -```shell -# Check if postLifecycleHookContext is called from any test files -rg -n 'postLifecycleHookContext' internal/ --type go | grep -i test -``` - -Repository: compozy/agh - -Length of output: 37 - ---- - -🏁 Script executed: - -```shell -# Check the function signatures that call dispatchSessionPostCreate/Stop/Resume to verify ctx is always provided -cat -n internal/session/manager_helpers.go | sed -n '70,100p' -``` - -Repository: compozy/agh - -Length of output: 1119 - ---- - -🏁 Script executed: - -```shell -# Look at what calls the function that contains the dispatchSessionPost* calls -cat -n internal/session/manager_helpers.go | sed -n '50,75p' -``` - -Repository: compozy/agh - -Length of output: 992 - ---- - -🏁 Script executed: - -```shell -# Examine where the ctx is sourced from at the highest level -rg -B 10 'dispatchSessionPostCreate\|dispatchSessionPostStop' internal/session/manager_lifecycle.go | head -50 -``` - -Repository: compozy/agh - -Length of output: 37 - ---- - -🏁 Script executed: - -```shell -# Check if Manager has a lifecycleCtx or similar available -rg -n 'lifecycleCtx' internal/session/manager.go | head -20 -``` - -Repository: compozy/agh - -Length of output: 243 - ---- - -🏁 Script executed: - -```shell -# Check the Manager struct definition -cat -n internal/session/manager.go | sed -n '1,100p' | grep -A 50 'type Manager struct' -``` - -Repository: compozy/agh - -Length of output: 1653 - ---- - -🏁 Script executed: - -```shell -# Verify the exact function signature and parent context availability -cat -n internal/session/manager_lifecycle.go | sed -n '140,180p' -``` - -Repository: compozy/agh - -Length of output: 1721 - ---- - -🏁 Script executed: - -```shell -# Check if the ctx 
parameter to dispatchSessionLifecycleObservation could be nil -# by looking at all callers -rg -B 3 'dispatchSessionLifecycleObservation' internal/session/manager_hooks.go -``` - -Repository: compozy/agh - -Length of output: 697 - ---- - -🏁 Script executed: - -```shell -# Check finalizeStopped more carefully - it has a nil check at 146-147 -# So when dispatchSessionPostStop is called at line 176, ctx should never be nil -cat -n internal/session/manager_lifecycle.go | sed -n '145,180p' -``` - -Repository: compozy/agh - -Length of output: 1630 - ---- - -🏁 Script executed: - -```shell -# Verify that activateAndWatch receives a valid context parameter -rg -B 10 'func.*activateAndWatch' internal/session/manager_helpers.go | head -20 -``` - -Repository: compozy/agh - -Length of output: 394 - ---- - -🏁 Script executed: - -```shell -# Check all entry points to activateAndWatch to see if ctx could be nil -rg -n 'activateAndWatch\(' internal/session/ -``` - -Repository: compozy/agh - -Length of output: 532 - ---- - -🏁 Script executed: - -```shell -# Check manager_start.go line 187 to see what context is passed to activateAndWatch -cat -n internal/session/manager_start.go | sed -n '175,200p' -``` - -Repository: compozy/agh - -Length of output: 732 - ---- - -🏁 Script executed: - -```shell -# Verify the complete call chain from manager_start.go -cat -n internal/session/manager_start.go | sed -n '155,195p' -``` - -Repository: compozy/agh - -Length of output: 1517 - ---- - -🏁 Script executed: - -```shell -# Search for who calls the function containing the activateAndWatch call -rg -B 15 'activateAndWatch' internal/session/manager_start.go | head -40 -``` - -Repository: compozy/agh - -Length of output: 559 - ---- - -🏁 Script executed: - -```shell -# Find the function signature that contains the activateAndWatch call -rg -B 30 'if err := m.activateAndWatch' internal/session/manager_start.go | head -50 -``` - -Repository: compozy/agh - -Length of output: 1080 - ---- - -🏁 Script 
executed: - -```shell -# Look for the function name and its signature -cat -n internal/session/manager_start.go | sed -n '100,150p' -``` - -Repository: compozy/agh - -Length of output: 2174 - ---- - -🏁 Script executed: - -```shell -# Check if this is the Start or Resume method and what context it receives -rg -B 5 'func.*Manager.*Start\|func.*Manager.*Resume' internal/session/manager_start.go | head -20 -``` - -Repository: compozy/agh - -Length of output: 37 - ---- - -🏁 Script executed: - -```shell -# Look at the complete startSession function signature -cat -n internal/session/manager_start.go | sed -n '129,135p' -``` - -Repository: compozy/agh - -Length of output: 428 - ---- - -🏁 Script executed: - -```shell -# Find all public methods that call startSession to understand entry points -rg -n '\.startSession\(' internal/session/ -``` - -Repository: compozy/agh - -Length of output: 406 - ---- - -🏁 Script executed: - -```shell -# Check the public API methods that call startSession - Start and Resume -rg -A 5 'func.*Manager.*\(Start|Resume\)' internal/session/manager.go | head -40 -``` - -Repository: compozy/agh - -Length of output: 37 - ---- - -🏁 Script executed: - -```shell -# Check the public Start and Resume methods -cat -n internal/session/manager_lifecycle.go | sed -n '15,75p' -``` - -Repository: compozy/agh - -Length of output: 2149 - ---- - -🏁 Script executed: - -```shell -# Look for public API methods that users call -rg -n 'func.*Manager.*\(' internal/session/manager.go | grep -E 'Start|Resume|Stop' | head -20 -``` - -Repository: compozy/agh - -Length of output: 37 - ---- - -🏁 Script executed: - -```shell -# Check manager.go for public API signatures -cat -n internal/session/manager.go | grep -A 2 'func.*Manager.*Start\|func.*Manager.*Resume\|func.*Manager.*Stop' -``` - -Repository: compozy/agh - -Length of output: 37 - ---- - -🏁 Script executed: - -```shell -# Final verification - check all places where dispatchSessionPostCreate/Stop/Resume could be called 
with nil -rg -B 5 'dispatchSessionPostCreate\|dispatchSessionPostStop\|dispatchSessionPostResume' internal/session/*.go | grep -A 5 'ctx\s*=' -``` - -Repository: compozy/agh - -Length of output: 37 - ---- - -🏁 Script executed: - -```shell -# Check if there are any other test files that might call these -find internal/session -name '*test.go' -exec grep -l 'dispatchSessionPost' {} \; -``` - -Repository: compozy/agh - -Length of output: 139 - ---- - -🏁 Script executed: - -```shell -# Double-check: can ctx actually be nil when postLifecycleHookContext is called? -# Check manager_lifecycle.go finalizeStopped more closely -cat -n internal/session/manager_lifecycle.go | sed -n '145,200p' -``` - -Repository: compozy/agh - -Length of output: 2441 - ---- - -🏁 Script executed: - -```shell -# Check if there are similar defensive nil checks in other production paths -rg -B 3 'if ctx == nil' internal/session/*.go | grep -v test | head -40 -``` - -Repository: compozy/agh - -Length of output: 1953 - ---- - -🏁 Script executed: - -```shell -# Examine the warnHookDispatch function at line 799-801 which also uses context.Background() -cat -n internal/session/manager_hooks.go | sed -n '795,810p' -``` - -Repository: compozy/agh - -Length of output: 494 - ---- - -🏁 Script executed: - -```shell -# Final comprehensive check - verify actual nil context flow -# Check if finalizeStopped's context setup (line 146-147) guarantees non-nil ctx at line 176 -cat -n internal/session/manager_lifecycle.go | sed -n '145,182p' -``` - -Repository: compozy/agh - -Length of output: 1675 - ---- - -🏁 Script executed: - -```shell -# Check if there are test cases that might pass nil context -grep -n 'postLifecycleHookContext\|dispatchSessionPostStop\|dispatchSessionPostCreate' internal/session/manager_hooks_test.go | head -20 -``` - -Repository: compozy/agh - -Length of output: 671 - ---- - -🏁 Script executed: - -```shell -# Verify the pattern: does finalizeStopped ALWAYS ensure ctx is not nil before line 
176? -rg -A 35 'func.*finalizeStopped' internal/session/manager_lifecycle.go | head -50 -``` - -Repository: compozy/agh - -Length of output: 1378 - ---- - - - -
    - -**Avoid creating root contexts in internal production code; use Manager's lifecycleCtx or propagate non-nil context.** - -Line 217 creates `context.Background()` in an internal production path, violating the guideline to avoid root context creation outside `main` and tests. While `finalizeStopped` defensively ensures non-nil context before line 176, the design pattern should follow the coding guideline consistently. Either propagate the context from the caller or use the Manager's `lifecycleCtx` (available at line 86) as a fallback instead of creating a new root context. This also affects `warnHookDispatch` at line 800 with the same pattern. - -
    -🤖 Prompt for AI Agents - -``` -Verify each finding against the current code and only fix it if needed. - -In `@internal/session/manager_hooks.go` around lines 215 - 222, -postLifecycleHookContext creates a root context with context.Background() when -ctx is nil; instead, avoid creating root contexts — use the Manager's -lifecycleCtx as the fallback or propagate a non-nil context from the caller. -Modify postLifecycleHookContext to use session.Manager.lifecycleCtx (or accept a -Manager/lifecycleCtx parameter) when ctx == nil and remove the -context.Background() creation, and mirror the same change in warnHookDispatch so -neither function ever constructs a root context; keep using -context.WithoutCancel only on non-root contexts as appropriate and ensure -finalizeStopped still guarantees callers supply or can derive a non-nil context. -``` - -
    - - - - - -## Triage - -- Decision: `valid` -- Notes: - - `postLifecycleHookContext()` and `warnHookDispatch()` still fall back to `context.Background()` even though `Manager` already owns `lifecycleCtx` for long-lived internal work. - - Root cause: the helper paths special-case `nil` by creating a root context instead of reusing manager-owned lifecycle context and only stripping cancellation from caller-provided request contexts. - - Fix plan: route nil hook-dispatch contexts through `m.lifecycleCtx`, keep `context.WithoutCancel()` for non-nil caller contexts, and add a focused regression in `internal/session/manager_hooks_test.go` because no scoped test currently exercises these helper fallbacks. diff --git a/.compozy/tasks/qa-review/reviews-001/issue_009.md b/.compozy/tasks/qa-review/reviews-001/issue_009.md deleted file mode 100644 index 4982d8bd0..000000000 --- a/.compozy/tasks/qa-review/reviews-001/issue_009.md +++ /dev/null @@ -1,110 +0,0 @@ ---- -status: resolved -file: internal/store/globaldb/global_db_task_aux.go -line: 735 -author: coderabbitai[bot] -provider_ref: thread:PRRT_kwDOR5y4QM59oaQt,comment:PRRC_kwDOR5y4QM67VX7M ---- - -# Issue 009: _⚠️ Potential issue_ | _🟠 Major_ -## Review Comment - -_⚠️ Potential issue_ | _🟠 Major_ - -**Avoid scanning full run history while holding the SQLite write lock.** - -This path now loads all runs for the task inside `BEGIN IMMEDIATE` just to answer “is there any open run?”. On tasks with long run history, every enqueue becomes a full read/allocation before insert, which lengthens the writer lock and can stall unrelated writes. A targeted `EXISTS`/`LIMIT 1` query on non-terminal statuses keeps the guard cheap and preserves the run ID for the error message. - - -
    -♻️ Suggested shape - -```diff -- existingRuns, err := g.listTaskRunsWithExecutor(ctx, exec, taskpkg.RunQuery{TaskID: taskRecord.ID}) -- if err != nil { -- return taskpkg.Task{}, taskpkg.Run{}, false, err -- } -- if err := validateNoOpenRunForQueuedRunReservation(taskRecord, existingRuns); err != nil { -+ openRunID, err := g.findOpenRunIDForQueuedRunReservation(ctx, exec, taskRecord.ID) -+ if err != nil { -+ return taskpkg.Task{}, taskpkg.Run{}, false, err -+ } -+ if openRunID != "" { -+ return taskpkg.Task{}, taskpkg.Run{}, false, fmt.Errorf( -+ "%w: task %q has open run %q; finish or cancel it before enqueueing another run", -+ taskpkg.ErrInvalidStatusTransition, -+ taskRecord.ID, -+ openRunID, -+ ) -+ } -- return taskpkg.Task{}, taskpkg.Run{}, false, err -- } -``` - -```go -func (g *GlobalDB) findOpenRunIDForQueuedRunReservation( - ctx context.Context, - exec taskSQLExecutor, - taskID string, -) (string, error) { - row := exec.QueryRowContext( - ctx, - `SELECT id - FROM task_runs - WHERE task_id = ? - AND status NOT IN (?, ?, ?) - ORDER BY queued_at DESC, id DESC - LIMIT 1`, - taskID, - string(taskpkg.TaskRunStatusCompleted), - string(taskpkg.TaskRunStatusFailed), - string(taskpkg.TaskRunStatusCanceled), - ) - - var runID string - if err := row.Scan(&runID); err != nil { - if errors.Is(err, sql.ErrNoRows) { - return "", nil - } - return "", fmt.Errorf("store: lookup open task run for %q: %w", taskID, err) - } - return runID, nil -} -``` -
    - -
    -🤖 Prompt for AI Agents - -``` -Verify each finding against the current code and only fix it if needed. - -In `@internal/store/globaldb/global_db_task_aux.go` around lines 729 - 735, The -current code calls listTaskRunsWithExecutor and -validateNoOpenRunForQueuedRunReservation while holding the SQLite write lock, -which loads the full run history; instead add a new helper -findOpenRunIDForQueuedRunReservation(ctx, exec, taskID) that runs a targeted -SELECT id ... WHERE task_id = ? AND status NOT IN (...) ORDER BY queued_at DESC, -id DESC LIMIT 1 (using exec.QueryRowContext) and returns the run ID or empty -string, and then replace the existing call to -listTaskRunsWithExecutor/validateNoOpenRunForQueuedRunReservation with a call to -findOpenRunIDForQueuedRunReservation; if it returns a non-empty run ID, pass -that ID into a lightweight validation branch (or adapt -validateNoOpenRunForQueuedRunReservation to accept a run ID instead of a full -slice) so you preserve the run ID for the error message while avoiding scanning -all runs inside BEGIN IMMEDIATE. -``` - -
    - - - - - -## Triage - -- Decision: `valid` -- Notes: - - `reserveQueuedRunWithExecutor()` currently calls `listTaskRunsWithExecutor()` inside the `BEGIN IMMEDIATE` path just to decide whether any non-terminal run exists. - - Root cause: open-run validation is implemented as a full run-history scan instead of a targeted existence lookup, which unnecessarily lengthens the SQLite writer lock. - - Fix plan: replace the full scan with a `LIMIT 1` helper that returns the newest non-terminal run id, preserve the current error message, and reuse the expanded open-run regression coverage in `global_db_task_test.go`. diff --git a/.compozy/tasks/qa-review/reviews-001/issue_010.md b/.compozy/tasks/qa-review/reviews-001/issue_010.md deleted file mode 100644 index 0a6181260..000000000 --- a/.compozy/tasks/qa-review/reviews-001/issue_010.md +++ /dev/null @@ -1,26 +0,0 @@ ---- -status: resolved -file: internal/store/globaldb/global_db_task_test.go -line: 725 -severity: nitpick -author: coderabbitai[bot] -provider_ref: review:4176489704,nitpick_hash:86ff1de48631 -review_hash: 86ff1de48631 -source_review_id: "4176489704" -source_review_submitted_at: "2026-04-26T03:49:14Z" ---- - -# Issue 010: Cover the other persisted open-run states here as subtests. -## Review Comment - -This validates the queued case against SQLite, but the storage guard rejects *any* non-terminal stored status. Adding claimed/running/starting variants here would catch SQL-level status normalization regressions that the in-memory manager tests would miss. - -As per coding guidelines, "Use table-driven tests with subtests (`t.Run`) as default pattern" and "Focus on critical paths: workflow execution, state management, error handling." - -## Triage - -- Decision: `valid` -- Notes: - - The current SQLite regression only exercises the queued open-run case, but the storage guard rejects any non-terminal stored status. 
- - Root cause: coverage does not currently pin claimed, starting, or running rows, so a future SQL/status normalization change could regress the guard without failing this test. - - Fix plan: convert the open-run guard test into table-driven subtests for queued, claimed, starting, and running statuses while preserving the idempotency assertions. diff --git a/.compozy/tasks/qa-review/reviews-001/issue_011.md b/.compozy/tasks/qa-review/reviews-001/issue_011.md deleted file mode 100644 index 5ee39c8df..000000000 --- a/.compozy/tasks/qa-review/reviews-001/issue_011.md +++ /dev/null @@ -1,50 +0,0 @@ ---- -status: resolved -file: internal/store/schema_test.go -line: 65 -author: coderabbitai[bot] -provider_ref: thread:PRRT_kwDOR5y4QM59oaQu,comment:PRRC_kwDOR5y4QM67VX7N ---- - -# Issue 011: _🛠️ Refactor suggestion_ | _🟠 Major_ -## Review Comment - -_🛠️ Refactor suggestion_ | _🟠 Major_ - -**Use the repo’s required `t.Run("Should...")` structure for these new scenarios.** - -Both added migration-table cases are introduced as top-level tests instead of the required subtest pattern. As per coding guidelines, `**/*_test.go`: MUST use t.Run("Should...") pattern for ALL test cases. - - - -Also applies to: 116-117 - -
    -🤖 Prompt for AI Agents - -``` -Verify each finding against the current code and only fix it if needed. - -In `@internal/store/schema_test.go` around lines 64 - 65, The two new test cases -were added as top-level tests instead of using the repo’s required -t.Run("Should...") subtest pattern; update -TestRunMigrationsUsesIndependentMigrationTables (and the other top-level cases -around the 116-117 area) to wrap each scenario in t.Run("Should ...", func(t -*testing.T) { ... }) subtests (preserve t.Parallel where appropriate inside each -subtest) so every test case follows the "Should..." naming convention and uses -t.Run rather than being standalone top-level tests. -``` - -
    - - - - - -## Triage - -- Decision: `valid` -- Notes: - - The migration-table namespace regressions were added as standalone top-level tests instead of named subtests. - - Root cause: the new scenarios were appended as separate test functions instead of being wrapped in the repo's required `t.Run("Should...")` structure. - - Fix plan: wrap the affected migration scenarios in named subtests and keep their current migration assertions unchanged. diff --git a/.compozy/tasks/qa-review/reviews-001/issue_012.md b/.compozy/tasks/qa-review/reviews-001/issue_012.md deleted file mode 100644 index c46912c51..000000000 --- a/.compozy/tasks/qa-review/reviews-001/issue_012.md +++ /dev/null @@ -1,51 +0,0 @@ ---- -status: resolved -file: internal/store/sessiondb/session_db.go -line: 760 -author: coderabbitai[bot] -provider_ref: thread:PRRT_kwDOR5y4QM59oaQy,comment:PRRC_kwDOR5y4QM67VX7Q ---- - -# Issue 012: _⚠️ Potential issue_ | _🟠 Major_ -## Review Comment - -_⚠️ Potential issue_ | _🟠 Major_ - -**Don’t make `OpenSessionDB` fail on best-effort compaction.** - -`VACUUM` is maintenance work, but it now sits in the critical open path. A transient failure here (`database is locked`, low disk space, etc.) will prevent the session DB from opening even though the schema/data are already usable after `RunMigrations` succeeds. - - - -Also applies to: 792-804 - -
    -🤖 Prompt for AI Agents - -``` -Verify each finding against the current code and only fix it if needed. - -In `@internal/store/sessiondb/session_db.go` around lines 756 - 760, OpenSessionDB -currently runs vacuumSessionSQLite inside the store.OpenSQLiteDatabase critical -path and returns its error, causing open to fail on transient VACUUM issues; -change the logic so that after RunMigrations(ctx, db, sessionSchemaMigrations) -succeeds you call vacuumSessionSQLite(ctx, db) but do not propagate its -error—log the error (with context) and continue returning nil so the DB opens -even if vacuum fails. Apply the same change for the other similar block that -calls vacuumSessionSQLite (the one around the 792-804 region) so both -OpenSQLiteDatabase callbacks never fail due to vacuumSessionSQLite errors. -``` - -
    - - - - - -## Triage - -- Decision: `valid` -- Notes: - - `openSessionSQLite()` currently propagates `vacuumSessionSQLite()` failures after successful migrations, so best-effort compaction can block a usable session database from opening. - - Root cause: the maintenance step runs inside the open callback as a hard failure path instead of as non-blocking cleanup. - - Fix plan: keep migrations as the blocking gate, downgrade vacuum failures to logged warnings, and add a focused regression using an injected vacuum function so the open path can be tested without depending on a flaky SQLite lock. diff --git a/.compozy/tasks/qa-review/reviews-001/issue_013.md b/.compozy/tasks/qa-review/reviews-001/issue_013.md deleted file mode 100644 index 023d696f0..000000000 --- a/.compozy/tasks/qa-review/reviews-001/issue_013.md +++ /dev/null @@ -1,47 +0,0 @@ ---- -status: resolved -file: internal/store/sessiondb/session_db_test.go -line: 93 -author: coderabbitai[bot] -provider_ref: thread:PRRT_kwDOR5y4QM59oaQv,comment:PRRC_kwDOR5y4QM67VX7O ---- - -# Issue 013: _🛠️ Refactor suggestion_ | _🟠 Major_ -## Review Comment - -_🛠️ Refactor suggestion_ | _🟠 Major_ - -**Move this new regression case under a `t.Run("Should...")` subtest.** - -This scenario was added as a standalone test, but the repository test rules require the `t.Run("Should...")` pattern for test cases. As per coding guidelines, `**/*_test.go`: MUST use t.Run("Should...") pattern for ALL test cases. - -
    -🤖 Prompt for AI Agents - -``` -Verify each finding against the current code and only fix it if needed. - -In `@internal/store/sessiondb/session_db_test.go` around lines 92 - 93, The test -function TestOpenSessionDBStripsCanonicalRawPayloadsAndVacuumsOldRows must be -converted into a subtest using t.Run("Should ...") to comply with test -conventions: wrap the existing body of -TestOpenSessionDBStripsCanonicalRawPayloadsAndVacuumsOldRows inside t.Run with a -descriptive "Should ..." name and move t.Parallel() into the subtest (or keep it -at the top-level if desired) so the test logic and assertions remain identical -but execute as a properly named subtest; update any references to local -variables accordingly within the t.Run closure. -``` - -
    - - - - - -## Triage - -- Decision: `valid` -- Notes: - - `TestOpenSessionDBStripsCanonicalRawPayloadsAndVacuumsOldRows` is a standalone top-level regression instead of a named subtest. - - Root cause: the new migration/vacuum scenario was added directly rather than under the repo's default `t.Run("Should...")` pattern. - - Fix plan: wrap the existing body in a named subtest and colocate any new vacuum-failure regression under the same parent test file. diff --git a/.compozy/tasks/qa-review/reviews-001/issue_014.md b/.compozy/tasks/qa-review/reviews-001/issue_014.md deleted file mode 100644 index 88110f6f8..000000000 --- a/.compozy/tasks/qa-review/reviews-001/issue_014.md +++ /dev/null @@ -1,51 +0,0 @@ ---- -status: resolved -file: internal/transcript/transcript_test.go -line: 680 -author: coderabbitai[bot] -provider_ref: thread:PRRT_kwDOR5y4QM59oaQz,comment:PRRC_kwDOR5y4QM67VX7S ---- - -# Issue 014: _🛠️ Refactor suggestion_ | _🟠 Major_ -## Review Comment - -_🛠️ Refactor suggestion_ | _🟠 Major_ - -**Wrap these scenarios in `t.Run("Should...")` subtests.** - -These updated cases are still declared as standalone tests, but the repo test rules require the `t.Run("Should...")` pattern for each scenario. As per coding guidelines, `**/*_test.go`: MUST use t.Run("Should...") pattern for ALL test cases. - - - -Also applies to: 754-755 - -
    -🤖 Prompt for AI Agents - -``` -Verify each finding against the current code and only fix it if needed. - -In `@internal/transcript/transcript_test.go` around lines 679 - 680, The test -function TestMarshalAgentEventExtractsToolResultShapeWithoutPersistingRaw -currently contains standalone scenarios; wrap each scenario in a -t.Run("Should...") subtest (e.g., t.Run("Should extract tool result shape -without persisting raw", func(t *testing.T){...})) so it follows the repo rule; -apply the same change to the other failing test cases referenced in the file -(the other test functions that currently have standalone scenarios around those -lines) by converting their scenario blocks into t.Run("Should...") subtests with -descriptive names and moving assertions inside each subtest closure. -``` - -
    - - - - - -## Triage - -- Decision: `valid` -- Notes: - - The new transcript serialization regressions are still declared as standalone top-level tests. - - Root cause: scenario-style tests were added without the file's required `t.Run("Should...")` wrapper structure. - - Fix plan: wrap the tool-result extraction and adjacent round-trip assertions in named subtests without changing the canonical transcript expectations. diff --git a/.compozy/tasks/qa-review/reviews-001/issue_015.md b/.compozy/tasks/qa-review/reviews-001/issue_015.md deleted file mode 100644 index c88c69d9d..000000000 --- a/.compozy/tasks/qa-review/reviews-001/issue_015.md +++ /dev/null @@ -1,24 +0,0 @@ ---- -status: resolved -file: web/src/components/app-sidebar.test.tsx -line: 299 -severity: nitpick -author: coderabbitai[bot] -provider_ref: review:4176489704,nitpick_hash:07f62ac5d6ff -review_hash: 07f62ac5d6ff -source_review_id: "4176489704" -source_review_submitted_at: "2026-04-26T03:49:14Z" ---- - -# Issue 015: Always restore console spies, even if the test fails early. -## Review Comment - -`mockRestore()` is currently happy-path only. If an assertion throws first, the console mocks can leak into later tests. Use `try/finally`. - -## Triage - -- Decision: `valid` -- Notes: - - The console spies in the "opens an agent group..." regression are restored only on the happy path at the end of the test body. - - Root cause: if an assertion throws before the final lines, the mocked console methods leak into later tests. - - Fix plan: use `try/finally` around the render/assertion path so both spies are always restored. 
diff --git a/.compozy/tasks/qa-review/reviews-001/issue_016.md b/.compozy/tasks/qa-review/reviews-001/issue_016.md deleted file mode 100644 index 90aa020d5..000000000 --- a/.compozy/tasks/qa-review/reviews-001/issue_016.md +++ /dev/null @@ -1,95 +0,0 @@ ---- -status: resolved -file: web/src/components/app-sidebar.tsx -line: 241 -author: coderabbitai[bot] -provider_ref: thread:PRRT_kwDOR5y4QM59oaQ2,comment:PRRC_kwDOR5y4QM67VX7V ---- - -# Issue 016: _⚠️ Potential issue_ | _🟠 Major_ -## Review Comment - -_⚠️ Potential issue_ | _🟠 Major_ - -
    -🧩 Analysis chain - -🏁 Script executed: - -```shell -#!/bin/bash -# Verify derived-state effect usage in AgentItem. -rg -n -C3 'shouldAutoOpen|useEffect\(|onOpenChange=' web/src/components/app-sidebar.tsx -``` - -Repository: compozy/agh - -Length of output: 641 - ---- - - - -
    - -**Remove `useEffect` for derived `open` state; use computed value with user override instead.** - -The `shouldAutoOpen` prop-derived value is synced into local state via `useEffect`, violating the rule that effects are only for external system sync. Instead, compute `open` directly and maintain only a user override state: - -
    -Refactor - -```diff --import { useEffect, useState } from "react"; -+import { useState } from "react"; -@@ -- const shouldAutoOpen = count > 0 || showPendingSessionRow; -- const [open, setOpen] = useState(shouldAutoOpen); -- -- useEffect(() => { -- if (shouldAutoOpen) { -- setOpen(true); -- } -- }, [shouldAutoOpen]); -+ const shouldAutoOpen = count > 0 || showPendingSessionRow; -+ const [userOpenOverride, setUserOpenOverride] = useState(null); -+ const open = userOpenOverride ?? shouldAutoOpen; -@@ -- -+ -``` -
    - -
    -🤖 Prompt for AI Agents - -``` -Verify each finding against the current code and only fix it if needed. - -In `@web/src/components/app-sidebar.tsx` around lines 231 - 241, The component -currently copies the derived boolean shouldAutoOpen into local state via -useEffect; remove that effect and instead keep only a user override state (e.g., -replace useState(shouldAutoOpen) with const [openOverride, setOpenOverride] = -useState(null) or similar), compute the actual open value as -const open = openOverride !== null ? openOverride : shouldAutoOpen, remove the -useEffect block that sets open from shouldAutoOpen, and pass the computed open -and the setter (setOpenOverride or a wrapper that toggles the override) into - so the component uses the derived -shouldAutoOpen unless the user has explicitly overridden it; reference symbols: -shouldAutoOpen, open, setOpen (replace with openOverride/setOpenOverride), -useState, useEffect, Collapsible. -``` - -
    - - - - - -## Triage - -- Decision: `valid` -- Notes: - - `AgentItem` copies the derived `shouldAutoOpen` value into local state and re-synchronizes it with `useEffect`. - - Root cause: the component is using an effect for derived UI state rather than computing the open state from props plus an explicit user override. - - Fix plan: replace the mirrored state with a nullable user override, compute `open` during render, remove the effect, and keep the existing regression around late-arriving sessions green. diff --git a/.compozy/tasks/qa-review/reviews-001/issue_017.md b/.compozy/tasks/qa-review/reviews-001/issue_017.md deleted file mode 100644 index 7d7b56c95..000000000 --- a/.compozy/tasks/qa-review/reviews-001/issue_017.md +++ /dev/null @@ -1,109 +0,0 @@ ---- -status: resolved -file: web/src/hooks/routes/use-app-layout.ts -line: 65 -author: coderabbitai[bot] -provider_ref: thread:PRRT_kwDOR5y4QM59oaQ3,comment:PRRC_kwDOR5y4QM67VX7W ---- - -# Issue 017: _⚠️ Potential issue_ | _🟠 Major_ -## Review Comment - -_⚠️ Potential issue_ | _🟠 Major_ - -**Align `agentsLoading`/`agentsError` with the active agent source.** - -When workspace-scoped agents exist, the hook still propagates global `useAgents` loading/error. That can show loading/error states even though `agents` is already resolved from workspace detail. - - -
    -💡 Proposed fix - -```diff - const activeWorkspaceDetail = useWorkspace(activeWorkspaceId ?? "", { - enabled: activeWorkspaceId !== null, - }); -- const workspaceAgents = activeWorkspaceDetail.data?.agents ?? agents; -+ const hasWorkspaceScopedAgents = -+ activeWorkspaceId !== null && activeWorkspaceDetail.data?.agents !== undefined; -+ const workspaceAgents = hasWorkspaceScopedAgents -+ ? activeWorkspaceDetail.data?.agents -+ : agents; -@@ -- agentsLoading: agentsLoading || (activeWorkspaceId !== null && activeWorkspaceDetail.isLoading), -- agentsError: -- agentsError || -- (activeWorkspaceId !== null && -- activeWorkspaceDetail.isError && -- workspaceAgents === undefined), -+ agentsLoading: hasWorkspaceScopedAgents -+ ? activeWorkspaceDetail.isLoading -+ : agentsLoading || (activeWorkspaceId !== null && activeWorkspaceDetail.isLoading), -+ agentsError: hasWorkspaceScopedAgents -+ ? activeWorkspaceDetail.isError -+ : agentsError || (activeWorkspaceId !== null && activeWorkspaceDetail.isError), -``` -
    - - - -
    -📝 Committable suggestion - -> ‼️ **IMPORTANT** -> Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements. - -```suggestion - const activeWorkspaceDetail = useWorkspace(activeWorkspaceId ?? "", { - enabled: activeWorkspaceId !== null, - }); - const hasWorkspaceScopedAgents = - activeWorkspaceId !== null && activeWorkspaceDetail.data?.agents !== undefined; - const workspaceAgents = hasWorkspaceScopedAgents - ? activeWorkspaceDetail.data?.agents - : agents; - - agents: workspaceAgents, - agentsLoading: hasWorkspaceScopedAgents - ? activeWorkspaceDetail.isLoading - : agentsLoading || (activeWorkspaceId !== null && activeWorkspaceDetail.isLoading), - agentsError: hasWorkspaceScopedAgents - ? activeWorkspaceDetail.isError - : agentsError || (activeWorkspaceId !== null && activeWorkspaceDetail.isError), -``` - -
    - - - -
    -🤖 Prompt for AI Agents - -``` -Verify each finding against the current code and only fix it if needed. - -In `@web/src/hooks/routes/use-app-layout.ts` around lines 59 - 65, The hook -currently mixes global useAgents state with workspace-scoped agents causing -false loading/error states; update the agentsLoading and agentsError expressions -to prefer workspaceAgents when present: set agentsLoading to true only if -workspaceAgents === undefined and (agentsLoading || (activeWorkspaceId !== null -&& activeWorkspaceDetail.isLoading)), and set agentsError only if -workspaceAgents === undefined and (agentsError || (activeWorkspaceId !== null && -activeWorkspaceDetail.isError)); this ensures workspaceAgents (from -activeWorkspaceDetail) short-circuits global useAgents loading/error -propagation. -``` - -
    - - - - - -## Triage - -- Decision: `valid` -- Notes: - - `useAppLayout()` still forwards global `useAgents()` loading/error state even when workspace-scoped agents are already available from `useWorkspace()`. - - Root cause: `workspaceAgents` is selected from workspace detail data, but `agentsLoading`/`agentsError` still derive from a mixed global-plus-workspace expression. - - Fix plan: short-circuit loading/error to the active source of truth and add focused coverage in `web/src/hooks/routes/use-app-layout.test.tsx` because that hook already owns the surrounding behavior tests. diff --git a/.compozy/tasks/qa-review/reviews-001/issue_018.md b/.compozy/tasks/qa-review/reviews-001/issue_018.md deleted file mode 100644 index 7161420ea..000000000 --- a/.compozy/tasks/qa-review/reviews-001/issue_018.md +++ /dev/null @@ -1,49 +0,0 @@ ---- -status: resolved -file: web/src/hooks/routes/use-knowledge-page.ts -line: 8 -author: coderabbitai[bot] -provider_ref: thread:PRRT_kwDOR5y4QM59oaQ4,comment:PRRC_kwDOR5y4QM67VX7X ---- - -# Issue 018: _⚠️ Potential issue_ | _🟠 Major_ -## Review Comment - -_⚠️ Potential issue_ | _🟠 Major_ - -**Avoid cross-system imports from `@/systems/knowledge/lib/*` in route hooks.** - -This hook should consume knowledge APIs via the knowledge public barrel, not internals, to preserve module boundaries and avoid tight coupling to internal file layout. - -As per coding guidelines, "Cross-system imports MUST only go through the public barrel (`@/systems/`). Never reach into another system's internals". - -
    -🤖 Prompt for AI Agents - -``` -Verify each finding against the current code and only fix it if needed. - -In `@web/src/hooks/routes/use-knowledge-page.ts` around lines 4 - 8, The hook -imports internal symbols knowledgeMemoryKey, filterKnowledgeMemories, and -sortKnowledgeMemories directly from "@/systems/knowledge/lib/*"; update the -imports to consume these APIs from the knowledge public barrel (import from -"@/systems/knowledge") instead to respect module boundaries, or if those symbols -are not yet exported from the public barrel, add/export them (or small wrapper -functions) from the public barrel so use-knowledge-page.ts can import -knowledgeMemoryKey, filterKnowledgeMemories, and sortKnowledgeMemories via -"@/systems/knowledge" instead of reaching into internal lib files. -``` - -
    - - - - - -## Triage - -- Decision: `valid` -- Notes: - - `use-knowledge-page.ts` currently reaches into `@/systems/knowledge/lib/*` instead of consuming helpers through the knowledge system's public barrel. - - Root cause: `knowledgeMemoryKey`, `filterKnowledgeMemories`, and `sortKnowledgeMemories` are not exported from `web/src/systems/knowledge/index.ts`, so the route hook bypasses the system boundary. - - Fix plan: export the needed helpers from the knowledge barrel and switch the hook imports to `@/systems/knowledge`, then keep the hook tests as the regression surface. diff --git a/.compozy/tasks/qa-review/reviews-001/issue_019.md b/.compozy/tasks/qa-review/reviews-001/issue_019.md deleted file mode 100644 index a26cba5e5..000000000 --- a/.compozy/tasks/qa-review/reviews-001/issue_019.md +++ /dev/null @@ -1,24 +0,0 @@ ---- -status: resolved -file: web/src/hooks/routes/use-knowledge-page.ts -line: 18 -severity: nitpick -author: coderabbitai[bot] -provider_ref: review:4176489704,nitpick_hash:c8a154181f5b -review_hash: c8a154181f5b -source_review_id: "4176489704" -source_review_submitted_at: "2026-04-26T03:49:14Z" ---- - -# Issue 019: Don’t overwrite incoming memory.key during decoration. -## Review Comment - -Always re-synthesizing the key risks divergence from backend/canonical identity. Prefer preserving provided keys and only backfilling when absent. - -## Triage - -- Decision: `valid` -- Notes: - - `decorateKnowledgeMemories()` always rewrites `memory.key` as `${scope}:${filename}` even when a canonical key is already present. - - Root cause: decoration currently treats `key` as derived-only instead of preserving backend-provided identity and only backfilling when absent. - - Fix plan: preserve `memory.key` if supplied, synthesize only missing keys, and add a hook regression that asserts the existing key survives decoration. 
diff --git a/.compozy/tasks/qa-review/reviews-001/issue_020.md b/.compozy/tasks/qa-review/reviews-001/issue_020.md deleted file mode 100644 index 88b49843d..000000000 --- a/.compozy/tasks/qa-review/reviews-001/issue_020.md +++ /dev/null @@ -1,49 +0,0 @@ ---- -status: resolved -file: web/src/hooks/routes/use-knowledge-page.ts -line: 113 -author: coderabbitai[bot] -provider_ref: thread:PRRT_kwDOR5y4QM59oaQ5,comment:PRRC_kwDOR5y4QM67VX7Y ---- - -# Issue 020: _⚠️ Potential issue_ | _🟠 Major_ -## Review Comment - -_⚠️ Potential issue_ | _🟠 Major_ - -**This `useEffect` is handling local state orchestration and should be removed.** - -The reset behavior should be handled in explicit event paths (tab/search/selection mutations) rather than effect-driven local bookkeeping. - -As per coding guidelines, "`useEffect` is an escape hatch — only for external system sync; never for derived state or event responses". - -
    -🤖 Prompt for AI Agents - -``` -Verify each finding against the current code and only fix it if needed. - -In `@web/src/hooks/routes/use-knowledge-page.ts` around lines 101 - 113, Remove -the effect block that watches deleteTargetKey/isDeletePending and instead clear -the delete state in the explicit event handlers; delete the useEffect that -references deleteTargetKey, isDeletePending, resetDeleteMutation, -setDeleteTargetKey, selectedMemory and knowledgeMemoryKey, and call -resetDeleteMutation() and setDeleteTargetKey(null) directly from the tab switch -handler, search-reset handler, and any selection/mutation handlers (where -selectedMemory changes or a memory is removed) so the reset happens as part of -those event paths rather than via an effect. -``` - -
    - - - - - -## Triage - -- Decision: `valid` -- Notes: - - The delete-target cleanup in `useKnowledgePage()` is implemented with a `useEffect` that watches local state and mutation state. - - Root cause: local UI bookkeeping is being expressed as an effect instead of explicit event-path transitions for tab changes, search changes, selection changes, and successful deletes. - - Fix plan: remove the effect, reset delete state from the relevant event handlers, and extend the existing knowledge hook/route tests that already exercise failed-delete recovery. diff --git a/.compozy/tasks/qa-review/reviews-001/issue_021.md b/.compozy/tasks/qa-review/reviews-001/issue_021.md deleted file mode 100644 index 6220752f6..000000000 --- a/.compozy/tasks/qa-review/reviews-001/issue_021.md +++ /dev/null @@ -1,84 +0,0 @@ ---- -status: resolved -file: web/src/hooks/routes/use-network-page.ts -line: 498 -author: coderabbitai[bot] -provider_ref: thread:PRRT_kwDOR5y4QM59oaQ6,comment:PRRC_kwDOR5y4QM67VX7Z ---- - -# Issue 021: _⚠️ Potential issue_ | _🟡 Minor_ -## Review Comment - -_⚠️ Potential issue_ | _🟡 Minor_ - -**Sum `presence_count` here instead of counting grouped greet rows.** - -The API now returns collapsed presence episodes, so one `"greet"` item can represent multiple heartbeats. `filter(...).length` reports episode count, not actual presence count, and underreports peers with repeated greets. - -
    -Suggested fix - -```diff -- presenceCount: showPresence -- ? rawMessages.filter(message => message.kind === "greet").length -- : 0, -+ presenceCount: showPresence -+ ? rawMessages.reduce( -+ (total, message) => -+ message.kind === "greet" ? total + (message.presence_count ?? 1) : total, -+ 0 -+ ) -+ : 0, -``` -
    - - - -
    -📝 Committable suggestion - -> ‼️ **IMPORTANT** -> Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements. - -```suggestion - presenceCount: showPresence - ? rawMessages.reduce( - (total, message) => - message.kind === "greet" ? total + (message.presence_count ?? 1) : total, - 0 - ) - : 0, -``` - -
    - - - -
    -🤖 Prompt for AI Agents - -``` -Verify each finding against the current code and only fix it if needed. - -In `@web/src/hooks/routes/use-network-page.ts` around lines 496 - 498, The current -presenceCount calculation counts greet rows instead of summing presence -episodes; update the presenceCount logic in the rawMessages handling (the code -that uses rawMessages.filter(message => message.kind === "greet").length) to sum -each greet item's presence_count field (e.g., use a reduce over rawMessages that -adds Number(message.presence_count) when message.kind === "greet", falling back -to 0 if missing) while preserving the showPresence conditional. -``` - -
    - - - - - -## Triage - -- Decision: `valid` -- Notes: - - `makePeerActiveRoom()` computes `presenceCount` by counting greet rows, but peer timelines now receive collapsed greet episodes where one row can represent multiple heartbeats. - - Root cause: the hook ignores `message.presence_count` and therefore underreports presence totals whenever repeated greets were coalesced upstream. - - Fix plan: sum `presence_count ?? 1` across greet rows and add a route-level regression that asserts the rendered Presence toggle shows the aggregated total. diff --git a/.compozy/tasks/qa-review/reviews-001/issue_022.md b/.compozy/tasks/qa-review/reviews-001/issue_022.md deleted file mode 100644 index 60714f9fa..000000000 --- a/.compozy/tasks/qa-review/reviews-001/issue_022.md +++ /dev/null @@ -1,48 +0,0 @@ ---- -status: resolved -file: web/src/styles.css -line: 20 -author: coderabbitai[bot] -provider_ref: thread:PRRT_kwDOR5y4QM59oaQ8,comment:PRRC_kwDOR5y4QM67VX7b ---- - -# Issue 022: _⚠️ Potential issue_ | _🟠 Major_ -## Review Comment - -_⚠️ Potential issue_ | _🟠 Major_ - -**Use design-system motion tokens instead of hardcoded transition values.** - -Line 18 and Line 19 hardcode `140ms` and `ease-out`. Please source these from `DESIGN.md`/tokenized CSS variables so motion stays consistent across the app. - -As per coding guidelines, `web/src/**/*.{tsx,ts,css}`: Pull every color, font, radius, spacing step, and motion value from `DESIGN.md` in the repo root — never invent tokens. - -
    -🤖 Prompt for AI Agents - -``` -Verify each finding against the current code and only fix it if needed. - -In `@web/src/styles.css` around lines 16 - 20, The view-transition pseudo-elements -(::view-transition-old(root) and ::view-transition-new(root)) currently hardcode -animation-duration: 140ms and animation-timing-function: ease-out; replace those -literals with the design-system motion tokens defined in DESIGN.md (use the -repo's CSS token variables for duration and easing, e.g. the appropriate ---motion-duration-... and --motion-ease-... tokens) by updating the -animation-duration and animation-timing-function properties to reference those -CSS variables so motion values are centralized and consistent across the app. -``` - -
    - - - - - -## Triage - -- Decision: `valid` -- Notes: - - `web/src/styles.css` hardcodes `140ms` and `ease-out` for root view transitions even though the design system already defines shared duration/easing tokens. - - Root cause: the stylesheet bypasses `packages/ui/src/tokens.css` for motion values, creating an app-local divergence from `DESIGN.md`. - - Fix plan: switch the transition declarations to the shared CSS variables and add a small source regression in `web/src/storybook/web-storybook-stories-and-fixtures.test.tsx` because no scoped runtime test currently covers stylesheet token usage. diff --git a/.compozy/tasks/qa-review/reviews-001/issue_023.md b/.compozy/tasks/qa-review/reviews-001/issue_023.md deleted file mode 100644 index 02c6938f9..000000000 --- a/.compozy/tasks/qa-review/reviews-001/issue_023.md +++ /dev/null @@ -1,26 +0,0 @@ ---- -status: resolved -file: web/src/systems/knowledge/components/knowledge-list-panel.tsx -line: 44 -severity: nitpick -author: coderabbitai[bot] -provider_ref: review:4176489704,nitpick_hash:15eef0ef15ec -review_hash: 15eef0ef15ec -source_review_id: "4176489704" -source_review_submitted_at: "2026-04-26T03:49:14Z" ---- - -# Issue 023: Use one canonical memoryKey for row identity. -## Review Comment - -`data-testid` uses `memory.key ?? memory.filename`, while selection/callback/key use `knowledgeMemoryKey(memory)`. Keeping a single key source avoids drift and duplicate IDs in fallback scenarios. - -Also applies to: 162-166 - -## Triage - -- Decision: `valid` -- Notes: - - `KnowledgeListItem` uses `memory.key ?? memory.filename` for the row `data-testid`, while the component's key, selection, and click callback all use `knowledgeMemoryKey(memory)`. - - Root cause: row identity is derived from a different fallback path than the rest of the component, so a future key derivation change can drift test ids and selection semantics apart. 
- - Fix plan: compute the canonical memory key once through `knowledgeMemoryKey(memory)` and reuse it for row identity, selection, and tests, then update `knowledge-list-panel.test.tsx` accordingly. diff --git a/.compozy/tasks/qa-review/reviews-001/issue_024.md b/.compozy/tasks/qa-review/reviews-001/issue_024.md deleted file mode 100644 index efc2aaf5a..000000000 --- a/.compozy/tasks/qa-review/reviews-001/issue_024.md +++ /dev/null @@ -1,24 +0,0 @@ ---- -status: resolved -file: web/src/systems/knowledge/components/stories/knowledge-list-panel.stories.tsx -line: 201 -severity: nitpick -author: coderabbitai[bot] -provider_ref: review:4176489704,nitpick_hash:be3802a4c9e0 -review_hash: be3802a4c9e0 -source_review_id: "4176489704" -source_review_submitted_at: "2026-04-26T03:49:14Z" ---- - -# Issue 024: Use the shared key helper for story test-id construction. -## Review Comment - -This keeps stories resilient if fallback key derivation rules evolve. - -## Triage - -- Decision: `valid` -- Notes: - - The knowledge list panel story constructs a row test id from `defaultMemories[2].key` instead of the shared key helper used by the component. - - Root cause: story code duplicates the key derivation contract instead of importing the canonical helper, so it can drift if fallback rules change. - - Fix plan: import `knowledgeMemoryKey()` into the story and update the storybook source regression test to pin the shared helper usage. 
diff --git a/.compozy/tasks/qa-review/reviews-001/issue_025.md b/.compozy/tasks/qa-review/reviews-001/issue_025.md deleted file mode 100644 index 47ac7d1b0..000000000 --- a/.compozy/tasks/qa-review/reviews-001/issue_025.md +++ /dev/null @@ -1,24 +0,0 @@ ---- -status: resolved -file: web/src/systems/tasks/lib/task-editor.ts -line: 134 -severity: nitpick -author: coderabbitai[bot] -provider_ref: review:4176489704,nitpick_hash:b26c0ae72079 -review_hash: b26c0ae72079 -source_review_id: "4176489704" -source_review_submitted_at: "2026-04-26T03:49:14Z" ---- - -# Issue 025: Extract shared base payload mapping to avoid builder drift. -## Review Comment - -`buildCreateChildTaskRequest` now mirrors `buildCreateTaskRequest` field-for-field. Consider moving shared mapping into a single helper so future field changes remain consistent across both paths. - -## Triage - -- Decision: `invalid` -- Notes: - - `buildCreateTaskRequest()` and `buildCreateChildTaskRequest()` currently produce equivalent field mappings by design, but the review comment identifies a speculative future-drift risk rather than a present behavioral bug or rule violation. - - The current tests already pin the create-task and create-child-task payload shapes independently, and the requested extraction would be a proactive refactor unrelated to any broken behavior in this batch. - - Because this remediation run is constrained to concrete review defects and regressions, I am not widening scope for a no-op deduplication refactor. 
diff --git a/.compozy/tasks/qa-rounds/reviews-001/_meta.md b/.compozy/tasks/qa-rounds/reviews-001/_meta.md new file mode 100644 index 000000000..d06b4d908 --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/_meta.md @@ -0,0 +1,11 @@ +--- +provider: coderabbit +pr: "78" +round: 1 +created_at: 2026-04-26T20:29:02.311014Z +--- + +## Summary +- Total: 33 +- Resolved: 0 +- Unresolved: 33 diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_001.md b/.compozy/tasks/qa-rounds/reviews-001/issue_001.md new file mode 100644 index 000000000..590ff520c --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_001.md @@ -0,0 +1,23 @@ +--- +status: resolved +file: internal/api/core/agent_identity.go +line: 148 +severity: nitpick +author: coderabbitai[bot] +provider_ref: review:4177411700,nitpick_hash:ee846dd868b1 +review_hash: ee846dd868b1 +source_review_id: "4177411700" +source_review_submitted_at: "2026-04-26T20:24:27Z" +--- + +# Issue 001: Consider wrapping the error with context. +## Review Comment + +Line 158 returns the error from `ResolveCoordinatorConfig` without wrapping. Adding context would help trace failures. + +As per coding guidelines: "Use explicit error returns with wrapped context: `fmt.Errorf("context: %w", err)` in Go". + +## Triage + +- Decision: `VALID` +- Notes: `agentCoordinatorConfigPayload` returns the raw `ResolveCoordinatorConfig` error. This violates the local wrapped-error convention and loses call-site context. Fix by wrapping the resolver failure with `fmt.Errorf("resolve coordinator config: %w", err)`. 
diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_002.md b/.compozy/tasks/qa-rounds/reviews-001/issue_002.md new file mode 100644 index 000000000..cacb8c3b8 --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_002.md @@ -0,0 +1,48 @@ +--- +status: resolved +file: internal/api/udsapi/agent_channels_test.go +line: 116 +author: coderabbitai[bot] +provider_ref: thread:PRRT_kwDOR5y4QM59r7vG,comment:PRRC_kwDOR5y4QM67Z0NA +--- + +# Issue 002: _⚠️ Potential issue_ | _🟠 Major_ +## Review Comment + +_⚠️ Potential issue_ | _🟠 Major_ + +**Wrap this test case in `t.Run("Should ...")` to match required test structure.** + +This new test is currently a direct top-level body without the required subtest naming pattern. + + +As per coding guidelines, "MUST use t.Run("Should...") pattern for ALL test cases". + +
    +🤖 Prompt for AI Agents + +``` +Verify each finding against the current code and only fix it if needed. + +In `@internal/api/udsapi/agent_channels_test.go` around lines 72 - 116, Wrap the +existing TestAgentCoordinatorConfigRouteReturnsResolvedPayload body in a subtest +using t.Run("Should return resolved workspace coordinator payload", func(t +*testing.T) { ... }) so the test follows the required t.Run("Should ...") +pattern; locate the TestAgentCoordinatorConfigRouteReturnsResolvedPayload +function and move its current contents into a t.Run call while keeping all setup +(manager := activeAgentSessionManager, handlers := newTestHandlers, +handlers.CoordinatorConfig = agentCoordinatorConfigResolverFunc, engine := +newTestRouter, performAgentKernelRequest, decodeJSONResponse and assertions) +unchanged inside the subtest body. +``` + +
    + + + + + +## Triage + +- Decision: `VALID` +- Notes: `TestAgentCoordinatorConfigRouteReturnsResolvedPayload` contains one direct top-level test body while this repo requires each case to run under a `t.Run("Should ...")` subtest. Fix by moving the existing setup, request, and assertions into a named `Should return resolved workspace coordinator payload` subtest. diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_003.md b/.compozy/tasks/qa-rounds/reviews-001/issue_003.md new file mode 100644 index 000000000..207e61b32 --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_003.md @@ -0,0 +1,93 @@ +--- +status: resolved +file: internal/automation/dispatch.go +line: 60 +author: coderabbitai[bot] +provider_ref: thread:PRRT_kwDOR5y4QM59r7vH,comment:PRRC_kwDOR5y4QM67Z0NB +--- + +# Issue 003: _🛠️ Refactor suggestion_ | _🟠 Major_ +## Review Comment + +_🛠️ Refactor suggestion_ | _🟠 Major_ + +**Make session stop timeout configurable, not hardcoded.** + +The new 10s value is operational policy in a core runtime path; it should be injected via dispatcher options (or TOML-backed config), not fixed in code. + + +
    +Proposed refactor + +```diff +-const dispatcherSessionStopTimeout = 10 * time.Second ++const defaultDispatcherSessionStopTimeout = 10 * time.Second + + type Dispatcher struct { + sessions SessionCreator + runs RunStore + tasks TaskService ++ sessionStopTimeout time.Duration + ... + } + + func NewDispatcher(sessions SessionCreator, runs RunStore, opts ...DispatcherOption) (*Dispatcher, error) { + ... + dispatcher := &Dispatcher{ + sessions: sessions, + runs: runs, + logger: slog.Default(), + now: func() time.Time { return time.Now().UTC() }, + sleep: sleepWithContext, + maxConcurrent: DefaultMaxConcurrentJobs, ++ sessionStopTimeout: defaultDispatcherSessionStopTimeout, + } + ... + } + ++func WithDispatcherSessionStopTimeout(timeout time.Duration) DispatcherOption { ++ return func(dispatcher *Dispatcher) { ++ if timeout > 0 { ++ dispatcher.sessionStopTimeout = timeout ++ } ++ } ++} + + func (d *Dispatcher) stopAutomationSession(ctx context.Context, sessionID string, status RunStatus, runErr error) error { + ... +- stopCtx, cancel := context.WithTimeout(context.WithoutCancel(ctx), dispatcherSessionStopTimeout) ++ stopCtx, cancel := context.WithTimeout(context.WithoutCancel(ctx), d.sessionStopTimeout) + defer cancel() + ... + } +``` +
    +As per coding guidelines, "Never hardcode configuration in Go — use TOML config or functional options". + +
    +🤖 Prompt for AI Agents + +``` +Verify each finding against the current code and only fix it if needed. + +In `@internal/automation/dispatch.go` around lines 56 - 60, The hardcoded constant +dispatcherSessionStopTimeout should be made configurable via dispatcher options +rather than fixed in code: remove or replace the package-level const +dispatcherSessionStopTimeout and add a field (e.g., SessionStopTimeout +time.Duration) to the dispatcher options/config struct used by NewDispatcher (or +Dispatcher) and its option helpers; wire that value into the shutdown logic that +currently references dispatcherSessionStopTimeout and provide a sensible default +(10*time.Second) when the option is not set so existing behavior remains +unchanged. +``` + +
    + + + + + +## Triage + +- Decision: `VALID` +- Notes: `dispatcherSessionStopTimeout` is a package-level operational timeout fixed at 10 seconds. The dispatcher already uses functional options for policy injection, so this should be configurable with a default. Fix by adding a `sessionStopTimeout` field, a `defaultDispatcherSessionStopTimeout`, and `WithDispatcherSessionStopTimeout`, then using the field in `stopAutomationSession`. A focused constructor test in `internal/automation/dispatch_test.go` is needed even though that file is outside the batch list because it validates the new option. diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_004.md b/.compozy/tasks/qa-rounds/reviews-001/issue_004.md new file mode 100644 index 000000000..bfd6c0046 --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_004.md @@ -0,0 +1,187 @@ +--- +status: resolved +file: internal/daemon/daemon_integration_test.go +line: 2483 +author: coderabbitai[bot] +provider_ref: thread:PRRT_kwDOR5y4QM59r7vI,comment:PRRC_kwDOR5y4QM67Z0NC +--- + +# Issue 004: _⚠️ Potential issue_ | _🟠 Major_ +## Review Comment + +_⚠️ Potential issue_ | _🟠 Major_ + +**Wrap this scenario in a `t.Run("Should...")` test case to match test policy.** + +The scenario is good, but the new test case should follow the required `Should...` subtest pattern for consistency with repository standards. + +
    +Minimal structure change + +```diff + func TestBootRunsWorkspaceTaskRunHookWithRelativeScriptPath(t *testing.T) { +- homePaths := integrationHomePaths(t) +- ... ++ t.Run("ShouldRunWorkspaceTaskRunHookWithRelativeScriptPath", func(t *testing.T) { ++ homePaths := integrationHomePaths(t) ++ ... ++ }) + } +``` +
    + + +As per coding guidelines, `**/*_test.go`: "MUST use t.Run("Should...") pattern for ALL test cases". + + + +
    +📝 Committable suggestion + +> ‼️ **IMPORTANT** +> Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements. + +```suggestion +func TestBootRunsWorkspaceTaskRunHookWithRelativeScriptPath(t *testing.T) { + t.Run("ShouldRunWorkspaceTaskRunHookWithRelativeScriptPath", func(t *testing.T) { + homePaths := integrationHomePaths(t) + cfg := testConfig(t, homePaths) + cfg.Memory.Enabled = false + cfg.Skills.Enabled = false + + workspaceRoot := filepath.Join(t.TempDir(), "workspace") + if err := os.MkdirAll(filepath.Join(workspaceRoot, aghconfig.DirName, "hooks"), 0o755); err != nil { + t.Fatalf( + "os.MkdirAll(%q) error = %v", + filepath.Join(workspaceRoot, aghconfig.DirName, "hooks"), + err, + ) + } + writeDaemonFile( + t, + filepath.Join(workspaceRoot, aghconfig.DirName, "hooks", "capture-task-run.sh"), + "#!/bin/sh\ncat > \"$1\"\n", + ) + if err := os.Chmod( + filepath.Join(workspaceRoot, aghconfig.DirName, "hooks", "capture-task-run.sh"), + 0o755, + ); err != nil { + t.Fatalf("os.Chmod(capture-task-run.sh) error = %v", err) + } + writeDaemonFile(t, filepath.Join(workspaceRoot, aghconfig.DirName, "config.toml"), ` +[[hooks.declarations]] +name = "workspace-task-run" +event = "task.run.enqueued" +mode = "sync" +command = "/bin/sh" +args = [".agh/hooks/capture-task-run.sh", ".agh/task-run-enqueued.json"] +`) + + resolvedWorkspace := seedDaemonWorkspace(t, homePaths, workspaceRoot) + + d, err := New( + WithHomePaths(homePaths), + WithConfig(&cfg), + WithLogger(discardLogger()), + ) + if err != nil { + t.Fatalf("New() error = %v", err) + } + d.newSessionManager = func(_ context.Context, deps SessionManagerDeps) (SessionManager, error) { + return &fakeSessionManager{}, nil + } + d.newObserver = func(context.Context, RuntimeDeps) (Observer, error) { + return &fakeObserver{}, nil + } + 
d.httpFactory = func(context.Context, RuntimeDeps) (Server, error) { + return &fakeServer{name: "http"}, nil + } + d.udsFactory = func(context.Context, RuntimeDeps) (Server, error) { + return &fakeServer{name: "uds"}, nil + } + + if err := d.boot(testutil.Context(t)); err != nil { + t.Fatalf("boot() error = %v", err) + } + t.Cleanup(func() { + if err := d.Shutdown(testutil.Context(t)); err != nil { + t.Fatalf("Shutdown() error = %v", err) + } + }) + if d.hooks == nil { + t.Fatal("boot() did not initialize daemon hooks") + } + + payload := hookspkg.TaskRunEnqueuedPayload{ + PayloadBase: hookspkg.PayloadBase{ + Event: hookspkg.HookTaskRunEnqueued, + Timestamp: time.Date(2026, 4, 26, 19, 30, 0, 0, time.UTC), + }, + TaskRunContext: hookspkg.TaskRunContext{ + TaskID: "task-1", + RunID: "run-1", + WorkspaceID: resolvedWorkspace.ID, + CoordinationChannelID: "operations", + NetworkChannel: "operations", + AgentName: "qa", + TaskStatus: "ready", + RunStatus: "queued", + }, + IdempotencyKey: "task.start.task-1", + } + + if _, err := d.hooks.DispatchTaskRunEnqueued(testutil.Context(t), payload); err != nil { + t.Fatalf("DispatchTaskRunEnqueued() error = %v", err) + } + + outputPath := filepath.Join(workspaceRoot, aghconfig.DirName, "task-run-enqueued.json") + body, err := os.ReadFile(outputPath) + if err != nil { + t.Fatalf("os.ReadFile(%q) error = %v", outputPath, err) + } + + var captured hookspkg.TaskRunEnqueuedPayload + if err := json.Unmarshal(body, &captured); err != nil { + t.Fatalf("json.Unmarshal(task run hook payload) error = %v; body=%s", err, string(body)) + } + if captured.Event != hookspkg.HookTaskRunEnqueued || + captured.WorkspaceID != resolvedWorkspace.ID || + captured.RunID != "run-1" { + t.Fatalf("captured payload = %#v, want enqueued payload for the seeded workspace run", captured) + } + }) +} +``` + +
    + + + +
    +🤖 Prompt for AI Agents + +``` +Verify each finding against the current code and only fix it if needed. + +In `@internal/daemon/daemon_integration_test.go` around lines 2377 - 2483, The +test function TestBootRunsWorkspaceTaskRunHookWithRelativeScriptPath must be +wrapped in a t.Run subtest using the repository's "Should..." naming convention; +update the body of TestBootRunsWorkspaceTaskRunHookWithRelativeScriptPath so its +existing implementation is executed inside t.Run("Should run workspace task-run +hook with relative script path", func(t *testing.T) { ... }), keeping all +existing setup, payload creation, hook dispatch, and assertions intact +(references: TestBootRunsWorkspaceTaskRunHookWithRelativeScriptPath, d.boot, +d.hooks.DispatchTaskRunEnqueued, seedDaemonWorkspace, +hookspkg.TaskRunEnqueuedPayload). +``` + +
    + + + + + +## Triage + +- Decision: `VALID` +- Notes: `TestBootRunsWorkspaceTaskRunHookWithRelativeScriptPath` is a new direct test scenario without the required `Should ...` subtest wrapper. Fix by wrapping the existing body in `t.Run("Should run workspace task-run hook with relative script path", ...)` without changing the boot, hook dispatch, or assertions. diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_005.md b/.compozy/tasks/qa-rounds/reviews-001/issue_005.md new file mode 100644 index 000000000..336fce409 --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_005.md @@ -0,0 +1,66 @@ +--- +status: resolved +file: internal/daemon/harness_context_test.go +line: 129 +author: coderabbitai[bot] +provider_ref: thread:PRRT_kwDOR5y4QM59r7vJ,comment:PRRC_kwDOR5y4QM67Z0ND +--- + +# Issue 005: _🛠️ Refactor suggestion_ | _🟠 Major_ +## Review Comment + +_🛠️ Refactor suggestion_ | _🟠 Major_ + +**Use `Should...` names for the newly added matrix subtests.** + +The new case names should follow the required subtest naming pattern used by `t.Run(tc.name, ...)`. + +
    +Proposed diff + +```diff +- name: "coordinator startup session resolves coordinator policy", ++ name: "Should resolve coordinator policy for coordinator startup session", +... +- name: "spawned worker network turn resolves spawned policy", ++ name: "Should resolve spawned policy for spawned worker network turn", +``` +
    + + + +As per coding guidelines, `**/*_test.go`: "MUST use t.Run("Should...") pattern for ALL test cases". + + +Also applies to: 128-162 + +
    +🤖 Prompt for AI Agents + +``` +Verify each finding against the current code and only fix it if needed. + +In `@internal/daemon/harness_context_test.go` around lines 98 - 129, The test-case +name strings in the table-driven tests (the name field used by t.Run(tc.name, +...)) do not follow the required "Should..." subtest naming pattern; update the +name values for the new matrix entries (e.g., the case currently titled +"coordinator startup session resolves coordinator policy" and the subsequent +"spawned worker network turn resolves spawned policy") to begin with "Should" +(for example "Should resolve coordinator policy on startup session" and "Should +resolve spawned policy for spawned worker network turn"), leaving the rest of +the HarnessResolutionInput, HarnessSessionInput, HarnessTurnRequest, +wantSections/wantTags, and other fields (symbols: name, HarnessResolutionInput, +HarnessSessionInput, HarnessTurnRequest, t.Run) unchanged so t.Run(tc.name, ...) +uses the new "Should..." names. +``` + +
    + + + + + +## Triage + +- Decision: `VALID` +- Notes: The table-driven cases in `TestHarnessContextResolverResolve` are executed through `t.Run(tc.name, ...)`, but the two new case names do not start with `Should`. Fix by renaming them to descriptive `Should ...` names while leaving inputs and expected outputs unchanged. diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_006.md b/.compozy/tasks/qa-rounds/reviews-001/issue_006.md new file mode 100644 index 000000000..e3f2fdc88 --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_006.md @@ -0,0 +1,138 @@ +--- +status: resolved +file: internal/daemon/harness_context_test.go +line: 458 +author: coderabbitai[bot] +provider_ref: thread:PRRT_kwDOR5y4QM59r7vK,comment:PRRC_kwDOR5y4QM67Z0NE +--- + +# Issue 006: _🛠️ Refactor suggestion_ | _🟠 Major_ +## Review Comment + +_🛠️ Refactor suggestion_ | _🟠 Major_ + +**Wrap this new test scenario in `t.Run("Should...")`.** + +The added test should use the required explicit subtest pattern. + +
    +Proposed diff + +```diff + func TestSectionSelectorAcceptsCoordinatorStartupSession(t *testing.T) { + t.Parallel() +- +- resolver := NewHarnessContextResolver(HarnessRuntimeSignals{ +- MemoryPromptSectionEnabled: true, +- SkillsPromptSectionEnabled: true, +- }) +- selector := NewSectionSelector(resolver, nil) +- descriptors := defaultStartupPromptSectionDescriptors( +- promptSectionProviderFunc( +- func(context.Context, *workspacepkg.ResolvedWorkspace) (string, error) { return "memory", nil }, +- ), +- promptSectionProviderFunc( +- func(context.Context, *workspacepkg.ResolvedWorkspace) (string, error) { return "skills", nil }, +- ), +- nil, +- ) +- +- selected, resolved, err := selector.Select(session.StartupPromptContext{ +- SessionType: session.SessionTypeCoordinator, +- Channel: "coord-run-1", +- }, descriptors) +- if err != nil { +- t.Fatalf("Select(coordinator) error = %v", err) +- } +- +- if resolved.Session.SessionClass != SessionClassCoordinator { +- t.Fatalf("SessionClass = %q, want %q", resolved.Session.SessionClass, SessionClassCoordinator) +- } +- wantNames := []string{ +- string(HarnessPromptSectionMemory), +- string(HarnessPromptSectionSkills), +- string(HarnessPromptSectionNetwork), +- } +- gotNames := make([]string, 0, len(selected)) +- for _, descriptor := range selected { +- gotNames = append(gotNames, descriptor.Name) +- } +- if !slices.Equal(gotNames, wantNames) { +- t.Fatalf("selected section names = %#v, want %#v", gotNames, wantNames) +- } ++ t.Run("Should accept coordinator startup session and include coordinator sections", func(t *testing.T) { ++ t.Parallel() ++ ++ resolver := NewHarnessContextResolver(HarnessRuntimeSignals{ ++ MemoryPromptSectionEnabled: true, ++ SkillsPromptSectionEnabled: true, ++ }) ++ selector := NewSectionSelector(resolver, nil) ++ descriptors := defaultStartupPromptSectionDescriptors( ++ promptSectionProviderFunc( ++ func(context.Context, *workspacepkg.ResolvedWorkspace) (string, error) { return "memory", nil }, 
++ ), ++ promptSectionProviderFunc( ++ func(context.Context, *workspacepkg.ResolvedWorkspace) (string, error) { return "skills", nil }, ++ ), ++ nil, ++ ) ++ ++ selected, resolved, err := selector.Select(session.StartupPromptContext{ ++ SessionType: session.SessionTypeCoordinator, ++ Channel: "coord-run-1", ++ }, descriptors) ++ if err != nil { ++ t.Fatalf("Select(coordinator) error = %v", err) ++ } ++ ++ if resolved.Session.SessionClass != SessionClassCoordinator { ++ t.Fatalf("SessionClass = %q, want %q", resolved.Session.SessionClass, SessionClassCoordinator) ++ } ++ wantNames := []string{ ++ string(HarnessPromptSectionMemory), ++ string(HarnessPromptSectionSkills), ++ string(HarnessPromptSectionNetwork), ++ } ++ gotNames := make([]string, 0, len(selected)) ++ for _, descriptor := range selected { ++ gotNames = append(gotNames, descriptor.Name) ++ } ++ if !slices.Equal(gotNames, wantNames) { ++ t.Fatalf("selected section names = %#v, want %#v", gotNames, wantNames) ++ } ++ }) + } +``` +
    + + + +As per coding guidelines, `**/*_test.go`: "MUST use t.Run("Should...") pattern for ALL test cases". + +
    +🤖 Prompt for AI Agents + +``` +Verify each finding against the current code and only fix it if needed. + +In `@internal/daemon/harness_context_test.go` around lines 417 - 458, Wrap the new +test function TestSectionSelectorAcceptsCoordinatorStartupSession in an explicit +subtest using t.Run with a "Should..." description; e.g., inside +TestSectionSelectorAcceptsCoordinatorStartupSession call t.Run("Should select +coordinator startup sections", func(t *testing.T) { ... }) and move all existing +test logic (resolver/selector/descriptors creation, selector.Select call, +assertions on resolved.Session.SessionClass and selected names) into that +subtest so the test follows the required t.Run("Should...") subtest pattern. +``` + +
    + + + + + +## Triage + +- Decision: `VALID` +- Notes: `TestSectionSelectorAcceptsCoordinatorStartupSession` has a direct body and no `Should ...` subtest. Fix by moving the existing selector setup and assertions into a `Should select coordinator startup sections` subtest. diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_007.md b/.compozy/tasks/qa-rounds/reviews-001/issue_007.md new file mode 100644 index 000000000..5cb5d9576 --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_007.md @@ -0,0 +1,76 @@ +--- +status: resolved +file: internal/daemon/notifier_test.go +line: 471 +author: coderabbitai[bot] +provider_ref: thread:PRRT_kwDOR5y4QM59r7vL,comment:PRRC_kwDOR5y4QM67Z0NF +--- + +# Issue 007: _⚠️ Potential issue_ | _🟠 Major_ +## Review Comment + +_⚠️ Potential issue_ | _🟠 Major_ + +**Convert this test into `t.Run("Should...")` cases per policy.** + +Line 412 currently uses a single monolithic test case; please split assertions into named `Should...` subtests (ideally table-driven) to align with repo test standards. + +
    +Refactor outline + +```diff + func TestScopeWorkspaceHookDeclsOnlyInjectsSupportedMatcherFields(t *testing.T) { + t.Parallel() +- // single scenario with many assertions ++ testCases := []struct { ++ name string ++ event hookspkg.HookEvent ++ // expected matcher behavior fields... ++ }{ ++ {name: "ShouldInjectWorkspaceIDAndRootForSessionHooks", event: hookspkg.HookSessionPostCreate}, ++ {name: "ShouldInjectOnlyWorkspaceIDForTaskRunHooks", event: hookspkg.HookTaskRunEnqueued}, ++ {name: "ShouldNotInjectWorkspaceFieldsForMessageHooks", event: hookspkg.HookMessageDelta}, ++ } ++ for _, tc := range testCases { ++ tc := tc ++ t.Run(tc.name, func(t *testing.T) { ++ t.Parallel() ++ // scenario setup + focused assertions ++ }) ++ } + } +``` +
    + + +As per coding guidelines, `**/*_test.go`: "Table-driven tests with subtests (t.Run) as default pattern" and "MUST use t.Run("Should...") pattern for ALL test cases". + +
    +🤖 Prompt for AI Agents + +``` +Verify each finding against the current code and only fix it if needed. + +In `@internal/daemon/notifier_test.go` around lines 412 - 471, Split the +monolithic TestScopeWorkspaceHookDeclsOnlyInjectsSupportedMatcherFields into +table-driven subtests using t.Run("Should ...") entries: build a table of cases +(e.g., "Should inject workspace fields for session", "Should only inject +WorkspaceID for task-run", "Should not inject workspace fields for message", +"Should not mutate original decls") that each call scopeWorkspaceHookDecls with +the same inputs and assert the specific Matcher fields and +hookspkg.ValidateMatcherForEvent results; keep the original decls and resolved +values for reuse, reference the function under test scopeWorkspaceHookDecls and +use hookspkg.ValidateMatcherForEvent in each subtest, and ensure the final case +verifies the original decls were not mutated. +``` + +
    + + + + + +## Triage + +- Decision: `VALID` +- Notes: `TestScopeWorkspaceHookDeclsOnlyInjectsSupportedMatcherFields` combines multiple independent assertions in one monolithic body. Fix by splitting the session, task-run, message, and immutability checks into table-driven `Should ...` subtests while reusing the same source declarations. diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_008.md b/.compozy/tasks/qa-rounds/reviews-001/issue_008.md new file mode 100644 index 000000000..39e9fea0a --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_008.md @@ -0,0 +1,89 @@ +--- +status: resolved +file: internal/hooks/matcher_test.go +line: 458 +author: coderabbitai[bot] +provider_ref: thread:PRRT_kwDOR5y4QM59r7vM,comment:PRRC_kwDOR5y4QM67Z0NH +--- + +# Issue 008: _⚠️ Potential issue_ | _🟠 Major_ +## Review Comment + +_⚠️ Potential issue_ | _🟠 Major_ + +**Rename subtests to the required `Should...` pattern.** + +The table-driven structure is good; please update case names (Lines 443-447) so each `t.Run` uses `Should...` wording. + +
    +Example rename + +```diff +- {name: "session workspace root", event: HookSessionPostCreate, field: "workspace_root", want: true}, ++ {name: "ShouldAllowWorkspaceRootForSessionHooks", event: HookSessionPostCreate, field: "workspace_root", want: true}, +``` +
    + + +As per coding guidelines, `**/*_test.go`: "MUST use t.Run("Should...") pattern for ALL test cases". + + + +
    +📝 Committable suggestion + +> ‼️ **IMPORTANT** +> Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements. + +```suggestion + {name: "ShouldAllowWorkspaceRootForSessionPostCreateHook", event: HookSessionPostCreate, field: "workspace_root", want: true}, + {name: "ShouldAllowWorkspaceIdForTaskRunEnqueuedHook", event: HookTaskRunEnqueued, field: "workspace_id", want: true}, + {name: "ShouldDenyWorkspaceRootForTaskRunEnqueuedHook", event: HookTaskRunEnqueued, field: "workspace_root", want: false}, + {name: "ShouldDenyWorkspaceIdForMessageDeltaHook", event: HookMessageDelta, field: "workspace_id", want: false}, + {name: "ShouldDenyInvalidEvent", event: HookEvent("bad.event"), field: "workspace_id", want: false}, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + t.Parallel() + + if got := MatcherFieldAllowedForEvent(tt.event, tt.field); got != tt.want { + t.Fatalf("MatcherFieldAllowedForEvent(%q, %q) = %v, want %v", tt.event, tt.field, got, tt.want) + } + }) + } +``` + +
    + + + +
    +🤖 Prompt for AI Agents + +``` +Verify each finding against the current code and only fix it if needed. + +In `@internal/hooks/matcher_test.go` around lines 443 - 458, Update the t.Run case +names in the table-driven test for MatcherFieldAllowedForEvent so they follow +the "Should..." pattern; specifically rename the entries currently named +"session workspace root", "task run workspace id", "task run workspace root", +"message workspace id", and "invalid event" to descriptive "Should..." strings +(e.g., "Should allow session workspace root", "Should allow task run workspace +id", "Should not allow task run workspace root", "Should not allow message +workspace id", "Should not allow invalid event") so t.Run uses the required +format while keeping the same event constants (HookSessionPostCreate, +HookTaskRunEnqueued, HookMessageDelta, HookEvent("bad.event")) and field values +("workspace_root", "workspace_id") used by MatcherFieldAllowedForEvent. +``` + +
    + + + + + +## Triage + +- Decision: `VALID` +- Notes: `TestMatcherFieldAllowedForEvent` is already table-driven, but its `name` values do not follow the mandatory `Should ...` convention. Fix by renaming the five cases only. diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_009.md b/.compozy/tasks/qa-rounds/reviews-001/issue_009.md new file mode 100644 index 000000000..3867da56a --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_009.md @@ -0,0 +1,23 @@ +--- +status: resolved +file: internal/network/manager_test.go +line: 751 +severity: minor +author: coderabbitai[bot] +provider_ref: review:4177411700,nitpick_hash:004e4aeb5977 +review_hash: 004e4aeb5977 +source_review_id: "4177411700" +source_review_submitted_at: "2026-04-26T20:24:27Z" +--- + +# Issue 009: These metric expectations are heartbeat-timing sensitive. +## Review Comment + +The exact `sent/received` and `KindGreet` counts here assume neither session heartbeat fires before `Status()` runs. Since `JoinChannel()` starts live 1-second heartbeats, slower CI can legitimately observe extra greet traffic and fail this test intermittently. I’d make the test config use a much longer greet interval, or assert only the deltas introduced by the explicit `KindSay`. + +Also applies to: 781-793 + +## Triage + +- Decision: `VALID` +- Notes: `TestManagerStatusTracksWorkflowMetricsAndStructuredLogs` uses `testManagerConfig()`, whose one-second greet interval can emit extra heartbeat greets before `Status()` on slow CI. Fix by giving this test a longer `GreetInterval` while preserving the expected initial-greet and explicit `KindSay` metrics. 
diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_010.md b/.compozy/tasks/qa-rounds/reviews-001/issue_010.md new file mode 100644 index 000000000..5e69b028f --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_010.md @@ -0,0 +1,21 @@ +--- +status: resolved +file: internal/network/router.go +line: 639 +severity: minor +author: coderabbitai[bot] +provider_ref: review:4177411700,nitpick_hash:335034e78607 +review_hash: 335034e78607 +source_review_id: "4177411700" +source_review_submitted_at: "2026-04-26T20:24:27Z" +--- + +# Issue 010: Directed self-WHOIS is now suppressed, but the manager still audits it as received. +## Review Comment + +Line 640 drops the local responder when the directed target is also the sender. That means a self-directed WHOIS now returns with no `Generated` response, but `Manager.controlMessageReceivers()` still records the target session as having received the request. This will skew received audits/stats for a message that was intentionally skipped. + +## Triage + +- Decision: `VALID` +- Notes: A directed WHOIS whose target is also the sender is intentionally skipped by `whoisRequestResponders`, but the route result remains non-ignored. `Manager.recordInboundAudit` suppresses control-message receiver audit only for ignored/rejected results, so the skipped self-WHOIS can still be audited as received. Fix within `internal/network/router.go` by marking directed self-WHOIS requests as `Ignored`, and add coverage in `router_test.go`. 
diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_011.md b/.compozy/tasks/qa-rounds/reviews-001/issue_011.md new file mode 100644 index 000000000..6797c2832 --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_011.md @@ -0,0 +1,23 @@ +--- +status: resolved +file: internal/network/router_test.go +line: 160 +severity: nitpick +author: coderabbitai[bot] +provider_ref: review:4177411700,nitpick_hash:68efdaccc85c +review_hash: 68efdaccc85c +source_review_id: "4177411700" +source_review_submitted_at: "2026-04-26T20:24:27Z" +--- + +# Issue 011: Split the new echo coverage into t.Run subtests. +## Review Comment + +These are two independent behaviors—broadcast self-echo and directed self-echo—so separate subtests will isolate failures better and match the repo’s default test style. + +As per coding guidelines, "Use table-driven tests with subtests (`t.Run`) as default pattern" and "MUST use t.Run("Should...") pattern for ALL test cases". + +## Triage + +- Decision: `VALID` +- Notes: `TestRouterDoesNotDeliverLocalEchoesToSender` checks broadcast and directed self-echo behavior in one body. Fix by splitting these into separate `Should ...` subtests that share the same router setup pattern. diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_012.md b/.compozy/tasks/qa-rounds/reviews-001/issue_012.md new file mode 100644 index 000000000..ed1fdfa2b --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_012.md @@ -0,0 +1,172 @@ +--- +status: resolved +file: internal/observe/observer.go +line: 605 +author: coderabbitai[bot] +provider_ref: thread:PRRT_kwDOR5y4QM59r7vN,comment:PRRC_kwDOR5y4QM67Z0NI +--- + +# Issue 012: _⚠️ Potential issue_ | _🟠 Major_ +## Review Comment + +_⚠️ Potential issue_ | _🟠 Major_ + +
    +🧩 Analysis chain + +🏁 Script executed: + +```shell +#!/bin/bash +# Inspect the detached-context fallbacks and check whether any observer entrypoints +# are already invoked with an explicit nil context. +sed -n '598,669p' internal/observe/observer.go +rg -nP --type=go -C2 '\.(OnSessionCreated|OnAgentEvent|OnAgentEventForSession)\s*\(\s*nil\b|New\s*\(\s*nil\b' +``` + +Repository: compozy/agh + +Length of output: 2885 + +--- + +🏁 Script executed: + +```shell +#!/bin/bash +# Find all calls to recoverSessionSnapshot and observedSessionSnapshot +rg -n 'recoverSessionSnapshot|observedSessionSnapshot' internal/observe/observer.go +``` + +Repository: compozy/agh + +Length of output: 535 + +--- + +🏁 Script executed: + +```shell +#!/bin/bash +# Find all calls to Observer methods that might trigger context issues +rg -n -B3 -A3 'o\.recoverSessionSnapshot\(|o\.observedSessionSnapshot\(' internal/observe/observer.go +``` + +Repository: compozy/agh + +Length of output: 1085 + +--- + +🏁 Script executed: + +```shell +#!/bin/bash +# Check where Observer is instantiated and used, trace context flow from entry points +rg -n -B2 -A5 'func.*On(SessionCreated|AgentEvent|SessionStopped)' internal/observe/observer.go | head -100 +``` + +Repository: compozy/agh + +Length of output: 1352 + +--- + +🏁 Script executed: + +```shell +#!/bin/bash +# Find observeAgentEvent definition and trace context flow +rg -n -B3 -A15 'func \(o \*Observer\) observeAgentEvent' internal/observe/observer.go +``` + +Repository: compozy/agh + +Length of output: 781 + +--- + +🏁 Script executed: + +```shell +#!/bin/bash +# Find all calls to observer methods from outside to see entry points +rg -n 'Observer.*OnSessionCreated|Observer.*OnAgentEvent|Observer.*OnSessionStopped' --type=go | grep -v 'internal/observe/observer.go' | head -20 +``` + +Repository: compozy/agh + +Length of output: 2405 + +--- + +🏁 Script executed: + +```shell +#!/bin/bash +# Find validateObservedEvent implementation +rg -n -B3 -A20 'func 
\(o \*Observer\) validateObservedEvent' internal/observe/observer.go +``` + +Repository: compozy/agh + +Length of output: 733 + +--- + +🏁 Script executed: + +```shell +#!/bin/bash +# Check if there are any other places where these functions might be called +rg -n 'recoverSessionSnapshot|observedSessionSnapshot' internal/observe/ --type=go +``` + +Repository: compozy/agh + +Length of output: 709 + +--- + + + +
    + +**Remove the `context.Background()` fallback; require callers to pass a valid context.** + +These helper functions (`recoverSessionSnapshot` at lines 603-605, `observedSessionSnapshot` at lines 647-649) cross I/O boundaries (`ListSessions`, permission/config resolution). The nil-check fallbacks that replace a missing context with `context.Background()` violate the coding guideline for `internal/**/*.go`: "Avoid `context.Background()` outside `main` and focused tests." + +This design masks upstream bugs instead of surfacing them. If a nil context ever reaches these functions, the caller has failed to propagate context correctly. That failure should be loud, not silently replaced with a detached context that drops cancellation, deadlines, and request-scoped values. + +Since all public entrypoints (`OnSessionCreated`, `OnAgentEvent`, `OnAgentEventForSession`, `OnSessionStopped`) already receive `context.Context` as the first parameter and propagate it consistently through the call chain, these fallbacks are unnecessary. Remove them and require callers to always pass a valid context. + +Also applies to lines 647-649 in `observedSessionSnapshot`. + +
    +🤖 Prompt for AI Agents + +``` +Verify each finding against the current code and only fix it if needed. + +In `@internal/observe/observer.go` around lines 603 - 605, Remove the silent +context.Background() fallback in recoverSessionSnapshot and +observedSessionSnapshot; do not substitute a background context when ctx is nil +— instead make the failure loud so callers must pass a valid context. +Concretely, delete the block `if ctx == nil { ctx = context.Background() }` in +both recoverSessionSnapshot and observedSessionSnapshot and replace it with a +clear guard that surfaces the bug (for example `if ctx == nil { panic("nil +context passed to recoverSessionSnapshot") }` and similarly for +observedSessionSnapshot) so upstream callers are forced to propagate a non-nil +context. +``` + +
    + + + + + +## Triage + +- Decision: `VALID` +- Notes: `recoverSessionSnapshot` and `observedSessionSnapshot` silently replace nil contexts with `context.Background()`, dropping cancellation/deadlines and hiding caller bugs in production I/O paths. Fix by removing the fallback and making nil context a loud programmer error in both helpers. diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_013.md b/.compozy/tasks/qa-rounds/reviews-001/issue_013.md new file mode 100644 index 000000000..2fecfc529 --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_013.md @@ -0,0 +1,46 @@ +--- +status: resolved +file: internal/observe/observer.go +line: 632 +author: coderabbitai[bot] +provider_ref: thread:PRRT_kwDOR5y4QM59r7vP,comment:PRRC_kwDOR5y4QM67Z0NK +--- + +# Issue 013: _⚠️ Potential issue_ | _🟠 Major_ +## Review Comment + +_⚠️ Potential issue_ | _🟠 Major_ + +**Registry recovery should not re-cache stopped sessions.** + +`OnSessionStopped` explicitly evicts the session from `o.sessions`, but this branch adds it back for any persisted registry row. A late event for a completed session will therefore repopulate the in-memory cache and keep it there indefinitely, because stopped sessions remain in the registry. Consider making registry recovery a one-shot lookup, or only caching sessions that are still live. + +
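The "only cache sessions that are still live" option described above can be sketched roughly as follows. This is a minimal illustration, not the actual observer code: `SessionInfo`, its `State` values, the slice-backed `registry`, and `recoverFromRegistry` are hypothetical stand-ins for the real registry row, `o.registry.ListSessions`, and `o.trackSession`.

```go
package main

import "fmt"

// SessionInfo is a simplified, hypothetical stand-in for a persisted registry row.
type SessionInfo struct {
	ID    string
	State string // assumed values for this sketch: "active", "stopped"
}

// Observer keeps an in-memory cache of the sessions it is tracking.
type Observer struct {
	sessions map[string]SessionInfo
	registry []SessionInfo // stand-in for the persisted session registry
}

// recoverFromRegistry returns the persisted snapshot for id. Rows that are
// not stopped are added to the cache; stopped rows are returned as a
// one-shot lookup and never re-cached.
func (o *Observer) recoverFromRegistry(id string) (SessionInfo, bool) {
	for _, info := range o.registry {
		if info.ID != id {
			continue
		}
		if info.State != "stopped" {
			o.sessions[info.ID] = info
		}
		return info, true
	}
	return SessionInfo{}, false
}

func main() {
	o := &Observer{
		sessions: map[string]SessionInfo{},
		registry: []SessionInfo{
			{ID: "sess-live", State: "active"},
			{ID: "sess-done", State: "stopped"},
		},
	}
	for _, id := range []string{"sess-live", "sess-done"} {
		info, ok := o.recoverFromRegistry(id)
		_, cached := o.sessions[id]
		fmt.Printf("%s: found=%v state=%s cached=%v\n", id, ok, info.State, cached)
	}
}
```

With this shape, a late event for a stopped session can still be resolved against the registry, but the cache entry evicted by `OnSessionStopped` stays evicted.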
    +🤖 Prompt for AI Agents + +``` +Verify each finding against the current code and only fix it if needed. + +In `@internal/observe/observer.go` around lines 621 - 632, The registry recovery +currently re-adds stopped sessions into the in-memory cache by calling +o.trackSession in the ListSessions loop; change this to avoid caching sessions +that are no longer live. Specifically, in the loop that iterates sessions +returned by o.registry.ListSessions, only call o.trackSession for sessions whose +persisted metadata indicates they are active (e.g., check a liveliness field +like info.State/Status not equal to "stopped" or !info.Stopped); otherwise, +return the observed snapshot without calling o.trackSession (or treat recovery +as a one-shot lookup). Also ensure this behavior is consistent with +OnSessionStopped which evicts from o.sessions so stopped sessions are never +re-added. +``` + +
    + + + + + +## Triage + +- Decision: `VALID` +- Notes: `OnSessionStopped` evicts a session from the observer cache, but registry recovery currently calls `trackSession` for every persisted row, including `stopped` sessions. A late event can therefore re-cache stopped sessions indefinitely. Fix by treating stopped registry rows as one-shot recovery snapshots and only tracking non-stopped rows. diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_014.md b/.compozy/tasks/qa-rounds/reviews-001/issue_014.md new file mode 100644 index 000000000..d0d36e479 --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_014.md @@ -0,0 +1,23 @@ +--- +status: resolved +file: internal/observe/observer_test.go +line: 93 +severity: nitpick +author: coderabbitai[bot] +provider_ref: review:4177411700,nitpick_hash:f22e7360bba5 +review_hash: f22e7360bba5 +source_review_id: "4177411700" +source_review_submitted_at: "2026-04-26T20:24:27Z" +--- + +# Issue 014: Prefer one table-driven test with t.Run("Should...") subtests here. +## Review Comment + +These two cases only vary by recovery source, so consolidating them would remove duplicated setup/assert logic and match the test structure required in this repo. + +As per coding guidelines, `**/*_test.go`: "Table-driven tests with subtests (t.Run) as default pattern" and "MUST use t.Run("Should...") pattern for ALL test cases". + +## Triage + +- Decision: `VALID` +- Notes: The live-source and registry recovery tests duplicate setup and assertions while varying only the recovery source. Fix by consolidating them into one table-driven test with `Should ...` subtests. 
diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_015.md b/.compozy/tasks/qa-rounds/reviews-001/issue_015.md new file mode 100644 index 000000000..ea36d6b63 --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_015.md @@ -0,0 +1,23 @@ +--- +status: resolved +file: internal/observe/observer_test.go +line: 98 +severity: minor +author: coderabbitai[bot] +provider_ref: review:4177411700,nitpick_hash:bddf8f7e5a68 +review_hash: bddf8f7e5a68 +source_review_id: "4177411700" +source_review_submitted_at: "2026-04-26T20:24:27Z" +--- + +# Issue 015: Isolate the live-source path in this test. +## Review Comment + +Registering the session in `h.registry` makes this case pass even if the live-source lookup breaks, because `recoverSessionSnapshot` can still succeed through the registry fallback. Leave the registry unseeded here, or force the fallback path to fail, so this test actually proves the live-source branch. + +As per coding guidelines, `**/*_test.go`: "Ensure tests can fail when business logic changes." + +## Triage + +- Decision: `VALID` +- Notes: The live-source recovery test seeds both the live source and the registry, so it can pass through the registry fallback even if live-source recovery is broken. Fix by leaving the registry unseeded for the live-source subtest in the consolidated recovery test. 
diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_016.md b/.compozy/tasks/qa-rounds/reviews-001/issue_016.md new file mode 100644 index 000000000..fd7f7e372 --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_016.md @@ -0,0 +1,98 @@ +--- +status: resolved +file: internal/scheduler/scheduler.go +line: 534 +author: coderabbitai[bot] +provider_ref: thread:PRRT_kwDOR5y4QM59r7vQ,comment:PRRC_kwDOR5y4QM67Z0NL +--- + +# Issue 016: _⚠️ Potential issue_ | _🟠 Major_ +## Review Comment + +_⚠️ Potential issue_ | _🟠 Major_ + +**Channel-bound runs can still match unscoped sessions.** + +`coordinationChannelMatches` returns `true` when `candidate.Channel` is empty, so a run with `CoordinationChannelID` can still be routed to sessions outside that channel. + + + +
    +Proposed fix + +```diff + func coordinationChannelMatches(work *RunSnapshot, candidate SessionSnapshot) bool { + if work == nil { + return false + } ++ runChannel := strings.TrimSpace(work.Run.CoordinationChannelID) + sessionChannel := strings.TrimSpace(candidate.Channel) +- if sessionChannel == "" { ++ if runChannel == "" { + return true + } +- return strings.TrimSpace(work.Run.CoordinationChannelID) == sessionChannel ++ return sessionChannel == runChannel + } +``` +
    + + + +
    +📝 Committable suggestion + +> ‼️ **IMPORTANT** +> Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements. + +```suggestion + if !coordinationChannelMatches(work, candidate) { + return false + } + return capabilitiesCover(candidate.Capabilities, work.Run.RequiredCapabilities) +} + +func coordinationChannelMatches(work *RunSnapshot, candidate SessionSnapshot) bool { + if work == nil { + return false + } + runChannel := strings.TrimSpace(work.Run.CoordinationChannelID) + sessionChannel := strings.TrimSpace(candidate.Channel) + if runChannel == "" { + return true + } + return sessionChannel == runChannel +} +``` + +
    + + + +
    +🤖 Prompt for AI Agents + +``` +Verify each finding against the current code and only fix it if needed. + +In `@internal/scheduler/scheduler.go` around lines 519 - 534, +coordinationChannelMatches currently treats an empty candidate.Channel as a +wildcard and returns true, which allows a run with a non-empty +Run.CoordinationChannelID to match an unscoped session; update +coordinationChannelMatches (and its use of SessionSnapshot.Channel and +RunSnapshot.Run.CoordinationChannelID) so that an empty session channel only +matches when the run's CoordinationChannelID is also empty (i.e., trim both +values and return true only if both are empty or both are equal), keeping the +existing nil guard for work. +``` + +
    + + + + + +## Triage + +- Decision: `VALID` +- Notes: `coordinationChannelMatches` treats an empty candidate session channel as a wildcard even when the run is bound to a coordination channel. That can wake an unscoped session for channel-bound work. Fix by trimming both values and requiring equality when the run channel is non-empty; empty session channels only match unbound runs. diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_017.md b/.compozy/tasks/qa-rounds/reviews-001/issue_017.md new file mode 100644 index 000000000..bc396d377 --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_017.md @@ -0,0 +1,23 @@ +--- +status: resolved +file: internal/scheduler/scheduler_channel_test.go +line: 15 +severity: minor +author: coderabbitai[bot] +provider_ref: review:4177411700,nitpick_hash:4b4b45f7e73c +review_hash: 4b4b45f7e73c +source_review_id: "4177411700" +source_review_submitted_at: "2026-04-26T20:24:27Z" +--- + +# Issue 017: Add an empty-channel regression case for channel-bound runs. +## Review Comment + +Current tests validate matching and mismatching channels, but they don’t guard the critical edge case where a session has `Channel == ""` while the run is channel-bound. + +As per coding guidelines, "Focus on critical paths: workflow execution, state management, error handling" and tests should ensure behavior regressions are caught. + +## Triage + +- Decision: `VALID` +- Notes: Existing scheduler channel tests cover matching and wrong-channel sessions but not the critical edge case where the session channel is empty and the run is channel-bound. Add a `Should ...` regression subtest for that no-match case. 
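The matching rule from issue 016 and the empty-channel regression case from issue 017 can be illustrated with a small standalone program. It is only a sketch: `Run`, `RunSnapshot`, and `SessionSnapshot` are reduced here to the single fields the rule needs, and the case names and the `"coord-run-2"` channel value are illustrative assumptions rather than the real test table.

```go
package main

import (
	"fmt"
	"strings"
)

// Run, RunSnapshot and SessionSnapshot are simplified stand-ins that carry
// only the fields the channel rule reads.
type Run struct{ CoordinationChannelID string }

type RunSnapshot struct{ Run Run }

type SessionSnapshot struct{ Channel string }

// coordinationChannelMatches follows the proposed fix: an unbound run matches
// any session, while a channel-bound run matches only a session whose trimmed
// channel is equal to the run's channel.
func coordinationChannelMatches(work *RunSnapshot, candidate SessionSnapshot) bool {
	if work == nil {
		return false
	}
	runChannel := strings.TrimSpace(work.Run.CoordinationChannelID)
	if runChannel == "" {
		return true
	}
	return strings.TrimSpace(candidate.Channel) == runChannel
}

func main() {
	cases := []struct {
		name           string
		runChannel     string
		sessionChannel string
	}{
		{"Should match same channel", "coord-run-1", "coord-run-1"},
		{"Should not match wrong channel", "coord-run-1", "coord-run-2"},
		{"Should not match unscoped session for bound run", "coord-run-1", ""},
		{"Should match any session for unbound run", "", "coord-run-2"},
	}
	for _, tc := range cases {
		work := &RunSnapshot{Run: Run{CoordinationChannelID: tc.runChannel}}
		got := coordinationChannelMatches(work, SessionSnapshot{Channel: tc.sessionChannel})
		fmt.Printf("%s: %v\n", tc.name, got)
	}
}
```

The third case is the regression guard: before the fix, an empty session channel acted as a wildcard and would have matched the channel-bound run.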
diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_018.md b/.compozy/tasks/qa-rounds/reviews-001/issue_018.md new file mode 100644 index 000000000..f331e7945 --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_018.md @@ -0,0 +1,21 @@ +--- +status: resolved +file: internal/session/query.go +line: 279 +severity: nitpick +author: coderabbitai[bot] +provider_ref: review:4177411700,nitpick_hash:43a18e2d5bd1 +review_hash: 43a18e2d5bd1 +source_review_id: "4177411700" +source_review_submitted_at: "2026-04-26T20:24:27Z" +--- + +# Issue 018: Consider normalizing meta.Model during hydration for consistency. +## Review Comment + +Optional: trim at read time too, so older/manual metadata with stray whitespace doesn’t leak into API/session snapshots. + +## Triage + +- Decision: `VALID` +- Notes: `sessionInfoFromMeta` copies `meta.Model` directly while session creation paths trim resolved model names. Manual or older metadata with whitespace can leak inconsistent snapshots. Fix by trimming `meta.Model` during hydration. A focused assertion in `internal/session/query_test.go` is needed even though that test file is outside the batch list because it validates the hydration behavior. diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_019.md b/.compozy/tasks/qa-rounds/reviews-001/issue_019.md new file mode 100644 index 000000000..e8df71bb6 --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_019.md @@ -0,0 +1,73 @@ +--- +status: resolved +file: internal/store/globaldb/global_db_task_test.go +line: 1112 +author: coderabbitai[bot] +provider_ref: thread:PRRT_kwDOR5y4QM59r7vR,comment:PRRC_kwDOR5y4QM67Z0NM +--- + +# Issue 019: _⚠️ Potential issue_ | _🟠 Major_ +## Review Comment + +_⚠️ Potential issue_ | _🟠 Major_ + +**Wrap these new test cases in `t.Run("Should...")` subtests.** + +Both new cases are written as direct top-level test bodies. This repo’s test policy requires `t.Run("Should...")` for all test cases. + + + +
    +Suggested structure + +```diff + func TestGlobalDBUpdateTaskRunAllowsQueuedSessionRelease(t *testing.T) { + t.Parallel() ++ t.Run("Should allow clearing lease fields when transitioning claimed run back to queued", func(t *testing.T) { ++ t.Parallel() ++ // existing test body ++ }) + } + + func TestGlobalDBUpdateTaskRunRejectsActiveSessionClear(t *testing.T) { + t.Parallel() ++ t.Run("Should reject clearing session binding for active runs", func(t *testing.T) { ++ t.Parallel() ++ // existing test body ++ }) + } +``` +
    + +As per coding guidelines, "**MUST use t.Run("Should...") pattern for ALL test cases**." + + +Also applies to: 1114-1136 + +
    +🤖 Prompt for AI Agents + +``` +Verify each finding against the current code and only fix it if needed. + +In `@internal/store/globaldb/global_db_task_test.go` around lines 1070 - 1112, The +test body in TestGlobalDBUpdateTaskRunAllowsQueuedSessionRelease must be wrapped +as a named subtest per project policy; replace the current top-level assertions +with t.Run("Should release queued session when requeued", func(t *testing.T) { +t.Parallel(); /* move the current body here unchanged */ }) so the existing +setup, CreateTask/CreateTaskRun, UpdateTaskRun and assertions remain identical +but run as a subtest; do the same for the other new test referenced around lines +1114-1136 (wrap its body in a t.Run("Should ...", func(t *testing.T) { +t.Parallel(); ... })) so every test case uses the t.Run("Should...") pattern. +``` + +
    + + + + + +## Triage + +- Decision: `VALID` +- Notes: `TestGlobalDBUpdateTaskRunAllowsQueuedSessionRelease` and `TestGlobalDBUpdateTaskRunRejectsActiveSessionClear` are direct test bodies. Fix by wrapping each body in a named `Should ...` subtest while preserving the existing task/run setup and assertions. diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_020.md b/.compozy/tasks/qa-rounds/reviews-001/issue_020.md new file mode 100644 index 000000000..a6efc1cb3 --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_020.md @@ -0,0 +1,96 @@ +--- +status: resolved +file: internal/task/hooks.go +line: 127 +author: coderabbitai[bot] +provider_ref: thread:PRRT_kwDOR5y4QM59r7vT,comment:PRRC_kwDOR5y4QM67Z0NO +--- + +# Issue 020: _⚠️ Potential issue_ | _🟠 Major_ +## Review Comment + +_⚠️ Potential issue_ | _🟠 Major_ + +
    +🧩 Analysis chain + +🏁 Script executed: + +```shell +#!/bin/bash +set -euo pipefail +rg -n --type go 'context\.Background\(\)' internal -g '!**/*_test.go' +``` + +Repository: compozy/agh + +Length of output: 11227 + +--- + + + +
    + +**Replace `context.Background()` with `context.TODO()` to comply with internal package guidelines.** + +This function violates the internal package rule against `context.Background()` outside `main` and focused tests. Since this is a fallback for a nil context in a non-main internal file, `context.TODO()` is the correct placeholder. + +
    +Proposed fix + +```diff + func taskRunObservationHookContext(ctx context.Context) context.Context { + if ctx == nil { +- return context.Background() ++ return context.TODO() + } + return context.WithoutCancel(ctx) + } +``` +
    + + + +
    +📝 Committable suggestion + +> ‼️ **IMPORTANT** +> Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements. + +```suggestion +func taskRunObservationHookContext(ctx context.Context) context.Context { + if ctx == nil { + return context.TODO() + } + return context.WithoutCancel(ctx) +} +``` + +
    + + + +
    +🤖 Prompt for AI Agents + +``` +Verify each finding against the current code and only fix it if needed. + +In `@internal/task/hooks.go` around lines 123 - 127, Replace the use of +context.Background() with context.TODO() in the taskRunObservationHookContext +function: when ctx is nil, return context.TODO() instead of context.Background() +so the function uses the internal package's preferred placeholder; keep the rest +of the function (including the call to context.WithoutCancel(ctx)) unchanged. +``` + +
    + + + + + +## Triage + +- Decision: `VALID` +- Notes: `taskRunObservationHookContext` uses `context.Background()` in an internal production package as a nil-context fallback. Fix by returning `context.TODO()` for the nil fallback and keeping `context.WithoutCancel(ctx)` for valid caller contexts. diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_021.md b/.compozy/tasks/qa-rounds/reviews-001/issue_021.md new file mode 100644 index 000000000..68195a71d --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_021.md @@ -0,0 +1,54 @@ +--- +status: resolved +file: internal/task/hooks_test.go +line: 181 +author: coderabbitai[bot] +provider_ref: thread:PRRT_kwDOR5y4QM59r7vS,comment:PRRC_kwDOR5y4QM67Z0NN +--- + +# Issue 021: _⚠️ Potential issue_ | _🟠 Major_ +## Review Comment + +_⚠️ Potential issue_ | _🟠 Major_ + +**Use `t.Run("Should...")` subtests for the newly added hook-context cases.** + +The coverage is valuable, but these two cases should be structured as `t.Run("Should...")` subtests to match the repository’s mandatory test pattern. + + + +As per coding guidelines, "**MUST use t.Run("Should...") pattern for ALL test cases**." + + +Also applies to: 183-224 + +
    +🤖 Prompt for AI Agents + +``` +Verify each finding against the current code and only fix it if needed. + +In `@internal/task/hooks_test.go` around lines 136 - 181, The test +TestTaskRunObservationHooksDetachFromCallerCancellation must be converted to use +t.Run subtests for the two hook-context assertions: wrap the +enqueued-cancel/assertion logic into a t.Run("Should keep enqueued hook context +active") subtest and the post-claim-cancel/assertion into a t.Run("Should keep +post-claim hook context active") subtest; keep the setup (store, manager with +WithTaskRunHooks, CreateTask, EnqueueRun/ClaimRun and their cancels) but move +the specific cancel+assertContextStillActive(enqueuedCtx, t, "enqueued") and +cancel+assertContextStillActive(postClaimCtx, t, "post-claim") calls into their +respective t.Run blocks so the file follows the required t.Run("Should...") +pattern while still referencing enqueuedCtx, postClaimCtx, +manager.EnqueueRun/ClaimRun and assertContextStillActive. +``` + +
    + + + + + +## Triage + +- Decision: `valid` +- Notes: `TestTaskRunObservationHooksDetachFromCallerCancellation` currently performs the enqueued and post-claim hook context assertions inline in the top-level test body. The repository's Go test convention requires explicit `t.Run("Should ...")` subtests for each case. The fix is to keep the shared manager/task setup intact and move each cancellation assertion into a named subtest. diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_022.md b/.compozy/tasks/qa-rounds/reviews-001/issue_022.md new file mode 100644 index 000000000..0a61c7e49 --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_022.md @@ -0,0 +1,95 @@ +--- +status: resolved +file: internal/task/manager_integration_test.go +line: 1153 +author: coderabbitai[bot] +provider_ref: thread:PRRT_kwDOR5y4QM59r7vU,comment:PRRC_kwDOR5y4QM67Z0NQ +--- + +# Issue 022: _⚠️ Potential issue_ | _🟠 Major_ +## Review Comment + +_⚠️ Potential issue_ | _🟠 Major_ + +**Wrap this case in a `t.Run("Should ...")` subtest.** + +The new test is top-level only; this repo requires the explicit `Should...` subtest pattern. + +
    +Proposed fix + +```diff + func TestTaskManagerRecoverRunOnBootRequeuesBoundRunWithGlobalDB(t *testing.T) { + t.Parallel() ++ t.Run("Should requeue a bound run and clear session binding on boot recovery", func(t *testing.T) { ++ t.Parallel() + +- ctx := testutil.Context(t) +- db := openTaskManagerGlobalDB(t) +- manager := newTaskManagerIntegration(t, db) +- operator, err := taskpkg.DeriveHumanActorContext("operator", taskpkg.OriginKindCLI, "agh task run") +- if err != nil { +- t.Fatalf("DeriveHumanActorContext() error = %v", err) +- } +- agent, err := taskpkg.DeriveAgentSessionActorContext("sess-stale-boot") +- if err != nil { +- t.Fatalf("DeriveAgentSessionActorContext() error = %v", err) +- } +- daemon, err := taskpkg.DeriveDaemonActorContext("boot-recovery", "daemon.boot") +- if err != nil { +- t.Fatalf("DeriveDaemonActorContext() error = %v", err) +- } ++ ctx := testutil.Context(t) ++ db := openTaskManagerGlobalDB(t) ++ manager := newTaskManagerIntegration(t, db) ++ operator, err := taskpkg.DeriveHumanActorContext("operator", taskpkg.OriginKindCLI, "agh task run") ++ if err != nil { ++ t.Fatalf("DeriveHumanActorContext() error = %v", err) ++ } ++ agent, err := taskpkg.DeriveAgentSessionActorContext("sess-stale-boot") ++ if err != nil { ++ t.Fatalf("DeriveAgentSessionActorContext() error = %v", err) ++ } ++ daemon, err := taskpkg.DeriveDaemonActorContext("boot-recovery", "daemon.boot") ++ if err != nil { ++ t.Fatalf("DeriveDaemonActorContext() error = %v", err) ++ } + +- // ...existing assertions... ++ // ...existing assertions... ++ }) + } +``` +
    + + + +As per coding guidelines, `**/*_test.go`: "MUST use t.Run('Should...') pattern for ALL test cases". + +
    +🤖 Prompt for AI Agents + +``` +Verify each finding against the current code and only fix it if needed. + +In `@internal/task/manager_integration_test.go` around lines 1091 - 1153, The test +function TestTaskManagerRecoverRunOnBootRequeuesBoundRunWithGlobalDB must be +wrapped in a t.Run("Should ...") subtest: replace the top-level t.Parallel() and +direct test body with a single t.Run("Should requeue bound run on boot and +release session binding", func(t *testing.T) { t.Parallel(); /* existing test +body */ }), keeping all existing setup and assertions intact (references: +TestTaskManagerRecoverRunOnBootRequeuesBoundRunWithGlobalDB, manager.CreateTask, +manager.EnqueueRun, manager.ClaimNextRun, manager.RecoverRunOnBoot, +db.GetTaskRun) so the repo-wide "Should..." subtest pattern is satisfied. +``` + +
    + + + + + +## Triage + +- Decision: `valid` +- Notes: `TestTaskManagerRecoverRunOnBootRequeuesBoundRunWithGlobalDB` has a top-level body with no `Should ...` subtest. The required test shape is to wrap the existing setup and assertions in one named subtest while preserving the integration flow and parallelism. diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_023.md b/.compozy/tasks/qa-rounds/reviews-001/issue_023.md new file mode 100644 index 000000000..f101f9e49 --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_023.md @@ -0,0 +1,21 @@ +--- +status: resolved +file: internal/workspace/resolver_test.go +line: 919 +severity: nitpick +author: coderabbitai[bot] +provider_ref: review:4177411700,nitpick_hash:0865a291b1dd +review_hash: 0865a291b1dd +source_review_id: "4177411700" +source_review_submitted_at: "2026-04-26T20:24:27Z" +--- + +# Issue 023: Consider hardening the deep-copy assertion by mutating cloned autonomy fields. +## Review Comment + +Current checks confirm values are copied, but not clone independence for this branch. Adding one mutation/assertion pair would future-proof this test. + +## Triage + +- Decision: `valid` +- Notes: `TestCloneConfigProducesDeepCopy` checks copied autonomy values but does not mutate the cloned autonomy branch and prove the original branch remains independent. The fix is to mutate cloned coordinator fields and assert the original coordinator configuration is unchanged. 
diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_024.md b/.compozy/tasks/qa-rounds/reviews-001/issue_024.md new file mode 100644 index 000000000..e2cba8d5d --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_024.md @@ -0,0 +1,21 @@ +--- +status: resolved +file: packages/ui/src/components/kind-chip.test.tsx +line: 20 +severity: nitpick +author: coderabbitai[bot] +provider_ref: review:4177411700,nitpick_hash:7d1cc452f926 +review_hash: 7d1cc452f926 +source_review_id: "4177411700" +source_review_submitted_at: "2026-04-26T20:24:27Z" +--- + +# Issue 024: Potential brittleness in style comparison. +## Review Comment + +`style.background` may return normalized values (e.g., `rgb()` format) depending on the test environment, while `KIND_DOT_COLORS.receipt` is `var(--color-success)`. This should work in jsdom since inline styles aren't computed, but be aware if tests become flaky after environment changes. + +## Triage + +- Decision: `valid` +- Notes: The current assertion compares `dot.style.background` directly to the token string. That happens to work in the current jsdom environment but couples the test to one CSSOM serialization path. The fix is to assert the inline style through the jest-dom style matcher, which is the existing test stack's style assertion API. diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_025.md b/.compozy/tasks/qa-rounds/reviews-001/issue_025.md new file mode 100644 index 000000000..cfc8d97c4 --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_025.md @@ -0,0 +1,21 @@ +--- +status: resolved +file: packages/ui/src/components/mono-badge.tsx +line: 20 +severity: nitpick +author: coderabbitai[bot] +provider_ref: review:4177411700,nitpick_hash:cf60b7b49f37 +review_hash: cf60b7b49f37 +source_review_id: "4177411700" +source_review_submitted_at: "2026-04-26T20:24:27Z" +--- + +# Issue 025: Component docs are now slightly out of sync with supported tones. 
+## Review Comment + +`"solid-accent"` is a non-tinted variant, while the component docs still describe tinted usage only. Consider updating the comment to prevent confusion for consumers. + +## Triage + +- Decision: `valid` +- Notes: `MonoBadge` now supports a `solid-accent` tone, but its component comment still describes only tinted badges. The fix is a small documentation update in the component comment so consumers understand that `solid-accent` is intentionally non-tinted. diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_026.md b/.compozy/tasks/qa-rounds/reviews-001/issue_026.md new file mode 100644 index 000000000..3ed071ae7 --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_026.md @@ -0,0 +1,23 @@ +--- +status: resolved +file: web/src/components/app-sidebar.tsx +line: 138 +severity: nitpick +author: coderabbitai[bot] +provider_ref: review:4177411700,nitpick_hash:d83202abe6b3 +review_hash: d83202abe6b3 +source_review_id: "4177411700" +source_review_submitted_at: "2026-04-26T20:24:27Z" +--- + +# Issue 026: Consider extracting shared active-nav styles to avoid drift. +## Review Comment + +`NavItem` and `FooterSlot` now duplicate the same active container and indicator classes. A small local constant/helper would reduce maintenance risk. + +Also applies to: 484-494 + +## Triage + +- Decision: `valid` +- Notes: `NavItem` and the settings footer link duplicate the same active row and active indicator class strings. A local constant for the shared row, active row, and indicator classes removes the drift risk without changing runtime behavior. 
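A minimal sketch of the shared-constant approach described in the Issue 026 triage notes. The constant names, the `navRowClass` helper, and the class strings are all hypothetical; the real values live in `app-sidebar.tsx` and are not shown in this diff.

```typescript
// Hypothetical class strings: placeholders for whatever NavItem and the
// settings footer link currently duplicate.
const NAV_ROW_BASE = "relative flex items-center gap-2 rounded-md px-2 py-1.5";
const NAV_ROW_ACTIVE = "bg-[var(--color-hover)] text-foreground";
const NAV_ACTIVE_INDICATOR = "absolute left-0 h-4 w-0.5 rounded-full";

// Both call sites build their row class through one helper, so the active
// styling cannot drift between them.
function navRowClass(active: boolean): string {
  return active ? `${NAV_ROW_BASE} ${NAV_ROW_ACTIVE}` : NAV_ROW_BASE;
}
```

Under this assumption, `NavItem` and the footer link would both call `navRowClass(isActive)` and render the indicator with `NAV_ACTIVE_INDICATOR`, leaving runtime behavior unchanged.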
diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_027.md b/.compozy/tasks/qa-rounds/reviews-001/issue_027.md new file mode 100644 index 000000000..7d8b309ee --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_027.md @@ -0,0 +1,53 @@ +--- +status: resolved +file: web/src/routes/_app/stories/-network.stories.tsx +line: 12 +author: coderabbitai[bot] +provider_ref: thread:PRRT_kwDOR5y4QM59r7vW,comment:PRRC_kwDOR5y4QM67Z0NS +--- + +# Issue 027: _⚠️ Potential issue_ | _🟠 Major_ +## Review Comment + +_⚠️ Potential issue_ | _🟠 Major_ + +**Use the network public barrel instead of a deep system import (Line 12).** + +Please route this through `@/systems/network` to preserve system boundaries from routes. + +
    +Suggested change + +```diff +-import { networkStatusFixture } from "@/systems/network/mocks"; ++import { networkStatusFixture } from "@/systems/network"; +``` +
    + + +As per coding guidelines, "Cross-system imports: Only through the public barrel (`@/systems/`). Never reach into another system's internals". + +
    +🤖 Prompt for AI Agents + +``` +Verify each finding against the current code and only fix it if needed. + +In `@web/src/routes/_app/stories/-network.stories.tsx` at line 12, importing +networkStatusFixture directly from the internals violates cross-system import +rules; update the import in -network.stories.tsx to take +networkStatusFixture from the public barrel by changing the source to the +systems network public barrel (use "@/systems/network") so the story consumes +networkStatusFixture via the public API rather than the deep path. +``` + +
    + + + + + +## Triage + +- Decision: `valid` +- Notes: The route story imports `networkStatusFixture` from `@/systems/network/mocks`, which crosses into system internals from a route-level file. The fix is to stop importing the mock fixture from the route story and use the public `NetworkStatus` type from `@/systems/network` for a local story override object. diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_028.md b/.compozy/tasks/qa-rounds/reviews-001/issue_028.md new file mode 100644 index 000000000..3664d6d0a --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_028.md @@ -0,0 +1,23 @@ +--- +status: resolved +file: web/src/storybook/web-storybook-stories-and-fixtures.test.tsx +line: 23 +severity: minor +author: coderabbitai[bot] +provider_ref: review:4177411700,nitpick_hash:dc902e7f90ef +review_hash: dc902e7f90ef +source_review_id: "4177411700" +source_review_submitted_at: "2026-04-26T20:24:27Z" +--- + +# Issue 028: Route system imports through a public barrel instead of deep internals. +## Review Comment + +Line 23 imports `@/systems/network/components/stories/network-workspace-shell.stories` directly from a system-internal path. Please expose a public test-facing entrypoint (or import via an allowed public barrel) so this test does not depend on network internals. + +As per coding guidelines, "Cross-system imports MUST only go through the public barrel (`@/systems/`). Never reach into another system's internals". + +## Triage + +- Decision: `valid` +- Notes: The Storybook regression test imports the network workspace shell story through `@/systems/network/components/stories/...`, which bypasses the system boundary. There is no existing public test-facing entrypoint for this story, so the minimal required out-of-scope code addition is `web/src/systems/network/storybook.ts`, a network-owned Storybook barrel that the regression test can import via `@/systems/network/storybook`. 
diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_029.md b/.compozy/tasks/qa-rounds/reviews-001/issue_029.md new file mode 100644 index 000000000..1c62cefae --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_029.md @@ -0,0 +1,25 @@ +--- +status: resolved +file: web/src/systems/network/components/network-workspace-shell.tsx +line: 79 +severity: nitpick +author: coderabbitai[bot] +provider_ref: review:4177411700,nitpick_hash:f4ce083715a2 +review_hash: f4ce083715a2 +source_review_id: "4177411700" +source_review_submitted_at: "2026-04-26T20:24:27Z" +--- + +# Issue 029: Avatar palette uses hardcoded hex values. +## Review Comment + +The `AVATAR_PALETTE` contains ad-hoc color tokens. Per coding guidelines, colors should be pulled from `DESIGN.md` rather than invented. Consider defining these as CSS variables in `tokens.css` or confirming they align with the design system's avatar color specification. + +As per coding guidelines: "Pull every color, font, radius, spacing step, and motion value from `DESIGN.md` — never invent tokens" + +--- + +## Triage + +- Decision: `valid` +- Notes: `AVATAR_PALETTE` uses literal hex colors in the component. AGH's design system already exposes the needed semantic tint and text variables, so the fix is to express avatar background/foreground pairs with existing CSS token variables rather than invented literals. 
diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_030.md b/.compozy/tasks/qa-rounds/reviews-001/issue_030.md new file mode 100644 index 000000000..84dd68f8d --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_030.md @@ -0,0 +1,23 @@ +--- +status: resolved +file: web/src/systems/network/components/network-workspace-shell.tsx +line: 432 +severity: nitpick +author: coderabbitai[bot] +provider_ref: review:4177411700,nitpick_hash:3dc982df56e7 +review_hash: 3dc982df56e7 +source_review_id: "4177411700" +source_review_submitted_at: "2026-04-26T20:24:27Z" +--- + +# Issue 030: Inline color value for hover state. +## Review Comment + +`hover:bg-white/[0.014]` uses an ad-hoc opacity value. Consider using a CSS variable-based hover surface token for consistency with the flat depth model. + +As per coding guidelines: "Tokens live in `packages/ui/src/tokens.css`; never override with ad-hoc hex values in components" + +## Triage + +- Decision: `valid` +- Notes: The message-row hover state uses `hover:bg-white/[0.014]`, an ad-hoc opacity value outside the token system. The fix is to use the existing hover surface token via `var(--color-hover)`. diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_031.md b/.compozy/tasks/qa-rounds/reviews-001/issue_031.md new file mode 100644 index 000000000..00a9a00cc --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_031.md @@ -0,0 +1,21 @@ +--- +status: resolved +file: web/src/systems/network/components/stories/network-workspace-shell.stories.tsx +line: 56 +severity: nitpick +author: coderabbitai[bot] +provider_ref: review:4177411700,nitpick_hash:9ca4d6bee709 +review_hash: 9ca4d6bee709 +source_review_id: "4177411700" +source_review_submitted_at: "2026-04-26T20:24:27Z" +--- + +# Issue 031: Unconventional use of globalThis.Error. +## Review Comment + +Using `globalThis.Error` instead of just `Error` is unusual. While it works, `Error` is globally available and more idiomatic. 
This may have been intentional to avoid linting rules, but worth noting for consistency. + +## Triage + +- Decision: `valid` +- Notes: `globalThis.Error` is unnecessary in the Storybook fixture helper because `Error` is already a global value in this TypeScript runtime. The fix is to use idiomatic `Error` directly. diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_032.md b/.compozy/tasks/qa-rounds/reviews-001/issue_032.md new file mode 100644 index 000000000..fffe1b12c --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_032.md @@ -0,0 +1,21 @@ +--- +status: resolved +file: web/src/systems/network/mocks/handlers.ts +line: 127 +severity: minor +author: coderabbitai[bot] +provider_ref: review:4177411700,nitpick_hash:5c3c5d9e2376 +review_hash: 5c3c5d9e2376 +source_review_id: "4177411700" +source_review_submitted_at: "2026-04-26T20:24:27Z" +--- + +# Issue 032: Guard required fields with runtime type checks before .trim(). +## Review Comment + +At **Line 127**, non-string payload values can throw (e.g., `channel: 123`) before returning the intended 400 contract response. + +## Triage + +- Decision: `valid` +- Notes: The `/api/network/send` MSW handler calls `.trim()` through optional chaining on fields typed as optional strings, but malformed JSON can still provide non-string values and throw before the intended 400 response. The fix is to read required string fields through runtime type checks before trimming. 
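The runtime guard described in the Issue 032 triage notes could be sketched as follows. This is an illustration under stated assumptions, not the actual handler: `readRequiredString`, `validateSendPayload`, and the `SendValidation` shape are hypothetical names, and only the `channel` field and the 400 response come from the review comment.

```typescript
// Hypothetical helper: returns the trimmed string, or null when the value
// is missing, not a string, or blank after trimming.
function readRequiredString(value: unknown): string | null {
  if (typeof value !== "string") {
    return null;
  }
  const trimmed = value.trim();
  return trimmed.length > 0 ? trimmed : null;
}

// Hypothetical result shape used only for this sketch.
interface SendValidation {
  ok: boolean;
  status: number;
  channel?: string;
}

// Checks the type before calling .trim(), so a payload such as
// { channel: 123 } yields a 400 result instead of throwing.
function validateSendPayload(payload: unknown): SendValidation {
  const record =
    typeof payload === "object" && payload !== null
      ? (payload as Record<string, unknown>)
      : {};
  const channel = readRequiredString(record.channel);
  if (channel === null) {
    return { ok: false, status: 400 };
  }
  return { ok: true, status: 200, channel };
}
```

The point of the design is that optional chaining only protects against `undefined` and `null`; a `typeof` check is what rules out numbers, objects, and other non-string values before `.trim()` runs.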
diff --git a/.compozy/tasks/qa-rounds/reviews-001/issue_033.md b/.compozy/tasks/qa-rounds/reviews-001/issue_033.md new file mode 100644 index 000000000..fd5fbda24 --- /dev/null +++ b/.compozy/tasks/qa-rounds/reviews-001/issue_033.md @@ -0,0 +1,21 @@ +--- +status: resolved +file: web/src/systems/network/mocks/network-mocks.test.ts +line: 48 +severity: nitpick +author: coderabbitai[bot] +provider_ref: review:4177411700,nitpick_hash:e0bed60b6232 +review_hash: e0bed60b6232 +source_review_id: "4177411700" +source_review_submitted_at: "2026-04-26T20:24:27Z" +--- + +# Issue 033: Consider adding a failure-path contract test for required fields. +## Review Comment + +You already cover the happy path; adding one 400-case assertion would lock down the new validation behavior. + +## Triage + +- Decision: `valid` +- Notes: The send mock currently has only happy-path coverage. After hardening the handler's required-field parsing, add a 400-path contract test that sends malformed required fields and asserts the mock returns the intended validation response. diff --git a/.compozy/tasks/unified-capabilities/_techspec.md b/.compozy/tasks/unified-capabilities/_techspec.md index 0b8888160..6140d9cca 100644 --- a/.compozy/tasks/unified-capabilities/_techspec.md +++ b/.compozy/tasks/unified-capabilities/_techspec.md @@ -19,7 +19,7 @@ The implementation keeps the strongest parts of the current branch unchanged: lo - rich capability discovery in explicit `whois` - transferable capability envelopes through `kind: "capability"` -`docs/rfcs/003_agh-network-v0.md` must be rewritten so the protocol no longer speaks about `recipe` as a first-class artifact. `docs/agents/capabilities.md` remains the runtime-facing authoring guide, but it must be updated to document the unified schema and transfer semantics. +`docs/rfcs/003_agh-network-v0.md` must be rewritten so the protocol no longer speaks about `recipe` as a first-class artifact. 
`docs/rfcs/005_capability-catalogs-agent-directories.md` remains the runtime-facing authoring guide, but it must be updated to document the unified schema and transfer semantics. Data flow: - Runtime loads `AGENT.md` and optional capability catalogs from the agent directory. @@ -152,7 +152,7 @@ Internal integration points: - `internal/network/router.go` - `internal/network/lifecycle.go` - `docs/rfcs/003_agh-network-v0.md` -- `docs/agents/capabilities.md` +- `docs/rfcs/005_capability-catalogs-agent-directories.md` ## Impact Analysis diff --git a/.compozy/tasks/unified-capabilities/adrs/adr-002.md b/.compozy/tasks/unified-capabilities/adrs/adr-002.md index b61999b5e..f1443b5b8 100644 --- a/.compozy/tasks/unified-capabilities/adrs/adr-002.md +++ b/.compozy/tasks/unified-capabilities/adrs/adr-002.md @@ -78,4 +78,4 @@ Additional schema decisions: - [Agent capabilities TechSpec](../_techspec.md) - [/Users/pedronauck/Dev/compozy/agh2/internal/config/capabilities.go](/Users/pedronauck/Dev/compozy/agh2/internal/config/capabilities.go) -- [/Users/pedronauck/Dev/compozy/agh2/docs/agents/capabilities.md](/Users/pedronauck/Dev/compozy/agh2/docs/agents/capabilities.md) +- [/Users/pedronauck/Dev/compozy/agh/docs/rfcs/005_capability-catalogs-agent-directories.md](/Users/pedronauck/Dev/compozy/agh/docs/rfcs/005_capability-catalogs-agent-directories.md) diff --git a/.compozy/tasks/unified-capabilities/memory/task_05.md b/.compozy/tasks/unified-capabilities/memory/task_05.md index 945e56a0e..11c9dab4c 100644 --- a/.compozy/tasks/unified-capabilities/memory/task_05.md +++ b/.compozy/tasks/unified-capabilities/memory/task_05.md @@ -4,13 +4,13 @@ Keep only task-local execution context here. Do not duplicate facts that are obv ## Objective Snapshot -- Rewrite `docs/rfcs/003_agh-network-v0.md` and `docs/agents/capabilities.md` so the unified capability model is the only steady-state explanation of authoring, discovery, and transfer. 
+- Rewrite `docs/rfcs/003_agh-network-v0.md` and `docs/rfcs/005_capability-catalogs-agent-directories.md` so the unified capability model is the only steady-state explanation of authoring, discovery, and transfer. ## Important Decisions - Treat ADR-001 through ADR-003 plus task_04’s implemented discovery/API contracts as the source of truth, then word the docs against the current code rather than older split-model prose. - Keep the runtime guide explicit about the wire/API boundary: wire discovery still uses `peer_card.capabilities`, `agh.capabilities_brief`, and `agh.capability_catalog`, while daemon consumers should use typed `peer_card.capabilities` and `capability_catalog` payloads instead of reading capability discovery blobs from API-visible `ext`. -- Put the required end-to-end authored -> `greet` -> `whois` -> `kind:"capability"` flow in `docs/agents/capabilities.md`, and rewrite the RFC worked example so it no longer reintroduces a second artifact type. +- Put the required end-to-end authored -> `greet` -> `whois` -> `kind:"capability"` flow in `docs/rfcs/005_capability-catalogs-agent-directories.md`, and rewrite the RFC worked example so it no longer reintroduces a second artifact type. ## Learnings @@ -21,7 +21,7 @@ Keep only task-local execution context here. Do not duplicate facts that are obv ## Files / Surfaces - `docs/rfcs/003_agh-network-v0.md` -- `docs/agents/capabilities.md` +- `docs/rfcs/005_capability-catalogs-agent-directories.md` - `.compozy/tasks/unified-capabilities/task_05.md` - `.compozy/tasks/unified-capabilities/_tasks.md` diff --git a/.compozy/tasks/unified-capabilities/memory/task_08.md b/.compozy/tasks/unified-capabilities/memory/task_08.md index 36677ccd3..f798a0ff7 100644 --- a/.compozy/tasks/unified-capabilities/memory/task_08.md +++ b/.compozy/tasks/unified-capabilities/memory/task_08.md @@ -4,13 +4,13 @@ Keep only task-local execution context here. 
Do not duplicate facts that are obv ## Objective Snapshot -- Align `packages/site` runtime capability docs with the unified model defined in `_techspec.md`, `docs/agents/capabilities.md`, and ADRs 001/002. +- Align `packages/site` runtime capability docs with the unified model defined in `_techspec.md`, `docs/rfcs/005_capability-catalogs-agent-directories.md`, and ADRs 001/002. - Keep runtime pages operator-focused: authoring, projection, digest, and the three wire roles (brief, rich, transfer), without duplicating the full protocol reference. ## Important Decisions - Kept the runtime page operator-focused and linked to `protocol/capability-discovery` and `protocol/message-kinds/#capability` for the wire contract instead of restating envelope/validation rules. -- Added explicit `version`, `requirements`, and runtime-derived `digest` coverage to the runtime schema table and validation rules so site docs do not drift from `docs/agents/capabilities.md`. +- Added explicit `version`, `requirements`, and runtime-derived `digest` coverage to the runtime schema table and validation rules so site docs do not drift from `docs/rfcs/005_capability-catalogs-agent-directories.md`. - Left `packages/site/content/runtime/core/overview/what-is-agh.mdx` and `overview/architecture.mdx` untouched: both already use generic "capabilities" wording that is consistent with the unified model and never mention `recipe`. - Left `runtime/core/agents/meta.json` untouched; the page list and ordering still reflect the unified story. @@ -31,4 +31,4 @@ Keep only task-local execution context here. Do not duplicate facts that are obv ## Ready for Next Run -- Site runtime copy is consistent with `docs/agents/capabilities.md` and the accepted ADRs; task_09 (QA plan) can treat runtime + protocol site docs as a coherent pair. 
+- Site runtime copy is consistent with `docs/rfcs/005_capability-catalogs-agent-directories.md` and the accepted ADRs; task_09 (QA plan) can treat runtime + protocol site docs as a coherent pair. diff --git a/.compozy/tasks/unified-capabilities/memory/task_09.md b/.compozy/tasks/unified-capabilities/memory/task_09.md index 7b8f4ee56..e81313389 100644 --- a/.compozy/tasks/unified-capabilities/memory/task_09.md +++ b/.compozy/tasks/unified-capabilities/memory/task_09.md @@ -37,7 +37,7 @@ Keep only task-local execution context here. Do not duplicate facts that are obv - `.compozy/tasks/unified-capabilities/task_01.md` through `task_10.md` - `.compozy/tasks/unified-capabilities/adrs/adr-001.md` through `adr-003.md` - `docs/rfcs/003_agh-network-v0.md` -- `docs/agents/capabilities.md` +- `docs/rfcs/005_capability-catalogs-agent-directories.md` - `internal/api/contract/contract.go` - `web/src/systems/network/components/network-peer-detail-panel.tsx` - `web/src/hooks/routes/use-network-page.ts` diff --git a/.compozy/tasks/unified-capabilities/qa/test-cases/TC-REG-002.md b/.compozy/tasks/unified-capabilities/qa/test-cases/TC-REG-002.md index ba0f9f590..0108035b3 100644 --- a/.compozy/tasks/unified-capabilities/qa/test-cases/TC-REG-002.md +++ b/.compozy/tasks/unified-capabilities/qa/test-cases/TC-REG-002.md @@ -11,7 +11,7 @@ ### Objective -Verify that the runtime-facing documentation in `packages/site/content/runtime/core/agents/` stays aligned with `docs/agents/capabilities.md`: current layouts remain supported, `version` is optional, `digest` is runtime-computed, `requirements` reference `capability.id`, and no page teaches `recipe` as a separate authored/runtime concept. 
+Verify that the runtime-facing documentation in `packages/site/content/runtime/core/agents/` stays aligned with `docs/rfcs/005_capability-catalogs-agent-directories.md`: current layouts remain supported, `version` is optional, `digest` is runtime-computed, `requirements` reference `capability.id`, and no page teaches `recipe` as a separate authored/runtime concept. --- @@ -27,7 +27,7 @@ Verify that the runtime-facing documentation in `packages/site/content/runtime/c | Field | Value | Notes | | --- | --- | --- | | Runtime pages | `capabilities.mdx`, `definitions.mdx`, `meta.json` | Primary review set | -| Repo guide | `docs/agents/capabilities.md` | Runtime documentation source of truth | +| Repo guide | `docs/rfcs/005_capability-catalogs-agent-directories.md` | Runtime documentation source of truth | | Related runtime pages | `agent-md.mdx`, overview pages if needed | Used to confirm no drift in surrounding narrative | --- @@ -40,7 +40,7 @@ Verify that the runtime-facing documentation in `packages/site/content/runtime/c 2. Review `packages/site/content/runtime/core/agents/definitions.mdx` and related metadata. - **Expected:** Agent-definition docs describe the capability sidecar as the unified discovery/transfer artifact and do not imply a separate recipe concept. -3. Cross-check the runtime site wording against `docs/agents/capabilities.md`. +3. Cross-check the runtime site wording against `docs/rfcs/005_capability-catalogs-agent-directories.md`. - **Expected:** The site and repo guide agree on no-catalog behavior, typed API guidance, discovery roles, and transfer semantics. 4. Spot-check surrounding runtime pages only where they reference capability behavior. 
@@ -67,7 +67,7 @@ Verify that the runtime-facing documentation in `packages/site/content/runtime/c - Tasks: `task_08`, `task_05` - TechSpec: `System Architecture`, `Data Models`, `Technical Considerations` - ADRs: `ADR-001`, `ADR-002` -- Primary surfaces: `packages/site/content/runtime/core/agents/*`, `docs/agents/capabilities.md` +- Primary surfaces: `packages/site/content/runtime/core/agents/*`, `docs/rfcs/005_capability-catalogs-agent-directories.md` --- diff --git a/.compozy/tasks/unified-capabilities/qa/test-plans/unified-capabilities-test-plan.md b/.compozy/tasks/unified-capabilities/qa/test-plans/unified-capabilities-test-plan.md index d572352df..a831ac6ce 100644 --- a/.compozy/tasks/unified-capabilities/qa/test-plans/unified-capabilities-test-plan.md +++ b/.compozy/tasks/unified-capabilities/qa/test-plans/unified-capabilities-test-plan.md @@ -17,7 +17,7 @@ The plan intentionally avoids generic smoke coverage. Every P0 and P1 case in th 2. Prove `kind:"capability"` replaced `recipe` on the wire without regressing validation, delivery, or lifecycle semantics. 3. Prove brief discovery, rich discovery, peer details, and daemon API payloads expose one coherent typed capability model. 4. Prove the `web/` network surface renders unified capabilities from the typed backend contract without recipe-era assumptions. -5. Prove `packages/site` protocol and runtime docs teach the same single-concept model documented in RFC 003 and `docs/agents/capabilities.md`. +5. Prove `packages/site` protocol and runtime docs teach the same single-concept model documented in RFC 003 and `docs/rfcs/005_capability-catalogs-agent-directories.md`. 6. Leave task_10 with stable artifact paths for screenshots, issues, and final verification reporting. 
 ## Scope
diff --git a/.compozy/tasks/unified-capabilities/qa/verification-report.md b/.compozy/tasks/unified-capabilities/qa/verification-report.md
index 4602a637a..f34ad4567 100644
--- a/.compozy/tasks/unified-capabilities/qa/verification-report.md
+++ b/.compozy/tasks/unified-capabilities/qa/verification-report.md
@@ -81,13 +81,13 @@ SCENARIO EVIDENCE
   - `bunx vitest run packages/ui/src/components/dialog.test.tsx`
   - `bun run --cwd web test:e2e:daemon-served:raw e2e/network.spec.ts`
 - `TC-REG-001` protocol docs consistency:
-  - `rg -n "\\brecipe(s)?\\b" packages/site/content/protocol packages/site/content/runtime/core/agents docs/agents/capabilities.md docs/rfcs/003_agh-network-v0.md`
+  - `rg -n "\\brecipe(s)?\\b" packages/site/content/protocol packages/site/content/runtime/core/agents docs/rfcs/005_capability-catalogs-agent-directories.md docs/rfcs/003_agh-network-v0.md`
   - `packages/site/content/protocol/meta.json` confirms there is no `recipes` page in steady-state nav
   - `make site-build`
 - `TC-REG-002` runtime docs consistency:
   - `packages/site/content/runtime/core/agents/meta.json`
   - `packages/site/content/runtime/core/agents/capabilities.mdx` keeps `recipe` only as a negated historical reference
-  - `docs/agents/capabilities.md`
+  - `docs/rfcs/005_capability-catalogs-agent-directories.md`
   - `make site-build`

 ADDITIONAL GATE REPAIRS OUTSIDE THE FEATURE REGRESSION LIST
diff --git a/.compozy/tasks/unified-capabilities/task_05.md b/.compozy/tasks/unified-capabilities/task_05.md
index 40f12a13e..22295df20 100644
--- a/.compozy/tasks/unified-capabilities/task_05.md
+++ b/.compozy/tasks/unified-capabilities/task_05.md
@@ -24,7 +24,7 @@ Rewrite the core repository docs so the unified capability model becomes the can
 - MUST rewrite `docs/rfcs/003_agh-network-v0.md` so `capability` is the only surviving authored and transferred artifact
-- MUST update `docs/agents/capabilities.md` so local authoring, digesting, requirements, and transfer semantics reflect the unified model
+- MUST update `docs/rfcs/005_capability-catalogs-agent-directories.md` so local authoring, digesting, requirements, and transfer semantics reflect the unified model
 - MUST describe the discovery split clearly: brief in `greet`, rich in `whois`, transfer in `kind:"capability"`
 - MUST remove or explicitly supersede protocol/runtime language that still presents `recipe` as a first-class concept
 - MUST keep terminology aligned with the accepted ADRs and the backend/API behavior finalized in task_04
@@ -43,7 +43,7 @@ See TechSpec "System Architecture", "Technical Considerations", and "Architectur

 ### Relevant Files
 - `docs/rfcs/003_agh-network-v0.md` - canonical protocol RFC that still needs the split model removed
-- `docs/agents/capabilities.md` - runtime/operator-facing guide for capability authoring and behavior
+- `docs/rfcs/005_capability-catalogs-agent-directories.md` - runtime/operator-facing guide for capability authoring and behavior
 - `.compozy/tasks/unified-capabilities/_techspec.md` - approved technical source for the new steady-state design
 - `.compozy/tasks/unified-capabilities/adrs/adr-001.md` - concept unification decision
 - `.compozy/tasks/unified-capabilities/adrs/adr-002.md` - authoring and schema decision
diff --git a/.compozy/tasks/unified-capabilities/task_08.md b/.compozy/tasks/unified-capabilities/task_08.md
index 6ef529f02..d14c37f52 100644
--- a/.compozy/tasks/unified-capabilities/task_08.md
+++ b/.compozy/tasks/unified-capabilities/task_08.md
@@ -17,7 +17,7 @@ Update the runtime-facing site docs so capability authoring, agent definitions,
 - ALWAYS READ `_techspec.md`, ADRs, and the rewritten runtime docs from task_05 before starting (`_prd.md` is absent for this feature)
 - REFERENCE TECHSPEC sections "System Architecture", "Data Models", and "Technical Considerations"
 - KEEP RUNTIME DOCS DISTINCT FROM PROTOCOL DOCS - this task should teach authoring and runtime behavior, not duplicate the full protocol reference
-- ALIGN SITE RUNTIME WORDING WITH `docs/agents/capabilities.md` - the site must not fork its own capability story
+- ALIGN SITE RUNTIME WORDING WITH `docs/rfcs/005_capability-catalogs-agent-directories.md` - the site must not fork its own capability story
 - TESTS REQUIRED - runtime pages and metadata must remain internally consistent after the rewrite
 - GREENFIELD: replace stale explanations directly instead of stacking warning callouts on obsolete copy
@@ -27,7 +27,7 @@ Update the runtime-facing site docs so capability authoring, agent definitions,
 - MUST reflect the approved schema decisions: current local layouts remain, `version` is optional, `digest` is runtime-computed, and `requirements` targets `capability.id`
 - MUST update runtime overview or agent-definition pages that currently imply capabilities and recipes are separate concepts
 - MUST keep site runtime navigation metadata coherent after page rewrites
-- MUST align examples and explanatory copy with `docs/agents/capabilities.md` and the finalized backend behavior
+- MUST align examples and explanatory copy with `docs/rfcs/005_capability-catalogs-agent-directories.md` and the finalized backend behavior
 - SHOULD keep operator-facing explanations focused on authoring, discovery visibility, and runtime expectations rather than raw protocol detail
@@ -47,7 +47,7 @@ See TechSpec "System Architecture", "Data Models", and task_05 outputs. This tas
 - `packages/site/content/runtime/core/configuration/agent-md.mdx` - configuration guide that may need updated capability catalog references
 - `packages/site/content/runtime/core/overview/what-is-agh.mdx` - runtime overview page that may still describe old concepts or flows
 - `packages/site/content/runtime/core/agents/meta.json` - runtime agents-section navigation metadata
-- `docs/agents/capabilities.md` - rewritten repository guide that the site runtime copy must mirror
+- `docs/rfcs/005_capability-catalogs-agent-directories.md` - rewritten repository guide that the site runtime copy must mirror

 ### Dependent Files
 - `packages/site/content/runtime/core/overview/architecture.mdx` - architecture page may need wording updates if it mentions network/runtime capability concepts
@@ -63,7 +63,7 @@ See TechSpec "System Architecture", "Data Models", and task_05 outputs. This tas
 - Updated runtime site docs for capability authoring and behavior
 - Runtime examples aligned with the finalized unified schema **(REQUIRED)**
 - Navigation and metadata updates for any touched runtime pages **(REQUIRED)**
-- Consistency checks against `docs/agents/capabilities.md` and the ADRs **(REQUIRED)**
+- Consistency checks against `docs/rfcs/005_capability-catalogs-agent-directories.md` and the ADRs **(REQUIRED)**
 - Documentation quality checks with no conflicting capability/recipe explanations left in runtime-facing site docs **(REQUIRED)**

 ## Tests
@@ -73,7 +73,7 @@ See TechSpec "System Architecture", "Data Models", and task_05 outputs. This tas
   - [ ] Agent-definition and overview pages no longer imply recipe is a separate authored/runtime primitive
   - [ ] Runtime metadata files remain valid after any page rewrites or nav changes
 - Integration tests:
-  - [ ] The runtime site section reads consistently with `docs/agents/capabilities.md` and task_05 outputs
+  - [ ] The runtime site section reads consistently with `docs/rfcs/005_capability-catalogs-agent-directories.md` and task_05 outputs
   - [ ] Capability-vs-skill explanations remain clear without reintroducing the old capability/recipe split
 - Test coverage target: >=80%
 - All tests must pass
diff --git a/AGENTS.md b/AGENTS.md
index 014951184..b1e3adc86 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1,8 +1,6 @@
-# CLAUDE.md
-
 ## Project Overview

-AGH is an Agent Operating System — a Go single-binary daemon that manages AI agent sessions via ACP (Agent Client Protocol). It spawns ACP-compatible agents (Claude Code, Codex, Gemini CLI, etc.) as subprocesses, communicates via JSON-RPC over stdio, persists events in SQLite, and exposes interfaces via HTTP/SSE (web UI) and UDS (CLI). A Fumadocs site at `agh.compozy.com` documents the runtime and the AGH Network protocol.
+AGH is an Agent Operating System — a Go single-binary daemon that manages AI agent sessions via ACP (Agent Client Protocol). It spawns ACP-compatible agents (Claude Code, Codex, Gemini CLI, etc.) as subprocesses, communicates via JSON-RPC over stdio, persists events in SQLite, and exposes interfaces via HTTP/SSE (web UI) and UDS (CLI). A Fumadocs site at `agh.network` documents the runtime and the AGH Network protocol.

 **Goals**: daemon single-binary in background, strong observability, agent-first system (agents manipulate via CLI + REST), highly extensible, highly configurable.

@@ -10,9 +8,13 @@ AGH is an Agent Operating System — a Go single-binary daemon that manages AI a

 ## Greenfield Alpha — Zero Legacy Tolerance

-No production users exist. Never sacrifice code quality for backward compatibility. Never write migration, compat, or defensive code for old state — delete the old thing instead of working around it.
-
-**Hard cuts, not bridges.** Renames sweep code, storage, APIs, CLI, extensions, specs, RFCs, AND `.compozy/tasks/*` artifacts in the same change. No aliases, no dual fields, no schema fallback paths. Every breaking-change techspec MUST explicitly name its delete targets.
+- **No production users exist.**
+- Never sacrifice code quality for backward compatibility.
+- Never write migration, compatibility, or defensive code for old state — delete obsolete code instead of working around it.
+- **Hard cuts, not bridges:**
+  - Renames must update code, storage, APIs, CLI, extensions, specs, RFCs, and `.compozy/tasks/*` artifacts all in a single change.
+  - Do not create aliases, dual fields, or schema fallback paths.
+- Every breaking-change techspec **MUST** explicitly list its delete targets.

 ## Critical Rules

@@ -23,8 +25,10 @@ No production users exist. Never sacrifice code quality for backward compatibili
 - **Never use web search tools for local project code** — use Grep/Glob instead. Web search is only for external docs.
 - **Never run destructive git commands** (`git restore`, `git checkout`, `git reset`, `git clean`, `git rm`) **without explicit user permission**. If the worktree contains unexpected edits, read and work around them.
 - NEVER ignore errors with `_` in production code or in tests — every error must be handled or have a written justification.
-- NEVER COMMITS `ai-docs/`, `.tmp/`, or `.compozy/tasks/*/memory/` TO THE REPO. They are local tracking artifacts.
+- NEVER COMMITS `ai-docs/` or `.tmp/` TO THE REPO. They are local tracking artifacts.
 - **Subagents are read-only.** Use them for analysis, exploration, and parallel research. The author of every code change is the agent paired with the user. Subagent output is treated as evidence, not as committed work.
+- **ALWAYS CHECK** the `internal/CLAUDE.md` when doing Go-related stuff
+- **ALWAYS CHECK** the `web/CLAUDE.md` when doing things related to the web package

 ## Workflow Rules

@@ -55,38 +59,43 @@ These govern how features move from idea to ship. Internalize them before openin

 Activate skills **before** writing code. Match task domain → activate all required skills:

-| Domain | Required Skills | Conditional Skills |
-| -------------------------- | ---------------------------------------------------------------------------------------- | -------------------------------------- |
-| Go / Runtime | `golang-pro` | `context7` |
-| Config / Logging | `golang-pro` | |
-| Bug fix | `systematic-debugging` + `no-workarounds` | `testing-anti-patterns` |
-| Writing Go tests | `testing-anti-patterns` + `agh-test-conventions` + `golang-pro` | `vitest` (only for test tooling docs) |
-| Cleanup / failure paths | `agh-cleanup-failure-paths` + `golang-pro` | `deadlock-finder-and-fixer` |
-| Schema / migration changes | `agh-schema-migration` + `golang-pro` | |
-| Contract / OpenAPI changes | `agh-contract-codegen-coship` | |
-| Task completion | `cy-final-verify` | |
-| Architecture audit | `architectural-analysis` | `refactoring-analysis` + `ubs` |
-| Concurrency / races | `deadlock-finder-and-fixer` + `golang-pro` | `systematic-debugging` |
-| Performance / hot paths | `extreme-software-optimization` + `golang-pro` | |
-| Security review | `security-review` | `ubs` |
-| Creative / new features | `brainstorming` | `cy-idea-factory` |
-| PRD creation | `cy-spec-preflight` + `cy-create-prd` | `cy-idea-factory` |
-| TechSpec creation | `cy-spec-preflight` + `cy-create-techspec` + `cy-spec-peer-review` | `cy-research-competitors` |
-| Task generation | `cy-spec-preflight` + `cy-create-tasks` + `cy-tasks-tail-qa-pair` + `cy-web-docs-impact` | |
-| Competitor research | `cy-research-competitors` | `context7` + `find-docs` |
-| Execute a PRD task | `cy-execute-task` | `cy-workflow-memory` |
-| Review round / fixes | `cy-review-round` + `cy-fix-reviews` | `fix-coderabbit-review` |
-| Release / scenario QA | `real-scenario-qa` (delegates to `qa-execution` + `qa-report`) | `agh-worktree-isolation` |
-| Git rebase / conflicts | `git-rebase` | |
-| External docs lookup | `context7` + `find-docs` | `exa-web-search-free` |
-| UI / Design (any surface) | `agh-design` + `design-taste-frontend` + `minimalist-ui` | `frontend-design` + `interface-design` |
+| Domain | Required Skills | Conditional Skills |
+| ------------------------------------- | ---------------------------------------------------------------------------------------- | -------------------------------------- |
+| Go / Runtime | `agh-code-guidelines` + `golang-pro` | `context7` |
+| Config / Logging | `agh-code-guidelines` + `golang-pro` | |
+| TUI / CLI Bubbletea | `bubbletea` + `agh-code-guidelines` + `golang-pro` | |
+| Bug fix | `systematic-debugging` + `no-workarounds` | `testing-anti-patterns` |
+| Writing Go tests | `agh-test-conventions` + `testing-anti-patterns` + `golang-pro` | `vitest` (only for test tooling docs) |
+| Cleanup / failure paths | `agh-cleanup-failure-paths` + `agh-code-guidelines` + `golang-pro` | `deadlock-finder-and-fixer` |
+| Schema / migration changes | `agh-schema-migration` + `golang-pro` | |
+| Contract / OpenAPI changes | `agh-contract-codegen-coship` | |
+| Task completion | `cy-final-verify` | |
+| Lessons learned | `lesson-learned` | |
+| Architecture audit | `architectural-analysis` | `refactoring-analysis` + `ubs` |
+| Concurrency / races | `deadlock-finder-and-fixer` + `golang-pro` | `systematic-debugging` |
+| AGH Network (`internal/network` only) | `nats` + `agh-code-guidelines` + `golang-pro` | `deadlock-finder-and-fixer` |
+| Performance / hot paths | `extreme-software-optimization` + `golang-pro` | |
+| Security review | `security-review` | `ubs` |
+| Creative / new features | `brainstorming` | `cy-idea-factory` |
+| Council debate (high-impact) | `council` | `brainstorming` |
+| PRD creation | `cy-spec-preflight` + `cy-create-prd` | `cy-idea-factory` |
+| TechSpec creation | `cy-spec-preflight` + `cy-create-techspec` + `cy-spec-peer-review` | `cy-research-competitors` |
+| Task generation | `cy-spec-preflight` + `cy-create-tasks` + `cy-tasks-tail-qa-pair` + `cy-web-docs-impact` | |
+| Competitor research | `cy-research-competitors` | `context7` + `find-docs` |
+| Execute a PRD task | `cy-execute-task` | `cy-workflow-memory` |
+| Review round / fixes | `cy-review-round` + `cy-fix-reviews` | `fix-coderabbit-review` |
+| Release / scenario QA | `real-scenario-qa` (delegates to `qa-execution` + `qa-report`) | `agh-worktree-isolation` |
+| Git rebase / conflicts | `git-rebase` | |
+| External docs lookup | `context7` + `find-docs` | `exa-web-search-free` |
+| Diagrams (spec / ADR) | `architecture-diagram` | `mermaid-diagrams` |
+| Documentation (internal) | `documentation-writer` | `crafting-effective-readmes` |
+| Skill / agent-md authoring | `skill-best-practices` + `agent-md-refactor` | |
+| UI / Design (any surface) | `agh-design` + `design-taste-frontend` + `minimalist-ui` | `frontend-design` + `interface-design` |

 Web-specific skill dispatch is in `web/CLAUDE.md` and `web/AGENTS.md`. Site-specific dispatch is in `packages/site/CLAUDE.md`.

 Every domain change requires its skill — no skipping "because it's a small change". Activate multiple skills when code touches multiple domains.

-`nats` skill is installed but architecturally forbidden in AGH (see Architecture Principles). Do not activate it.
-
 ## Build Commands

 ### Go (backend)
@@ -121,11 +130,17 @@ make cli-docs # Regenerate CLI reference from cobra JSON export

 Web (`web/`) commands are documented in `web/CLAUDE.md`.

-## Commit style: :
-
-Allowed prefixes: `feat:`, `fix:`, `refactor:`, `docs:`, `test:`, `build:`. **NO `chore:`, `style:`, or `ci:`.** Tooling and CI changes use `build:`. PR-merged commits include `(#NN)` suffix.
+## Commit style

-**One commit per remediation batch.** `cy-fix-reviews` rounds produce exactly one local commit per round. Run `make verify` BEFORE and AFTER the commit. Never `git commit --amend` after pre-commit hook failures — fix and create a new commit.
+- ALWAYS USE: `: `
+- Allowed commit prefixes: `feat:`, `fix:`, `refactor:`, `docs:`, `test:`, `build:`
+- **Do NOT use**: `chore:`, `style:`, or `ci:`.
+- Use `build:` for tooling and CI changes.
+- For PR-merged commits, append a `(#NN)` suffix.
+- **Create exactly one commit per remediation batch.**
+- Each `cy-fix-reviews` round must produce one local commit.
+- Always run `make verify` **before and after** committing.
+- If a pre-commit hook fails, do **not** use `git commit --amend`. Instead, fix the issue and create a new commit.

 ## Code Search Hierarchy

@@ -133,233 +148,83 @@ Allowed prefixes: `feat:`, `fix:`, `refactor:`, `docs:`, `test:`, `build:`. **NO
 2. **`context7` / `find-docs` skills** — for external library documentation.
 3. **`exa-web-search-free`** — for web research, news, external code examples when the local docs tools are insufficient.

-## Old Project Reference
-
-The `.old_project/` directory contains the previous AGH implementation (78K+ LOC). **Reference only** — do not modify, do not import, do not include in builds. Exclude from code search results.
-
-## Architecture
-
-### Principles
-
-- **Designed for incremental extension** — new capabilities arrive as new packages wired into `daemon/`, without modifying existing packages. Small interfaces + dependency injection. Every capability plan decides which extension points, hooks, capabilities, tools/resources, bundles, registries, bridge SDKs, and docs must be added, updated, or removed.
-- **Pragmatic Flat with Discipline** — packages under `internal/`, API transports grouped under `api/`, no domain/infra split, no event bus.
-- **`daemon/` is the sole composition root** — the only package that imports all others. Reconciliation logic running at boot belongs to composition root and is not "legacy support".
-- **No package imports `daemon/`, `api/`, or `cli/`** — dependencies flow downward only.
-- **Interfaces defined where consumed** (Go-style) — `session/` defines `AgentDriver`, `acp/` implements it.
-- **Direct function calls through interfaces** — no event bus, no NATS, no reflection-based routing.
-- **Notifier pattern for fan-out** — typed interface for observability and SSE, not a generic bus.
-- **No back-pointers between packages** — inject callbacks or interfaces.
-- **Functional options for constructors** — `NewManager(opts ...Option)`.
-- **Maps for <10 items** — no registry interfaces for small collections.
-- **File-level organization** within packages — sub-packages only when complexity justifies it.
-- **CI-enforceable boundaries** — `mage Boundaries` rules prevent import cycles. Update `magefile.go` Boundaries() in the same commit that introduces a new `internal/api/*` subpackage.
-- **`internal/api/core` is the canonical handler home.** REST/UDS endpoints exist as shared `BaseHandlers` methods; HTTP and UDS only choose registration and authentication. No transport-duplicated parsing/validation.
-- **Authoritative primitives are exclusive.** When a primitive owns a state transition (`task.Service.ClaimNextRun`, `Spawn`, `EnsureMigration`), no peer package may replicate it. Wake/observe/sweep are allowed; claim/own is not. The mechanical scheduler does not call `ClaimNextRun`.
-- **Hooks are typed dispatch, not an event bus.** Dispatch at the call site that owns the state transition. Never tail event/log tables to fire hooks. Hooks may deny/narrow/annotate but cannot bypass safety primitives (claim tokens, leases, TTL, lineage, spawn caps, permission narrowing).
-- **Agent-manageable by default.** User-visible runtime capabilities must expose stable machine-readable control surfaces for agents: CLI verbs with `-o json`/`-o jsonl` where relevant, HTTP/UDS parity when state crosses the daemon boundary, discoverable status/config output, and docs that describe the agent path. UI-only manageability is incomplete.
-- **No partial-surface completions.** Any change touching a public surface closes the loop end-to-end in one pass: contract → HTTP handler → UDS handler → CLI client → CLI command → extension/config/docs surfaces → tests → docs.
-
-### Concurrency
-
-- Every goroutine must have explicit ownership and shutdown via `context.Context` cancellation.
-- No fire-and-forget goroutines — track with `sync.WaitGroup` or equivalent.
-- Use `select` with `ctx.Done()` in all long-running goroutine loops.
-- Prefer channels over shared memory with mutexes when practical.
-- `sync.RWMutex` for read-heavy, `sync.Mutex` for write-heavy shared state.
-- No `time.Sleep()` in orchestration — use proper synchronization primitives.
-- **Goroutines spawned by `internal/session/manager_*.go` MUST be tracked by Manager-owned WaitGroup and joined in Manager shutdown.** Never put goroutine-owned channels in a struct field that another goroutine mutates — use a per-run handle.
-- **Detached execution lifetime.** Any work that outlives an HTTP/UDS request — prompts, network channel sends, automation jobs — MUST detach via `context.WithoutCancel(ctx)`. Never tie execution lifetime to request lifetime. Expose explicit cancel endpoints (e.g., `POST /api/sessions/:id/prompt/cancel`).
-- **`context.WithoutCancel` does NOT preserve deadlines.** Re-attach a deadline if needed.
-- **Subprocess managed-stop** must respect `ctx.Done()` between Shutdown and Wait. Wrap `proc.Wait()` in `select { case <-proc.Done(): case <-ctx.Done(): }`.
-- **Process-group supervision parity.** Unix uses process groups; Windows uses forced-exit fallback. Always cross-build with `GOOS=windows GOARCH=amd64 go build` before claiming subprocess work complete. Centralize signaling helpers in `internal/procutil`.
-
-### Runtime
-
-- Single-binary and local-first. Sidecars or external control planes require a written techspec.
-- Keep execution paths deterministic and observable.
-- **Daemon runs in background by default.** No daemon should require a foreground terminal.
-- **`compozy exec` is headless.** `--format text` returns a single string; `--format json` returns a stream of valid JSON objects; the TUI is opt-in via `--tui`. `exec` does not persist artifacts to `.compozy/runs/` unless `--persist` is given.
-- **Agent operations must not depend on the web UI.** If agents need to inspect, configure, start, stop, approve, claim, release, or repair a capability, the spec must provide a CLI/HTTP/UDS path with structured output and deterministic errors.
-
-### Observability
-
-- Every domain operation emits a canonical event with correlation keys (`workspace_id`, `session_id`, `parent_session_id`, `root_session_id`, `agent_name`, `task_id`, `run_id`, `claim_token_hash`, `lease_until`, `workflow_id`, `coordinator_session_id`, `scheduler_reason`, `hook_event`, `hook_name`, `spawn_depth`, `actor_kind`, `actor_id`, `release_reason`).
-- Cover with a coverage matrix test that fails if any required lifecycle path doesn't emit its canonical event.
-- Append-only event store (`runtime.db`) is the canonical operational ledger; session DBs are projections, not authority.
-- Live broadcasters publish only after durable append; reconnect/replay uses `after_seq`.
-
-## Autonomy Contracts
-
-These are load-bearing rules from the autonomous-mode ADRs (`.compozy/tasks/autonomous/adrs/adr-001..012`) and `_techspec.md`. Internalize them before touching the kernel.
-
-- **`task_runs` is the single durable work queue.** Do not introduce a parallel queue or actor table. Add new ownership/state via columns + side tables on `task_runs`.
-- **`task.Service.ClaimNextRun` is the canonical claim primitive.** Lease invariants: exactly one active claim token per non-terminal run; heartbeat/complete/fail/release compare run owner + claim token; stale/late after recovery fails explicitly; sweep + heartbeat serialize via SQLite tx; boot recovery before scheduler accepts wake/claim traffic; lease extension bounded by config; one active lease per session in MVP. Use `BEGIN IMMEDIATE`; CAS predicates for sweep.
-- **Capability matching = durable exact-match rows** in `task_run_required_capabilities` / `task_run_preferred_capabilities`, NOT JSON metadata.
-- **Manual operator paths and autonomous paths converge on the same primitives.** User-created, automation-created, coordinator-created, and agent-spawned tasks all use the same task/run model and the same claim-token/lease/heartbeat/complete/fail/release rules. Task creation alone NEVER enqueues claimable work or starts the coordinator. Publish/start/approval is the run-enqueue boundary.
-- **Coordinator auto-spawn** triggers ONLY when: workspace has no healthy active coordinator AND a coordinated run is enqueued by publish/start/approval AND run has stable `coordination_channel_id` AND auto-start enabled AND spawn caps allow. Conservative defaults (auto-start disabled, max-children 5, max-active-per-workspace 1).
-- **Coordinator-agent owns semantic orchestration; mechanical scheduler owns operational safety** (idle registry, capability-aware wakeups, lease sweep, recovery, backpressure). The scheduler does NOT call `ClaimNextRun` directly in MVP.
-- **Safe spawn defaults**: max-depth 1, max-children 5, mandatory TTL on every spawned session; children auto-stop with parent. Permission narrowing compares concrete atoms only (tools, skills, MCP server IDs, workspace path grants, network channels, env profile grants); subset-only; unknown child atoms count as widening and reject. Daemon NEVER silently narrows.
-- **Hook taxonomy** (MVP allowlist): `coordinator.{pre_spawn,spawned,decision,stopped,failed}`, `task.run.{enqueued,pre_claim,post_claim,lease_extended,lease_expired,lease_recovered,released}`, `spawn.{pre_create,created,parent_stopped,ttl_expired,reaped}`, plus `tool.*`, `permission.*`, `session.*`. Scheduler wake/no-match/recovery stay internal metrics. No `workflow.*` umbrella in MVP.
-- **Coordination channels.** Every workspace-scoped coordinated run has ONE durable `coordination_channel_id` on `task_runs`. Bind always, speak when useful — heartbeats/lease transitions never mirror as chat. Network message kinds limited to `status` / `request` / `reply` / `blocker` / `handoff` / `result` / `review_request` in MVP. Channels are NEVER an ownership/status authority.
-- **Generated contracts and docs co-ship.** Any change to `internal/api/contract` co-ships in the same PR with: regen of `openapi/agh.json` and `web/src/generated/agh-openapi.d.ts`, updates to `web/src/systems/*/types.ts` consumers, Storybook/MSW fixtures, and passes `make codegen-check`, `make web-typecheck`, AND `make web-test`.
-- **Agent-facing CLI is identity-inferred.** Caller identity flows from `AGH_SESSION_ID` / `AGH_AGENT` through `internal/agentidentity`. Operator endpoints MUST NOT infer agent identity from environment variables. Stable `-o json` and `-o jsonl` are compatibility contracts; no command aliases (no `done`, no `pass`).
-
-## Security Invariants
-
-- **`claim_token` redaction is non-negotiable.** Raw `claim_token` (`agh_claim_*`), MCP auth tokens, OAuth codes, PKCE verifiers, and secret bindings MUST NEVER appear in logs, status APIs, settings views, error payloads, channel messages, SSE, web UI, or memory. Use hash forms (`claim_token_hash`) over the wire. Network layer rejects raw `claim_token` in metadata.
-- **Symlink escape hardening.** Skill sidecars, skill files, managed-extension dependency copies, and bundle install paths MUST verify resolved targets remain inside approved roots. Use `EvalSymlinks` + path-prefix check, not naive joins. Handle macOS `/private/var/folders` quirk (canonicalize source root before containment check).
-- **Path security helpers.** Filesystem helpers resolving user-controlled or agent-controlled paths use the `sanitizePathKey` + `realpathDeepestExisting` pattern (defenses against null-byte, URL-encoded traversal, Unicode normalization, symlink-escape).
-- **Identity proof-stripping defense.** In any signed-message processing path (AGH Network v1), an identity in verified format (`nickname@fingerprint`) without valid `proof` MUST classify as `rejected`, not `unverified`.
-- **External-call timeouts.** Outbound HTTP/network calls MUST use a client with an explicit timeout. `http.DefaultClient` is forbidden in production code paths.
-- **Load-time security scan.** Every non-bundled skill is scanned via `internal/skills.VerifyContent` on every load (not just install). Critical findings block; warning findings log; info findings log silently. Bundled skills are exempt because `go:embed` provides immutability.
-
-## Package Layout
-
-| Path | Responsibility |
-| ------------------------------- | ----------------------------------------------------------------------------- |
-| `cmd/agh` | Main entry point, CLI binary |
-| `internal/config` | TOML loading, validation, merge, home paths, agent def parsing |
-| `internal/acp` | ACP client: subprocess spawn, JSON-RPC over stdio |
-| `internal/agentidentity` | Caller-identity inference from `AGH_SESSION_ID`/`AGH_AGENT` |
-| `internal/automation` | Cron, webhook, and scheduled triggers; durable scheduler state |
-| `internal/bridges` | External messaging adapters (Slack, Telegram, etc.) |
-| `internal/bridgesdk` | Bridge SDK / contract types |
-| `internal/bundles` | Bundle activation projector |
-| `internal/cli` | Cobra commands |
-| `internal/codegen` | OpenAPI → TS generator helpers |
-| `internal/coordinator` | Coordinator-agent bootstrap and lifecycle |
-| `internal/daemon` | Composition root, lock, boot, shutdown |
-| `internal/diagnostics` | Diagnostics + health probes |
-| `internal/e2elane` | E2E lane harness wiring |
-| `internal/environment` | Env-profile resolution |
-| `internal/extension` | Extension manifest, registry, host API, install runtime |
-| `internal/extensiontest` | Extension test harness |
-| `internal/filesnap` | File snapshot utilities |
-| `internal/fileutil` | Shared filesystem helpers |
-| `internal/frontmatter` | YAML frontmatter parsing |
-| `internal/hooks` | Typed hook taxonomy + dispatch |
-| `internal/logger` | Structured logging (slog) |
-| `internal/mcp` | MCP server lifecycle / sidecars |
-| `internal/memory` | Persistent dual-scope memory (global + workspace + agent), provenance, recall |
-| `internal/memory/consolidation` | Dream consolidation runtime (Time → Sessions → Lock gate cascade) |
-| `internal/network` | AGH Network channels/peers/wire, NATS profile |
-| `internal/observe` | Event recording, health metrics, query engine |
-| `internal/procutil` | Process utilities, process-group signaling, Windows fallback |
-| `internal/registry` | Skill/agent/capability registry helpers |
-| `internal/resources` | Resource projector / codec / validate |
-| `internal/retry` | Retry primitives |
-| `internal/scheduler` | Mechanical scheduler (idle registry, wakeups, sweep, recovery) |
-| `internal/session` | Session lifecycle, Manager, state machine |
-| `internal/settings` | Settings overlay/projection |
-| `internal/situation` | Situation surface providers (`/agent/context`) |
-| `internal/skills` | Skills catalog, loader, `VerifyContent`, MCP/hook decl, provenance |
-| `internal/skills/bundled` | Bundled skill definitions |
-| `internal/sse` | Shared SSE helpers |
-| `internal/store` | SQLite shared helpers, migrations registry, validation |
-| `internal/store/globaldb` | Global catalog (`agh.db`): sessions, metadata |
-| `internal/store/sessiondb` | Per-session event store (`events.db`) |
-| `internal/subprocess` | Subprocess signaling primitives |
-| `internal/task` | Task domain, `task_runs` ownership, `ClaimNextRun` |
-| `internal/testutil` | Shared test helpers |
-| `internal/api/contract` | Shared daemon/CLI/HTTP contract types |
-| `internal/api/core` | Shared handler types (`BaseHandlers`), error mapping, SSE helpers |
-| `internal/api/httpapi` | HTTP/SSE server (Gin) for web UI |
-| `internal/api/udsapi` | UDS server for CLI IPC |
-| `internal/api/testutil` | Test helpers for the API layer |
-| `internal/toolruntime` | Tool process registry + interrupts |
-| `internal/tools` | Tool definitions and dispatch |
-| `internal/transcript` | Canonical replay message assembly from persisted events |
-| `internal/version` | Build metadata |
-| `internal/workref` | Work reference helpers |
-| `internal/workspace` | Workspace resolver and entity management |
-| `web/` | React 19 SPA (Vite, TanStack Router/Query, Tailwind, shadcn) |
-| `web/src/systems/` | Domain feature modules (app-renderer-systems pattern) |
-| `packages/site` | Fumadocs documentation site (Bun) |
-| `packages/ui` | Shared UI primitives (`@agh/ui`) |
+## Surface Map
+
+Repo layout. Each surface owns its instructions:
+
+| Path | Stack | Instructions |
+| --------------- | ----------------------------------------------------------------------- | ------------------------- |
+| `cmd/agh` | Go binary entry point | `internal/CLAUDE.md` |
+| `internal/` | Go runtime daemon (ACP, SQLite, autonomy kernel, HTTP/UDS, network) | `internal/CLAUDE.md` |
+| `web/` | React 19 SPA (Vite, TanStack, Tailwind, shadcn) | `web/CLAUDE.md` |
+| `packages/site` | Fumadocs documentation site (Bun) | `packages/site/CLAUDE.md` |
+| `packages/ui` | Shared UI primitives (`@agh/ui`) consumed by `web/` and `packages/site` | `web/CLAUDE.md` |
+
+Backend architecture, autonomy contracts, security invariants, package layout, and `internal/`-specific debugging now live in **`internal/CLAUDE.md`**. Open it before touching any Go code under `cmd/` or `internal/`.

 ## Coding Style

-- Explicit error returns with wrapped context: `fmt.Errorf("context: %w", err)`.
-- Use `errors.Is()` and `errors.As()` exclusively for error matching. **`strings.Contains(err.Error(), …)` is forbidden.**
-- Never ignore errors with `_` — every error must be handled or have a written justification.
-- **Cleanup paths must cancel contexts and release resources.** Every error-return path that previously created or extended a `context.Context`, registered a resource, opened a connection, or spawned a subprocess MUST `cancel()`, `Close()`, `Stop()`, or release its lease on the error path. Pair `defer cancel()` immediately after `WithCancel`/`WithTimeout`.
-- No `panic()` or `log.Fatal()` in production paths — only for truly unrecoverable startup failures.
-- `log/slog` for structured logging — no `log.Printf` or `fmt.Println` for operational output.
-- `context.Context` as first argument to functions crossing runtime boundaries — avoid `context.Background()` outside `main` and focused tests.
-- **Compile-time interface verification is mandatory.** `var _ Interface = (*Type)(nil)` next to every new exported type that satisfies an interface. -- No `interface{}`/`any` when a concrete type is known. -- No reflection without performance justification. -- **Never hardcode configuration** — use TOML config or functional options. Disable/zero-value semantics must be explicit. Resolution chains documented as ordered fallbacks ending in actionable errors. -- **Config lifecycle is part of the feature lifecycle.** Any spec that adds, updates, removes, or stops needing configuration must update structs, defaults, merge/overlay behavior, validation, examples, `config.toml` docs, generated CLI/site docs, and tests in the same change. If no config change is needed, the TechSpec says why. -- **CLI flag presence detection.** Distinguish "flag not set" from "flag set to zero value" via `cmd.Flags().Changed(name)` (Cobra) or equivalent. Silently ignoring an explicit flag is a bug. -- **Whitespace normalization at CLI boundary.** String-slice CLI inputs (capabilities, IDs, tags, paths) MUST trim and drop empty entries before sending. Do not push whitespace-only strings to the daemon as "validation problems". -- **No defensive nil-checks after `make`.** Reviewers and lint flag `if x == nil` after `make(...)` as unreachable. -- **No comments restating WHAT the code does.** Comments capture WHY when non-obvious — hidden constraints, invariants, workarounds for specific bugs. Don't reference the current task or callers ("used by X", "added for Y") — those rot. +- **Skill**: `agh-code-guidelines` (`.agents/skills/agh-code-guidelines/`). +- **When**: before writing or editing any production `*.go` file under `cmd/` or `internal/`. +- **Covers**: error wrapping (`%w`), `errors.Is`/`As` only, `slog` logging, `context.Context` discipline, compile-time interface assertions, no hardcoded config, CLI flag presence detection, comments policy, generic concurrency patterns. 
+- **Top-level invariants restated in Critical Rules**: no `_`-discarded errors, `make verify` must pass, `make lint` zero tolerance. ## Testing -- **Every Go test case MUST be inside a `t.Run("Should ...")` subtest.** Adding inline cases to an existing function is a blocking violation. -- **Independent subtests MUST call `t.Parallel()`.** The only legitimate opt-out is a comment justifying `t.Setenv` use or shared state. Reject reviewer suggestions to add `t.Parallel()` to env-mutating tests as INVALID with rationale. -- Table-driven default; use `t.Helper()` on test helpers and `t.TempDir()` for filesystem isolation. -- **No `_ = errFn(...)` in tests.** Handle marshal/JSON/cleanup errors explicitly. -- **Status-code-only assertions are insufficient.** Also assert response body, error message, or contract-specific evidence (idempotency key, request payload). -- Mock via interfaces, not test-only methods in production code. -- `-race` flag must pass before committing. -- **Race-enabled tests must self-manage `CGO_ENABLED=1`.** Verification commands wrapping `go test -race` go through `runRaceEnabledGoCommand` (or equivalent). Don't trust ambient env. -- **Linux-Race CI parity.** Before claiming `make verify` complete on race-sensitive packages (`internal/session`, `internal/acp`, `internal/hooks`, `internal/subprocess`, `internal/resources`), reproduce locally with `act workflow_dispatch -W .github/workflows/ci.yml -j verify --container-architecture linux/amd64`. -- **`make verify` is the commit gate.** If verification is blocked by an external/branch-side asset issue (missing test fixture, etc.), do NOT commit — report the verified blocker and hold. -- **Test failures are production bugs.** Fix production code; don't weaken assertions. The only exception is documenting an INVALID review item with concrete evidence. 
-- **Replace fragile string-matching with structured metadata.** ACP prompt routing in `acpmock` uses typed prompt metadata, not rendered prompt substrings. -- **80% coverage minimum** per package. - -### Integration & E2E Tests - -- **Build tags**: `//go:build integration` at top of `*_integration_test.go` files. -- **Co-located** with the package they test (not in a separate `test/` directory). -- `make test` = unit only. `make test-integration` = `+integration` tag. `make test-e2e-runtime` = daemon-side E2E. `make test-e2e-web` = browser-side Playwright. -- `TestMain` for expensive one-time setup/teardown. -- Use **real dependencies** (real SQLite via `t.TempDir()`, mock ACP server as subprocess). -- Keep fast enough for CI (~30s max per package). -- **E2E tests are part of the runtime contract.** When a runtime contract changes (prompt augmenter, situation context, fixture format), the E2E mock and matchers ship in the same PR. Otherwise tests pass against a stale prompt and fail later. +- **Skill**: `agh-test-conventions` (`.agents/skills/agh-test-conventions/`). +- **When**: before writing or editing any `*_test.go` file. +- **Covers**: + - `t.Run("Should ...")` subtests, `t.Parallel` default (with `t.Setenv` opt-out), table-driven layout. + - Status-code + body assertions (status-code-only is insufficient). + - `-race` / `CGO_ENABLED=1` discipline; Linux-Race CI parity for race-sensitive packages. + - Integration / E2E build tags (`//go:build integration`, `make test-integration`, `make test-e2e-runtime`, `make test-e2e-web`). + - Runtime-contract co-ship (E2E mock + matchers ship with contract changes). + - 80% coverage floor per package. + - Commit-gate semantics (`make verify` blocks; test failures are production bugs). ### Schema Migrations -- **Schema migrations are mandatory** for any change to a SQLite column, index, or constraint. Add a numbered migration in the migrations registry. 
`EnsureSchema`-style boot reconciliation is forbidden for column changes. Test fresh-DB AND reopen-after-restart paths. -- **One schema migration primitive shared by all SQLite databases** (`agh.db`, `events.db`, catalog DBs). -- **SQLite recovery code paths** must rename or remove `-wal` and `-shm` companions, not only the `.db` file. -- **`ORDER BY 0` is invalid in SQLite** (positional reference). Use `(SELECT 0)` or an explicit constant column. +- **Skill**: `agh-schema-migration`. +- **When**: any SQLite column, index, or constraint change. +- **Mandatory**: numbered migration in the registry — `EnsureSchema`-style boot reconciliation is forbidden for column changes. +- **Covers**: numbered registry, transactional wrap (`BEGIN IMMEDIATE`), `-wal` / `-shm` companion handling on recovery, `ORDER BY 0` pitfall, fresh-DB + reopen-after-restart tests. -## Memory & Skills (RFC-backed) +## Vocabulary & Product Strategy -These rules come from RFC 001 (`.../agh-rfcs-local/001-agent-md-with-skills-memory.md`) and RFC 002 (`.../agh-rfcs-local/002-skills-system-final.md`): +Repo-wide rules backed by RFC 001 / RFC 002. Runtime implementation details (precedence layers, memory taxonomy, consolidation gates, lifecycle hooks) live in `internal/CLAUDE.md`. -- **Five-layer skill/memory/agent precedence**: Bundled → Marketplace → User → Additional → Workspace, with agent-local overriding all. Higher precedence wins on collision; an audit trail logs every shadow. -- **Memory taxonomy**: `user | feedback | project | reference` types; scopes `agent | workspace | global`. Default write scope declared per agent in `memory.scope`. -- **Memory consolidation gates**: Time → Sessions → Lock cascade ordered by computational cost. Default gates: 24h, 5 touched sessions, file-lock. Never replace gates with naive heuristics. 
-- **Lifecycle hooks** (`on_session_created`, `on_session_stopped`) execute in hierarchy precedence then alphabetical order; configurable timeout (default 5s); fail-open semantics (errors logged, never block); JSON over stdin. -- **Format extension default**: when integrating with an external spec (AgentSkills, AGENTS.md, MCP, A2A), extend via a namespaced metadata field (`metadata.agh.*` or `agh.*`) — never fork the format. - **Capability vs Recipe**: reusable agent artifacts are called `capability`, NOT `recipe`/`workflow`/`procedure`/`playbook`. Capabilities are interpretive, not deterministic; they are not workflow programs in disguise. +- **Format extension default**: when integrating with an external spec (AgentSkills, AGENTS.md, MCP, A2A), extend via a namespaced metadata field (`metadata.agh.*` or `agh.*`) — never fork the format. - **Runtime moat statement**: AGH competes on runtime, SDK, observability, DX, and integration depth — NOT the wire protocol. The AGH Network protocol must remain implementable outside AGH. Any feature requiring AGH to interoperate is a design smell. +## Memory & Lessons Learned + +`docs/_memory/` is the project's institutional memory — durable engineering knowledge distilled from real incidents, ADR forensics, and standing engineering posture. Treat it as authoritative when CLAUDE.md is silent or ambiguous. + +- **Standing directives** — `docs/_memory/standing_directives.md`. Perpetually-active engineering posture (SD-001..SD-011): long-running session supervision, greenfield-delete, BR-PT/EN, multi-LLM pipeline, real-scenario QA, forensic-first bug fixes, truthful UI, composition-root discipline, detached lifetime, extensible-and-agent-manageable design. Read before opening a TechSpec, defending an architecture pivot, or whenever someone proposes a compat shim. +- **Spec authoring playbook** — `docs/_memory/spec-authoring-playbook.md`. 
Mandatory preflight for `cy-create-prd` / `cy-create-techspec` / `cy-create-tasks`, with phase-by-phase MUST / MUST-NOT and evidence references. The `cy-spec-preflight` skill enforces this — always read before producing any `_idea.md` / `_prd.md` / `_techspec.md` / `_tasks.md`. +- **Lessons learned** — `docs/_memory/lessons/` (`L-001..L-013`, plus `README.md` index). One file per durable lesson with confirmed root cause + fix + evidence (ADR, commit, review issue, or QA bug). Scan the index whenever you hit a class of issue: concurrency / API, testing discipline, autonomy architecture, persistence, spec authoring. +- **Glossary** — `docs/_memory/glossary.md`. Canonical vocabulary (`capability` vs `recipe`, `AGENT.md` vs `AGENTS.md`, Peer Card vs Agent Card, autonomy primitives). Authoritative when older RFCs / ledgers conflict. Read when naming anything new, reviewing a rename PR, or when a term feels overloaded. +- **Cross-source synthesis** — `docs/_memory/_synthesis.md`. Cross-referenced findings from 8 forensic analyses, ranked by source count — the evidence corpus behind every rule in CLAUDE.md and the standing directives. Read when challenging or evolving a rule. +- **Forensic analyses** — `docs/_memory/analysis/analysis_*.md`. Per-source raw analyses (codex sessions / plans / ledger, compozy tasks, qmd collections, local / global runs, existing surfaces) feeding `_synthesis.md`. Read when synthesis cites a finding and you need the underlying evidence. + +**Authoring rules:** + +- New lesson → numbered file `L-NNN-kebab-title.md` + update `lessons/README.md`. One lesson per file. Cite specific evidence (file path, commit, review issue, ledger entry). Activate the `lesson-learned` skill. +- Don't duplicate CLAUDE.md or `standing_directives.md` rules in lessons — lessons explain **why** a rule exists; rules go in their respective files. +- Don't add speculative warnings — only confirmed incidents with evidence. 
+- New standing directive → next `SD-NNN` block in `standing_directives.md` with Posture / Required behavior / Source / Triggers re-evaluation when. + ## CI / Release - **No cron / schedule workflows.** Heavy/credentialed tests (`make test-e2e-nightly`, `make test-integration`) live in the `dry-run` job of the auto-created release PR. Rationale: release PR is the natural human-gated batching point. - **Looper repo (`~/dev/compozy/looper`) is the canonical source** for compozy-org Go-repo CI: composite actions (`setup-go`, `setup-bun`, `setup-git-cliff`, `setup-release`), `ci.yml`, `release.yml`, `.goreleaser.yml`, `cliff.toml`. Verbatim copies into AGH. - **Replace third-party CI actions with shell logic** when their setup fails on runners (lesson: `dorny/paths-filter@v3` runner instability replaced by inline git-based change detection). -## Forensic Bug Fixes - -- **Bug-fix plans open with confirmed reproduction** (timestamp, command, observed evidence) BEFORE listing changes. "I think" or "probably" is forbidden at the top of a fix plan. -- **Inactive metadata repair must distinguish startup-pending from crashed.** Sessions in `m.pending` are still starting, not failed. -- **Stale ACP session ids must be classified, not propagated.** Convert `Resource not found` to fresh-start fallback. - ## Cross-References -- **Spec authoring playbook** (mandatory preflight for `cy-create-prd`/`cy-create-techspec`/`cy-create-tasks`): `docs/_memory/spec-authoring-playbook.md`. -- **Standing directives** (perpetual posture): `docs/_memory/standing_directives.md`. -- **Lessons learned** (durable engineering insights with evidence): `docs/_memory/lessons/` — see `README.md` for the index. -- **Glossary** (canonical vocabulary — `capability` vs `recipe`, AGENT.md vs AGENTS.md, Peer Card vs Agent Card, autonomy primitives): `docs/_memory/glossary.md`. 
-- **Cross-source synthesis** (evidence trail behind every rule above): `docs/_memory/_synthesis.md` and `docs/_memory/analysis/analysis_*.md`. -- **Web rules**: `web/CLAUDE.md`. **Site rules**: `packages/site/CLAUDE.md`. -- **Active TechSpec**: `.compozy/tasks/autonomous/_techspec.md`. **ADRs**: `.compozy/tasks/autonomous/adrs/`. +- **Backend rules**: `internal/CLAUDE.md` (Go architecture, autonomy contracts, security invariants, package layout, forensic bug-fix patterns). +- **Web rules**: `web/CLAUDE.md`. +- **Site rules**: `packages/site/CLAUDE.md`. +- **Institutional memory**: `docs/_memory/` — see the **Memory & Lessons Learned** section above for the per-surface map. - **Authoritative design tokens**: `DESIGN.md` (repo root). diff --git a/CLAUDE.md b/CLAUDE.md index 014951184..b1e3adc86 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1,8 +1,6 @@ -# CLAUDE.md - ## Project Overview -AGH is an Agent Operating System — a Go single-binary daemon that manages AI agent sessions via ACP (Agent Client Protocol). It spawns ACP-compatible agents (Claude Code, Codex, Gemini CLI, etc.) as subprocesses, communicates via JSON-RPC over stdio, persists events in SQLite, and exposes interfaces via HTTP/SSE (web UI) and UDS (CLI). A Fumadocs site at `agh.compozy.com` documents the runtime and the AGH Network protocol. +AGH is an Agent Operating System — a Go single-binary daemon that manages AI agent sessions via ACP (Agent Client Protocol). It spawns ACP-compatible agents (Claude Code, Codex, Gemini CLI, etc.) as subprocesses, communicates via JSON-RPC over stdio, persists events in SQLite, and exposes interfaces via HTTP/SSE (web UI) and UDS (CLI). A Fumadocs site at `agh.network` documents the runtime and the AGH Network protocol. **Goals**: daemon single-binary in background, strong observability, agent-first system (agents manipulate via CLI + REST), highly extensible, highly configurable. 
@@ -10,9 +8,13 @@ AGH is an Agent Operating System — a Go single-binary daemon that manages AI a ## Greenfield Alpha — Zero Legacy Tolerance -No production users exist. Never sacrifice code quality for backward compatibility. Never write migration, compat, or defensive code for old state — delete the old thing instead of working around it. - -**Hard cuts, not bridges.** Renames sweep code, storage, APIs, CLI, extensions, specs, RFCs, AND `.compozy/tasks/*` artifacts in the same change. No aliases, no dual fields, no schema fallback paths. Every breaking-change techspec MUST explicitly name its delete targets. +- **No production users exist.** +- Never sacrifice code quality for backward compatibility. +- Never write migration, compatibility, or defensive code for old state — delete obsolete code instead of working around it. +- **Hard cuts, not bridges:** + - Renames must update code, storage, APIs, CLI, extensions, specs, RFCs, and `.compozy/tasks/*` artifacts all in a single change. + - Do not create aliases, dual fields, or schema fallback paths. +- Every breaking-change techspec **MUST** explicitly list its delete targets. ## Critical Rules @@ -23,8 +25,10 @@ No production users exist. Never sacrifice code quality for backward compatibili - **Never use web search tools for local project code** — use Grep/Glob instead. Web search is only for external docs. - **Never run destructive git commands** (`git restore`, `git checkout`, `git reset`, `git clean`, `git rm`) **without explicit user permission**. If the worktree contains unexpected edits, read and work around them. - NEVER ignore errors with `_` in production code or in tests — every error must be handled or have a written justification. -- NEVER COMMITS `ai-docs/`, `.tmp/`, or `.compozy/tasks/*/memory/` TO THE REPO. They are local tracking artifacts. +- NEVER COMMIT `ai-docs/` or `.tmp/` TO THE REPO. They are local tracking artifacts.
- **Subagents are read-only.** Use them for analysis, exploration, and parallel research. The author of every code change is the agent paired with the user. Subagent output is treated as evidence, not as committed work. +- **ALWAYS CHECK** the `internal/CLAUDE.md` when doing Go-related stuff +- **ALWAYS CHECK** the `web/CLAUDE.md` when doing things related to the web package ## Workflow Rules @@ -55,38 +59,43 @@ These govern how features move from idea to ship. Internalize them before openin Activate skills **before** writing code. Match task domain → activate all required skills: -| Domain | Required Skills | Conditional Skills | -| -------------------------- | ---------------------------------------------------------------------------------------- | -------------------------------------- | -| Go / Runtime | `golang-pro` | `context7` | -| Config / Logging | `golang-pro` | | -| Bug fix | `systematic-debugging` + `no-workarounds` | `testing-anti-patterns` | -| Writing Go tests | `testing-anti-patterns` + `agh-test-conventions` + `golang-pro` | `vitest` (only for test tooling docs) | -| Cleanup / failure paths | `agh-cleanup-failure-paths` + `golang-pro` | `deadlock-finder-and-fixer` | -| Schema / migration changes | `agh-schema-migration` + `golang-pro` | | -| Contract / OpenAPI changes | `agh-contract-codegen-coship` | | -| Task completion | `cy-final-verify` | | -| Architecture audit | `architectural-analysis` | `refactoring-analysis` + `ubs` | -| Concurrency / races | `deadlock-finder-and-fixer` + `golang-pro` | `systematic-debugging` | -| Performance / hot paths | `extreme-software-optimization` + `golang-pro` | | -| Security review | `security-review` | `ubs` | -| Creative / new features | `brainstorming` | `cy-idea-factory` | -| PRD creation | `cy-spec-preflight` + `cy-create-prd` | `cy-idea-factory` | -| TechSpec creation | `cy-spec-preflight` + `cy-create-techspec` + `cy-spec-peer-review` | `cy-research-competitors` | -| Task generation | `cy-spec-preflight` 
+ `cy-create-tasks` + `cy-tasks-tail-qa-pair` + `cy-web-docs-impact` | | -| Competitor research | `cy-research-competitors` | `context7` + `find-docs` | -| Execute a PRD task | `cy-execute-task` | `cy-workflow-memory` | -| Review round / fixes | `cy-review-round` + `cy-fix-reviews` | `fix-coderabbit-review` | -| Release / scenario QA | `real-scenario-qa` (delegates to `qa-execution` + `qa-report`) | `agh-worktree-isolation` | -| Git rebase / conflicts | `git-rebase` | | -| External docs lookup | `context7` + `find-docs` | `exa-web-search-free` | -| UI / Design (any surface) | `agh-design` + `design-taste-frontend` + `minimalist-ui` | `frontend-design` + `interface-design` | +| Domain | Required Skills | Conditional Skills | +| ------------------------------------- | ---------------------------------------------------------------------------------------- | -------------------------------------- | +| Go / Runtime | `agh-code-guidelines` + `golang-pro` | `context7` | +| Config / Logging | `agh-code-guidelines` + `golang-pro` | | +| TUI / CLI Bubbletea | `bubbletea` + `agh-code-guidelines` + `golang-pro` | | +| Bug fix | `systematic-debugging` + `no-workarounds` | `testing-anti-patterns` | +| Writing Go tests | `agh-test-conventions` + `testing-anti-patterns` + `golang-pro` | `vitest` (only for test tooling docs) | +| Cleanup / failure paths | `agh-cleanup-failure-paths` + `agh-code-guidelines` + `golang-pro` | `deadlock-finder-and-fixer` | +| Schema / migration changes | `agh-schema-migration` + `golang-pro` | | +| Contract / OpenAPI changes | `agh-contract-codegen-coship` | | +| Task completion | `cy-final-verify` | | +| Lessons learned | `lesson-learned` | | +| Architecture audit | `architectural-analysis` | `refactoring-analysis` + `ubs` | +| Concurrency / races | `deadlock-finder-and-fixer` + `golang-pro` | `systematic-debugging` | +| AGH Network (`internal/network` only) | `nats` + `agh-code-guidelines` + `golang-pro` | `deadlock-finder-and-fixer` | +| 
Performance / hot paths | `extreme-software-optimization` + `golang-pro` | | +| Security review | `security-review` | `ubs` | +| Creative / new features | `brainstorming` | `cy-idea-factory` | +| Council debate (high-impact) | `council` | `brainstorming` | +| PRD creation | `cy-spec-preflight` + `cy-create-prd` | `cy-idea-factory` | +| TechSpec creation | `cy-spec-preflight` + `cy-create-techspec` + `cy-spec-peer-review` | `cy-research-competitors` | +| Task generation | `cy-spec-preflight` + `cy-create-tasks` + `cy-tasks-tail-qa-pair` + `cy-web-docs-impact` | | +| Competitor research | `cy-research-competitors` | `context7` + `find-docs` | +| Execute a PRD task | `cy-execute-task` | `cy-workflow-memory` | +| Review round / fixes | `cy-review-round` + `cy-fix-reviews` | `fix-coderabbit-review` | +| Release / scenario QA | `real-scenario-qa` (delegates to `qa-execution` + `qa-report`) | `agh-worktree-isolation` | +| Git rebase / conflicts | `git-rebase` | | +| External docs lookup | `context7` + `find-docs` | `exa-web-search-free` | +| Diagrams (spec / ADR) | `architecture-diagram` | `mermaid-diagrams` | +| Documentation (internal) | `documentation-writer` | `crafting-effective-readmes` | +| Skill / agent-md authoring | `skill-best-practices` + `agent-md-refactor` | | +| UI / Design (any surface) | `agh-design` + `design-taste-frontend` + `minimalist-ui` | `frontend-design` + `interface-design` | Web-specific skill dispatch is in `web/CLAUDE.md` and `web/AGENTS.md`. Site-specific dispatch is in `packages/site/CLAUDE.md`. Every domain change requires its skill — no skipping "because it's a small change". Activate multiple skills when code touches multiple domains. -`nats` skill is installed but architecturally forbidden in AGH (see Architecture Principles). Do not activate it. - ## Build Commands ### Go (backend) @@ -121,11 +130,17 @@ make cli-docs # Regenerate CLI reference from cobra JSON export Web (`web/`) commands are documented in `web/CLAUDE.md`. 
-## Commit style: : - -Allowed prefixes: `feat:`, `fix:`, `refactor:`, `docs:`, `test:`, `build:`. **NO `chore:`, `style:`, or `ci:`.** Tooling and CI changes use `build:`. PR-merged commits include `(#NN)` suffix. +## Commit style -**One commit per remediation batch.** `cy-fix-reviews` rounds produce exactly one local commit per round. Run `make verify` BEFORE and AFTER the commit. Never `git commit --amend` after pre-commit hook failures — fix and create a new commit. +- ALWAYS USE: `: ` +- Allowed commit prefixes: `feat:`, `fix:`, `refactor:`, `docs:`, `test:`, `build:` +- **Do NOT use**: `chore:`, `style:`, or `ci:`. +- Use `build:` for tooling and CI changes. +- For PR-merged commits, append a `(#NN)` suffix. +- **Create exactly one commit per remediation batch.** +- Each `cy-fix-reviews` round must produce one local commit. +- Always run `make verify` **before and after** committing. +- If a pre-commit hook fails, do **not** use `git commit --amend`. Instead, fix the issue and create a new commit. ## Code Search Hierarchy @@ -133,233 +148,83 @@ Allowed prefixes: `feat:`, `fix:`, `refactor:`, `docs:`, `test:`, `build:`. **NO 2. **`context7` / `find-docs` skills** — for external library documentation. 3. **`exa-web-search-free`** — for web research, news, external code examples when the local docs tools are insufficient. -## Old Project Reference - -The `.old_project/` directory contains the previous AGH implementation (78K+ LOC). **Reference only** — do not modify, do not import, do not include in builds. Exclude from code search results. - -## Architecture - -### Principles - -- **Designed for incremental extension** — new capabilities arrive as new packages wired into `daemon/`, without modifying existing packages. Small interfaces + dependency injection. Every capability plan decides which extension points, hooks, capabilities, tools/resources, bundles, registries, bridge SDKs, and docs must be added, updated, or removed. 
-- **Pragmatic Flat with Discipline** — packages under `internal/`, API transports grouped under `api/`, no domain/infra split, no event bus. -- **`daemon/` is the sole composition root** — the only package that imports all others. Reconciliation logic running at boot belongs to composition root and is not "legacy support". -- **No package imports `daemon/`, `api/`, or `cli/`** — dependencies flow downward only. -- **Interfaces defined where consumed** (Go-style) — `session/` defines `AgentDriver`, `acp/` implements it. -- **Direct function calls through interfaces** — no event bus, no NATS, no reflection-based routing. -- **Notifier pattern for fan-out** — typed interface for observability and SSE, not a generic bus. -- **No back-pointers between packages** — inject callbacks or interfaces. -- **Functional options for constructors** — `NewManager(opts ...Option)`. -- **Maps for <10 items** — no registry interfaces for small collections. -- **File-level organization** within packages — sub-packages only when complexity justifies it. -- **CI-enforceable boundaries** — `mage Boundaries` rules prevent import cycles. Update `magefile.go` Boundaries() in the same commit that introduces a new `internal/api/*` subpackage. -- **`internal/api/core` is the canonical handler home.** REST/UDS endpoints exist as shared `BaseHandlers` methods; HTTP and UDS only choose registration and authentication. No transport-duplicated parsing/validation. -- **Authoritative primitives are exclusive.** When a primitive owns a state transition (`task.Service.ClaimNextRun`, `Spawn`, `EnsureMigration`), no peer package may replicate it. Wake/observe/sweep are allowed; claim/own is not. The mechanical scheduler does not call `ClaimNextRun`. -- **Hooks are typed dispatch, not an event bus.** Dispatch at the call site that owns the state transition. Never tail event/log tables to fire hooks. 
Hooks may deny/narrow/annotate but cannot bypass safety primitives (claim tokens, leases, TTL, lineage, spawn caps, permission narrowing). -- **Agent-manageable by default.** User-visible runtime capabilities must expose stable machine-readable control surfaces for agents: CLI verbs with `-o json`/`-o jsonl` where relevant, HTTP/UDS parity when state crosses the daemon boundary, discoverable status/config output, and docs that describe the agent path. UI-only manageability is incomplete. -- **No partial-surface completions.** Any change touching a public surface closes the loop end-to-end in one pass: contract → HTTP handler → UDS handler → CLI client → CLI command → extension/config/docs surfaces → tests → docs. - -### Concurrency - -- Every goroutine must have explicit ownership and shutdown via `context.Context` cancellation. -- No fire-and-forget goroutines — track with `sync.WaitGroup` or equivalent. -- Use `select` with `ctx.Done()` in all long-running goroutine loops. -- Prefer channels over shared memory with mutexes when practical. -- `sync.RWMutex` for read-heavy, `sync.Mutex` for write-heavy shared state. -- No `time.Sleep()` in orchestration — use proper synchronization primitives. -- **Goroutines spawned by `internal/session/manager_*.go` MUST be tracked by Manager-owned WaitGroup and joined in Manager shutdown.** Never put goroutine-owned channels in a struct field that another goroutine mutates — use a per-run handle. -- **Detached execution lifetime.** Any work that outlives an HTTP/UDS request — prompts, network channel sends, automation jobs — MUST detach via `context.WithoutCancel(ctx)`. Never tie execution lifetime to request lifetime. Expose explicit cancel endpoints (e.g., `POST /api/sessions/:id/prompt/cancel`). -- **`context.WithoutCancel` does NOT preserve deadlines.** Re-attach a deadline if needed. -- **Subprocess managed-stop** must respect `ctx.Done()` between Shutdown and Wait. 
Wrap `proc.Wait()` in `select { case <-proc.Done(): case <-ctx.Done(): }`. -- **Process-group supervision parity.** Unix uses process groups; Windows uses forced-exit fallback. Always cross-build with `GOOS=windows GOARCH=amd64 go build` before claiming subprocess work complete. Centralize signaling helpers in `internal/procutil`. - -### Runtime - -- Single-binary and local-first. Sidecars or external control planes require a written techspec. -- Keep execution paths deterministic and observable. -- **Daemon runs in background by default.** No daemon should require a foreground terminal. -- **`compozy exec` is headless.** `--format text` returns a single string; `--format json` returns a stream of valid JSON objects; the TUI is opt-in via `--tui`. `exec` does not persist artifacts to `.compozy/runs/` unless `--persist` is given. -- **Agent operations must not depend on the web UI.** If agents need to inspect, configure, start, stop, approve, claim, release, or repair a capability, the spec must provide a CLI/HTTP/UDS path with structured output and deterministic errors. - -### Observability - -- Every domain operation emits a canonical event with correlation keys (`workspace_id`, `session_id`, `parent_session_id`, `root_session_id`, `agent_name`, `task_id`, `run_id`, `claim_token_hash`, `lease_until`, `workflow_id`, `coordinator_session_id`, `scheduler_reason`, `hook_event`, `hook_name`, `spawn_depth`, `actor_kind`, `actor_id`, `release_reason`). -- Cover with a coverage matrix test that fails if any required lifecycle path doesn't emit its canonical event. -- Append-only event store (`runtime.db`) is the canonical operational ledger; session DBs are projections, not authority. -- Live broadcasters publish only after durable append; reconnect/replay uses `after_seq`. - -## Autonomy Contracts - -These are load-bearing rules from the autonomous-mode ADRs (`.compozy/tasks/autonomous/adrs/adr-001..012`) and `_techspec.md`. Internalize them before touching the kernel. 
- -- **`task_runs` is the single durable work queue.** Do not introduce a parallel queue or actor table. Add new ownership/state via columns + side tables on `task_runs`. -- **`task.Service.ClaimNextRun` is the canonical claim primitive.** Lease invariants: exactly one active claim token per non-terminal run; heartbeat/complete/fail/release compare run owner + claim token; stale/late after recovery fails explicitly; sweep + heartbeat serialize via SQLite tx; boot recovery before scheduler accepts wake/claim traffic; lease extension bounded by config; one active lease per session in MVP. Use `BEGIN IMMEDIATE`; CAS predicates for sweep. -- **Capability matching = durable exact-match rows** in `task_run_required_capabilities` / `task_run_preferred_capabilities`, NOT JSON metadata. -- **Manual operator paths and autonomous paths converge on the same primitives.** User-created, automation-created, coordinator-created, and agent-spawned tasks all use the same task/run model and the same claim-token/lease/heartbeat/complete/fail/release rules. Task creation alone NEVER enqueues claimable work or starts the coordinator. Publish/start/approval is the run-enqueue boundary. -- **Coordinator auto-spawn** triggers ONLY when: workspace has no healthy active coordinator AND a coordinated run is enqueued by publish/start/approval AND run has stable `coordination_channel_id` AND auto-start enabled AND spawn caps allow. Conservative defaults (auto-start disabled, max-children 5, max-active-per-workspace 1). -- **Coordinator-agent owns semantic orchestration; mechanical scheduler owns operational safety** (idle registry, capability-aware wakeups, lease sweep, recovery, backpressure). The scheduler does NOT call `ClaimNextRun` directly in MVP. -- **Safe spawn defaults**: max-depth 1, max-children 5, mandatory TTL on every spawned session; children auto-stop with parent. 
Permission narrowing compares concrete atoms only (tools, skills, MCP server IDs, workspace path grants, network channels, env profile grants); subset-only; unknown child atoms count as widening and reject. Daemon NEVER silently narrows. -- **Hook taxonomy** (MVP allowlist): `coordinator.{pre_spawn,spawned,decision,stopped,failed}`, `task.run.{enqueued,pre_claim,post_claim,lease_extended,lease_expired,lease_recovered,released}`, `spawn.{pre_create,created,parent_stopped,ttl_expired,reaped}`, plus `tool.*`, `permission.*`, `session.*`. Scheduler wake/no-match/recovery stay internal metrics. No `workflow.*` umbrella in MVP. -- **Coordination channels.** Every workspace-scoped coordinated run has ONE durable `coordination_channel_id` on `task_runs`. Bind always, speak when useful — heartbeats/lease transitions never mirror as chat. Network message kinds limited to `status` / `request` / `reply` / `blocker` / `handoff` / `result` / `review_request` in MVP. Channels are NEVER an ownership/status authority. -- **Generated contracts and docs co-ship.** Any change to `internal/api/contract` co-ships in the same PR with: regen of `openapi/agh.json` and `web/src/generated/agh-openapi.d.ts`, updates to `web/src/systems/*/types.ts` consumers, Storybook/MSW fixtures, and passes `make codegen-check`, `make web-typecheck`, AND `make web-test`. -- **Agent-facing CLI is identity-inferred.** Caller identity flows from `AGH_SESSION_ID` / `AGH_AGENT` through `internal/agentidentity`. Operator endpoints MUST NOT infer agent identity from environment variables. Stable `-o json` and `-o jsonl` are compatibility contracts; no command aliases (no `done`, no `pass`). - -## Security Invariants - -- **`claim_token` redaction is non-negotiable.** Raw `claim_token` (`agh_claim_*`), MCP auth tokens, OAuth codes, PKCE verifiers, and secret bindings MUST NEVER appear in logs, status APIs, settings views, error payloads, channel messages, SSE, web UI, or memory. 
Use hash forms (`claim_token_hash`) over the wire. Network layer rejects raw `claim_token` in metadata. -- **Symlink escape hardening.** Skill sidecars, skill files, managed-extension dependency copies, and bundle install paths MUST verify resolved targets remain inside approved roots. Use `EvalSymlinks` + path-prefix check, not naive joins. Handle macOS `/private/var/folders` quirk (canonicalize source root before containment check). -- **Path security helpers.** Filesystem helpers resolving user-controlled or agent-controlled paths use the `sanitizePathKey` + `realpathDeepestExisting` pattern (defenses against null-byte, URL-encoded traversal, Unicode normalization, symlink-escape). -- **Identity proof-stripping defense.** In any signed-message processing path (AGH Network v1), an identity in verified format (`nickname@fingerprint`) without valid `proof` MUST classify as `rejected`, not `unverified`. -- **External-call timeouts.** Outbound HTTP/network calls MUST use a client with an explicit timeout. `http.DefaultClient` is forbidden in production code paths. -- **Load-time security scan.** Every non-bundled skill is scanned via `internal/skills.VerifyContent` on every load (not just install). Critical findings block; warning findings log; info findings log silently. Bundled skills are exempt because `go:embed` provides immutability. - -## Package Layout - -| Path | Responsibility | -| ------------------------------- | ----------------------------------------------------------------------------- | -| `cmd/agh` | Main entry point, CLI binary | -| `internal/config` | TOML loading, validation, merge, home paths, agent def parsing | -| `internal/acp` | ACP client: subprocess spawn, JSON-RPC over stdio | -| `internal/agentidentity` | Caller-identity inference from `AGH_SESSION_ID`/`AGH_AGENT` | -| `internal/automation` | Cron, webhook, and scheduled triggers; durable scheduler state | -| `internal/bridges` | External messaging adapters (Slack, Telegram, etc.) 
| -| `internal/bridgesdk` | Bridge SDK / contract types | -| `internal/bundles` | Bundle activation projector | -| `internal/cli` | Cobra commands | -| `internal/codegen` | OpenAPI → TS generator helpers | -| `internal/coordinator` | Coordinator-agent bootstrap and lifecycle | -| `internal/daemon` | Composition root, lock, boot, shutdown | -| `internal/diagnostics` | Diagnostics + health probes | -| `internal/e2elane` | E2E lane harness wiring | -| `internal/environment` | Env-profile resolution | -| `internal/extension` | Extension manifest, registry, host API, install runtime | -| `internal/extensiontest` | Extension test harness | -| `internal/filesnap` | File snapshot utilities | -| `internal/fileutil` | Shared filesystem helpers | -| `internal/frontmatter` | YAML frontmatter parsing | -| `internal/hooks` | Typed hook taxonomy + dispatch | -| `internal/logger` | Structured logging (slog) | -| `internal/mcp` | MCP server lifecycle / sidecars | -| `internal/memory` | Persistent dual-scope memory (global + workspace + agent), provenance, recall | -| `internal/memory/consolidation` | Dream consolidation runtime (Time → Sessions → Lock gate cascade) | -| `internal/network` | AGH Network channels/peers/wire, NATS profile | -| `internal/observe` | Event recording, health metrics, query engine | -| `internal/procutil` | Process utilities, process-group signaling, Windows fallback | -| `internal/registry` | Skill/agent/capability registry helpers | -| `internal/resources` | Resource projector / codec / validate | -| `internal/retry` | Retry primitives | -| `internal/scheduler` | Mechanical scheduler (idle registry, wakeups, sweep, recovery) | -| `internal/session` | Session lifecycle, Manager, state machine | -| `internal/settings` | Settings overlay/projection | -| `internal/situation` | Situation surface providers (`/agent/context`) | -| `internal/skills` | Skills catalog, loader, `VerifyContent`, MCP/hook decl, provenance | -| `internal/skills/bundled` | Bundled 
skill definitions | -| `internal/sse` | Shared SSE helpers | -| `internal/store` | SQLite shared helpers, migrations registry, validation | -| `internal/store/globaldb` | Global catalog (`agh.db`): sessions, metadata | -| `internal/store/sessiondb` | Per-session event store (`events.db`) | -| `internal/subprocess` | Subprocess signaling primitives | -| `internal/task` | Task domain, `task_runs` ownership, `ClaimNextRun` | -| `internal/testutil` | Shared test helpers | -| `internal/api/contract` | Shared daemon/CLI/HTTP contract types | -| `internal/api/core` | Shared handler types (`BaseHandlers`), error mapping, SSE helpers | -| `internal/api/httpapi` | HTTP/SSE server (Gin) for web UI | -| `internal/api/udsapi` | UDS server for CLI IPC | -| `internal/api/testutil` | Test helpers for the API layer | -| `internal/toolruntime` | Tool process registry + interrupts | -| `internal/tools` | Tool definitions and dispatch | -| `internal/transcript` | Canonical replay message assembly from persisted events | -| `internal/version` | Build metadata | -| `internal/workref` | Work reference helpers | -| `internal/workspace` | Workspace resolver and entity management | -| `web/` | React 19 SPA (Vite, TanStack Router/Query, Tailwind, shadcn) | -| `web/src/systems/` | Domain feature modules (app-renderer-systems pattern) | -| `packages/site` | Fumadocs documentation site (Bun) | -| `packages/ui` | Shared UI primitives (`@agh/ui`) | +## Surface Map + +Repo layout. 
Each surface owns its instructions: + +| Path | Stack | Instructions | +| --------------- | ----------------------------------------------------------------------- | ------------------------- | +| `cmd/agh` | Go binary entry point | `internal/CLAUDE.md` | +| `internal/` | Go runtime daemon (ACP, SQLite, autonomy kernel, HTTP/UDS, network) | `internal/CLAUDE.md` | +| `web/` | React 19 SPA (Vite, TanStack, Tailwind, shadcn) | `web/CLAUDE.md` | +| `packages/site` | Fumadocs documentation site (Bun) | `packages/site/CLAUDE.md` | +| `packages/ui` | Shared UI primitives (`@agh/ui`) consumed by `web/` and `packages/site` | `web/CLAUDE.md` | + +Backend architecture, autonomy contracts, security invariants, package layout, and `internal/`-specific debugging now live in **`internal/CLAUDE.md`**. Open it before touching any Go code under `cmd/` or `internal/`. ## Coding Style -- Explicit error returns with wrapped context: `fmt.Errorf("context: %w", err)`. -- Use `errors.Is()` and `errors.As()` exclusively for error matching. **`strings.Contains(err.Error(), …)` is forbidden.** -- Never ignore errors with `_` — every error must be handled or have a written justification. -- **Cleanup paths must cancel contexts and release resources.** Every error-return path that previously created or extended a `context.Context`, registered a resource, opened a connection, or spawned a subprocess MUST `cancel()`, `Close()`, `Stop()`, or release its lease on the error path. Pair `defer cancel()` immediately after `WithCancel`/`WithTimeout`. -- No `panic()` or `log.Fatal()` in production paths — only for truly unrecoverable startup failures. -- `log/slog` for structured logging — no `log.Printf` or `fmt.Println` for operational output. -- `context.Context` as first argument to functions crossing runtime boundaries — avoid `context.Background()` outside `main` and focused tests. 
-- **Compile-time interface verification is mandatory.** `var _ Interface = (*Type)(nil)` next to every new exported type that satisfies an interface. -- No `interface{}`/`any` when a concrete type is known. -- No reflection without performance justification. -- **Never hardcode configuration** — use TOML config or functional options. Disable/zero-value semantics must be explicit. Resolution chains documented as ordered fallbacks ending in actionable errors. -- **Config lifecycle is part of the feature lifecycle.** Any spec that adds, updates, removes, or stops needing configuration must update structs, defaults, merge/overlay behavior, validation, examples, `config.toml` docs, generated CLI/site docs, and tests in the same change. If no config change is needed, the TechSpec says why. -- **CLI flag presence detection.** Distinguish "flag not set" from "flag set to zero value" via `cmd.Flags().Changed(name)` (Cobra) or equivalent. Silently ignoring an explicit flag is a bug. -- **Whitespace normalization at CLI boundary.** String-slice CLI inputs (capabilities, IDs, tags, paths) MUST trim and drop empty entries before sending. Do not push whitespace-only strings to the daemon as "validation problems". -- **No defensive nil-checks after `make`.** Reviewers and lint flag `if x == nil` after `make(...)` as unreachable. -- **No comments restating WHAT the code does.** Comments capture WHY when non-obvious — hidden constraints, invariants, workarounds for specific bugs. Don't reference the current task or callers ("used by X", "added for Y") — those rot. +- **Skill**: `agh-code-guidelines` (`.agents/skills/agh-code-guidelines/`). +- **When**: before writing or editing any production `*.go` file under `cmd/` or `internal/`. +- **Covers**: error wrapping (`%w`), `errors.Is`/`As` only, `slog` logging, `context.Context` discipline, compile-time interface assertions, no hardcoded config, CLI flag presence detection, comments policy, generic concurrency patterns. 
+- **Top-level invariants restated in Critical Rules**: no `_`-discarded errors, `make verify` must pass, `make lint` zero tolerance. ## Testing -- **Every Go test case MUST be inside a `t.Run("Should ...")` subtest.** Adding inline cases to an existing function is a blocking violation. -- **Independent subtests MUST call `t.Parallel()`.** The only legitimate opt-out is a comment justifying `t.Setenv` use or shared state. Reject reviewer suggestions to add `t.Parallel()` to env-mutating tests as INVALID with rationale. -- Table-driven default; use `t.Helper()` on test helpers and `t.TempDir()` for filesystem isolation. -- **No `_ = errFn(...)` in tests.** Handle marshal/JSON/cleanup errors explicitly. -- **Status-code-only assertions are insufficient.** Also assert response body, error message, or contract-specific evidence (idempotency key, request payload). -- Mock via interfaces, not test-only methods in production code. -- `-race` flag must pass before committing. -- **Race-enabled tests must self-manage `CGO_ENABLED=1`.** Verification commands wrapping `go test -race` go through `runRaceEnabledGoCommand` (or equivalent). Don't trust ambient env. -- **Linux-Race CI parity.** Before claiming `make verify` complete on race-sensitive packages (`internal/session`, `internal/acp`, `internal/hooks`, `internal/subprocess`, `internal/resources`), reproduce locally with `act workflow_dispatch -W .github/workflows/ci.yml -j verify --container-architecture linux/amd64`. -- **`make verify` is the commit gate.** If verification is blocked by an external/branch-side asset issue (missing test fixture, etc.), do NOT commit — report the verified blocker and hold. -- **Test failures are production bugs.** Fix production code; don't weaken assertions. The only exception is documenting an INVALID review item with concrete evidence. 
-- **Replace fragile string-matching with structured metadata.** ACP prompt routing in `acpmock` uses typed prompt metadata, not rendered prompt substrings. -- **80% coverage minimum** per package. - -### Integration & E2E Tests - -- **Build tags**: `//go:build integration` at top of `*_integration_test.go` files. -- **Co-located** with the package they test (not in a separate `test/` directory). -- `make test` = unit only. `make test-integration` = `+integration` tag. `make test-e2e-runtime` = daemon-side E2E. `make test-e2e-web` = browser-side Playwright. -- `TestMain` for expensive one-time setup/teardown. -- Use **real dependencies** (real SQLite via `t.TempDir()`, mock ACP server as subprocess). -- Keep fast enough for CI (~30s max per package). -- **E2E tests are part of the runtime contract.** When a runtime contract changes (prompt augmenter, situation context, fixture format), the E2E mock and matchers ship in the same PR. Otherwise tests pass against a stale prompt and fail later. +- **Skill**: `agh-test-conventions` (`.agents/skills/agh-test-conventions/`). +- **When**: before writing or editing any `*_test.go` file. +- **Covers**: + - `t.Run("Should ...")` subtests, `t.Parallel` default (with `t.Setenv` opt-out), table-driven layout. + - Status-code + body assertions (status-code-only is insufficient). + - `-race` / `CGO_ENABLED=1` discipline; Linux-Race CI parity for race-sensitive packages. + - Integration / E2E build tags (`//go:build integration`, `make test-integration`, `make test-e2e-runtime`, `make test-e2e-web`). + - Runtime-contract co-ship (E2E mock + matchers ship with contract changes). + - 80% coverage floor per package. + - Commit-gate semantics (`make verify` blocks; test failures are production bugs). ### Schema Migrations -- **Schema migrations are mandatory** for any change to a SQLite column, index, or constraint. Add a numbered migration in the migrations registry. 
`EnsureSchema`-style boot reconciliation is forbidden for column changes. Test fresh-DB AND reopen-after-restart paths. -- **One schema migration primitive shared by all SQLite databases** (`agh.db`, `events.db`, catalog DBs). -- **SQLite recovery code paths** must rename or remove `-wal` and `-shm` companions, not only the `.db` file. -- **`ORDER BY 0` is invalid in SQLite** (positional reference). Use `(SELECT 0)` or an explicit constant column. +- **Skill**: `agh-schema-migration`. +- **When**: any SQLite column, index, or constraint change. +- **Mandatory**: numbered migration in the registry — `EnsureSchema`-style boot reconciliation is forbidden for column changes. +- **Covers**: numbered registry, transactional wrap (`BEGIN IMMEDIATE`), `-wal` / `-shm` companion handling on recovery, `ORDER BY 0` pitfall, fresh-DB + reopen-after-restart tests. -## Memory & Skills (RFC-backed) +## Vocabulary & Product Strategy -These rules come from RFC 001 (`.../agh-rfcs-local/001-agent-md-with-skills-memory.md`) and RFC 002 (`.../agh-rfcs-local/002-skills-system-final.md`): +Repo-wide rules backed by RFC 001 / RFC 002. Runtime implementation details (precedence layers, memory taxonomy, consolidation gates, lifecycle hooks) live in `internal/CLAUDE.md`. -- **Five-layer skill/memory/agent precedence**: Bundled → Marketplace → User → Additional → Workspace, with agent-local overriding all. Higher precedence wins on collision; an audit trail logs every shadow. -- **Memory taxonomy**: `user | feedback | project | reference` types; scopes `agent | workspace | global`. Default write scope declared per agent in `memory.scope`. -- **Memory consolidation gates**: Time → Sessions → Lock cascade ordered by computational cost. Default gates: 24h, 5 touched sessions, file-lock. Never replace gates with naive heuristics. 
-- **Lifecycle hooks** (`on_session_created`, `on_session_stopped`) execute in hierarchy precedence then alphabetical order; configurable timeout (default 5s); fail-open semantics (errors logged, never block); JSON over stdin. -- **Format extension default**: when integrating with an external spec (AgentSkills, AGENTS.md, MCP, A2A), extend via a namespaced metadata field (`metadata.agh.*` or `agh.*`) — never fork the format. - **Capability vs Recipe**: reusable agent artifacts are called `capability`, NOT `recipe`/`workflow`/`procedure`/`playbook`. Capabilities are interpretive, not deterministic; they are not workflow programs in disguise. +- **Format extension default**: when integrating with an external spec (AgentSkills, AGENTS.md, MCP, A2A), extend via a namespaced metadata field (`metadata.agh.*` or `agh.*`) — never fork the format. - **Runtime moat statement**: AGH competes on runtime, SDK, observability, DX, and integration depth — NOT the wire protocol. The AGH Network protocol must remain implementable outside AGH. Any feature requiring AGH to interoperate is a design smell. +## Memory & Lessons Learned + +`docs/_memory/` is the project's institutional memory — durable engineering knowledge distilled from real incidents, ADR forensics, and standing engineering posture. Treat it as authoritative when CLAUDE.md is silent or ambiguous. + +- **Standing directives** — `docs/_memory/standing_directives.md`. Perpetually-active engineering posture (SD-001..SD-011): long-running session supervision, greenfield-delete, BR-PT/EN, multi-LLM pipeline, real-scenario QA, forensic-first bug fixes, truthful UI, composition-root discipline, detached lifetime, extensible-and-agent-manageable design. Read before opening a TechSpec, defending an architecture pivot, or whenever someone proposes a compat shim. +- **Spec authoring playbook** — `docs/_memory/spec-authoring-playbook.md`. 
Mandatory preflight for `cy-create-prd` / `cy-create-techspec` / `cy-create-tasks`, with phase-by-phase MUST / MUST-NOT and evidence references. The `cy-spec-preflight` skill enforces this — always read before producing any `_idea.md` / `_prd.md` / `_techspec.md` / `_tasks.md`. +- **Lessons learned** — `docs/_memory/lessons/` (`L-001..L-013`, plus `README.md` index). One file per durable lesson with confirmed root cause + fix + evidence (ADR, commit, review issue, or QA bug). Scan the index whenever you hit a class of issue: concurrency / API, testing discipline, autonomy architecture, persistence, spec authoring. +- **Glossary** — `docs/_memory/glossary.md`. Canonical vocabulary (`capability` vs `recipe`, `AGENT.md` vs `AGENTS.md`, Peer Card vs Agent Card, autonomy primitives). Authoritative when older RFCs / ledgers conflict. Read when naming anything new, reviewing a rename PR, or when a term feels overloaded. +- **Cross-source synthesis** — `docs/_memory/_synthesis.md`. Cross-referenced findings from 8 forensic analyses, ranked by source count — the evidence corpus behind every rule in CLAUDE.md and the standing directives. Read when challenging or evolving a rule. +- **Forensic analyses** — `docs/_memory/analysis/analysis_*.md`. Per-source raw analyses (codex sessions / plans / ledger, compozy tasks, qmd collections, local / global runs, existing surfaces) feeding `_synthesis.md`. Read when synthesis cites a finding and you need the underlying evidence. + +**Authoring rules:** + +- New lesson → numbered file `L-NNN-kebab-title.md` + update `lessons/README.md`. One lesson per file. Cite specific evidence (file path, commit, review issue, ledger entry). Activate the `lesson-learned` skill. +- Don't duplicate CLAUDE.md or `standing_directives.md` rules in lessons — lessons explain **why** a rule exists; rules go in their respective files. +- Don't add speculative warnings — only confirmed incidents with evidence. 
+- New standing directive → next `SD-NNN` block in `standing_directives.md` with Posture / Required behavior / Source / Triggers re-evaluation when. + ## CI / Release - **No cron / schedule workflows.** Heavy/credentialed tests (`make test-e2e-nightly`, `make test-integration`) live in the `dry-run` job of the auto-created release PR. Rationale: release PR is the natural human-gated batching point. - **Looper repo (`~/dev/compozy/looper`) is the canonical source** for compozy-org Go-repo CI: composite actions (`setup-go`, `setup-bun`, `setup-git-cliff`, `setup-release`), `ci.yml`, `release.yml`, `.goreleaser.yml`, `cliff.toml`. Verbatim copies into AGH. - **Replace third-party CI actions with shell logic** when their setup fails on runners (lesson: `dorny/paths-filter@v3` runner instability replaced by inline git-based change detection). -## Forensic Bug Fixes - -- **Bug-fix plans open with confirmed reproduction** (timestamp, command, observed evidence) BEFORE listing changes. "I think" or "probably" is forbidden at the top of a fix plan. -- **Inactive metadata repair must distinguish startup-pending from crashed.** Sessions in `m.pending` are still starting, not failed. -- **Stale ACP session ids must be classified, not propagated.** Convert `Resource not found` to fresh-start fallback. - ## Cross-References -- **Spec authoring playbook** (mandatory preflight for `cy-create-prd`/`cy-create-techspec`/`cy-create-tasks`): `docs/_memory/spec-authoring-playbook.md`. -- **Standing directives** (perpetual posture): `docs/_memory/standing_directives.md`. -- **Lessons learned** (durable engineering insights with evidence): `docs/_memory/lessons/` — see `README.md` for the index. -- **Glossary** (canonical vocabulary — `capability` vs `recipe`, AGENT.md vs AGENTS.md, Peer Card vs Agent Card, autonomy primitives): `docs/_memory/glossary.md`. 
-- **Cross-source synthesis** (evidence trail behind every rule above): `docs/_memory/_synthesis.md` and `docs/_memory/analysis/analysis_*.md`. -- **Web rules**: `web/CLAUDE.md`. **Site rules**: `packages/site/CLAUDE.md`. -- **Active TechSpec**: `.compozy/tasks/autonomous/_techspec.md`. **ADRs**: `.compozy/tasks/autonomous/adrs/`. +- **Backend rules**: `internal/CLAUDE.md` (Go architecture, autonomy contracts, security invariants, package layout, forensic bug-fix patterns). +- **Web rules**: `web/CLAUDE.md`. +- **Site rules**: `packages/site/CLAUDE.md`. +- **Institutional memory**: `docs/_memory/` — see the **Memory & Lessons Learned** section above for the per-surface map. - **Authoritative design tokens**: `DESIGN.md` (repo root). diff --git a/DESIGN.md b/DESIGN.md index f010e1a28..8814e66a3 100644 --- a/DESIGN.md +++ b/DESIGN.md @@ -6,7 +6,7 @@ AGH ships as three surfaces that must feel like one product: 1. **AGH Runtime** — the local daemon, operator UI (`web/`) and CLI. Sessions, memory, skills, workspaces, automation, bridges, observability. 2. **AGH Network** — `agh-network/v0`, the seven-kind wire protocol (`greet`, `whois`, `say`, `direct`, `recipe`, `receipt`, `trace`) over NATS + JSON. -3. **packages/site** — the marketing landing + Fumadocs MDX docs at `agh.compozy.com` with two trees (`/runtime/*`, `/protocol/*`). +3. **packages/site** — the marketing landing + Fumadocs MDX docs at `agh.network` with two trees (`/runtime/*`, `/protocol/*`). The canonical token source is [`packages/ui/src/tokens.css`](packages/ui/src/tokens.css). The canonical reference extraction and UI kits live in [`docs/design/design-system/`](docs/design/design-system/). 
diff --git a/docs/_memory/analysis/analysis_existing_surfaces.md b/docs/_memory/analysis/analysis_existing_surfaces.md index 3c7f4f708..f84736876 100644 --- a/docs/_memory/analysis/analysis_existing_surfaces.md +++ b/docs/_memory/analysis/analysis_existing_surfaces.md @@ -51,7 +51,7 @@ User-global `/Users/pedronauck/.claude/skills/` is a mix of real directories and `/Users/pedronauck/.claude/projects/-Users-pedronauck-Dev-compozy-agh/memory/MEMORY.md` indexes three project notes: -1. **`project_site_docs.md`** (2026-04-15) — Approved Site & Docs techspec. Hero "Your agents can finally talk to each other." (network-first). Fumadocs at `agh.compozy.com` on Vercel, lives in `site/` in monorepo (NOTE: ADR-011 + tasks 15-16 reference `packages/site/` — memory may be slightly drifting). Two doc products (Runtime Docs + Network Protocol Spec); CLI ref via cobra JSON export, API via OpenAPI + fumadocs-openapi; ~63 pages in 3 waves. "AGH Network Protocol is the key differentiator." Research at `.compozy/tasks/site/analysis/`, techspec at `.compozy/tasks/site/_techspec.md`. +1. **`project_site_docs.md`** (2026-04-15) — Approved Site & Docs techspec. Hero "Your agents can finally talk to each other." (network-first). Fumadocs at `agh.network` on Vercel, lives in `site/` in monorepo (NOTE: ADR-011 + tasks 15-16 reference `packages/site/` — memory may be slightly drifting). Two doc products (Runtime Docs + Network Protocol Spec); CLI ref via cobra JSON export, API via OpenAPI + fumadocs-openapi; ~63 pages in 3 waves. "AGH Network Protocol is the key differentiator." Research at `.compozy/tasks/site/analysis/`, techspec at `.compozy/tasks/site/_techspec.md`. 2. **`feedback_ci_no_cron.md`** (2026-04-17) — CI philosophy. NO GitHub Actions cron/schedule workflows. Heavy/credentialed tests (`make test-e2e-nightly`, `make test-integration`) live in the `dry-run` job of the auto-created release PR. Rationale: release PR is the natural human-gated batching point. 
PR workflow `ci.yml` = verify + e2e-combined only; release workflow `release.yml` has 3 jobs (release-pr / dry-run / release). diff --git a/docs/_memory/analysis/analysis_qmd_collections.md b/docs/_memory/analysis/analysis_qmd_collections.md index 2656f6b4b..6909ca007 100644 --- a/docs/_memory/analysis/analysis_qmd_collections.md +++ b/docs/_memory/analysis/analysis_qmd_collections.md @@ -284,7 +284,7 @@ Things the RFCs leave unresolved that are now visible across the corpus: 4. **Marketplace governance is unresolved.** Post-ClawHavoc, RFC 002 §6 leaves "manual review vs automated scanning vs combination" open. The current code has the load-time scanner but no marketplace UX, no consent revocation, no expiry. 5. **Agent identity portability vs project memory namespacing.** RFC 001 §6.3 acknowledges that copying an agent directory to multiple projects diverges memories naturally — but this is not the same as agent identity _across_ AGH instances on the network (RFC 003 `nickname@fingerprint`). The two identity models are not unified. 6. **No JetStream profile yet.** RFC 003 v0 §11.10 explicitly defers JetStream durability, dead-letter, ACL, account/tenancy. These are real operational needs for any production AGH-Network deployment. The RFC-as-MVP shape is intentional but the next-profile sequencing is undefined. -7. **The `agh-compozy/`, `agh-docs/`, `agh-site-*` collections are empty.** Public docs/site/ledger/plans collections all index 0 files. The Fumadocs site project (per Pedro's auto-memory: "Approved techspec: Fumadocs site at agh.compozy.com") has no captured artifacts yet — there's no public-facing record corresponding to the depth of internal RFC work. +7. **The `agh-compozy/`, `agh-docs/`, `agh-site-*` collections are empty.** Public docs/site/ledger/plans collections all index 0 files. 
The Fumadocs site project (per Pedro's auto-memory: "Approved techspec: Fumadocs site at agh.network") has no captured artifacts yet — there's no public-facing record corresponding to the depth of internal RFC work. 8. **No A2A Agent Card mapping defined.** RFC 001 §3.3 ("vs A2A Agent Cards") notes that "Agent Cards could be _generated from_ an AGENT.md definition" but there's no defined mapping. If AGH is going to publish AGENT.md as a portable format and also speak AGH-Network (which exposes Peer Cards), the relationship between AGENT.md / Peer Card / A2A Agent Card needs to be pinned. 9. **`recipe` artifact still appears in the AGH-Network non-normative section of the old draft as "first-class," but it has no SKILL.md/AGENT.md analogue in the v0/v1 RFCs.** Is a capability the same thing as a skill exposed over the network? This is hinted at but not stated. 10. **Skill auto-proposal accuracy.** RFC 002 §2.6 says "3+ occurrences across different sessions" is the threshold but acknowledges in §6.4 that false positives erode trust. There is no calibration data, no test plan, no opt-out UX. This is the most ambitious item in increment 2 and the least specified. 
diff --git a/docs/design/design-system/ui_kits/marketing/Sections.jsx b/docs/design/design-system/ui_kits/marketing/Sections.jsx index db11ad725..7367a2174 100644 --- a/docs/design/design-system/ui_kits/marketing/Sections.jsx +++ b/docs/design/design-system/ui_kits/marketing/Sections.jsx @@ -523,7 +523,7 @@ function InstallSection() { { id: "binary", label: "Binary", - command: "curl -fsSL https://get.agh.compozy.com | sh", + command: "curl -fsSL https://get.agh.network | sh", note: "Linux + macOS · prebuilt", }, ]; diff --git a/docs/rfcs/001_agent-md-with-skills-memory.md b/docs/rfcs/001_agent-md-with-skills-memory.md index 8dfccdd38..e81a21ef5 100644 --- a/docs/rfcs/001_agent-md-with-skills-memory.md +++ b/docs/rfcs/001_agent-md-with-skills-memory.md @@ -123,7 +123,7 @@ Each agent is a self-contained directory: debug_patterns.md ``` -In AGH, self-contained agent directories may also include optional runtime sidecars that travel with the agent, such as `mcp.json` and a capability catalog (`capabilities.toml`, `capabilities.json`, or `capabilities/`). The runtime authoring rules for capability catalogs live in [docs/agents/capabilities.md](../agents/capabilities.md). +In AGH, self-contained agent directories may also include optional runtime sidecars that travel with the agent, such as `mcp.json` and a capability catalog (`capabilities.toml`, `capabilities.json`, or `capabilities/`). The runtime authoring rules for capability catalogs live in [RFC 005](005_capability-catalogs-agent-directories.md). 
Agent directories can live in: diff --git a/docs/rfcs/003_agh-network-v0.md b/docs/rfcs/003_agh-network-v0.md index affeab920..886dcfc20 100644 --- a/docs/rfcs/003_agh-network-v0.md +++ b/docs/rfcs/003_agh-network-v0.md @@ -511,7 +511,7 @@ Recommended shape: - summaries SHOULD remain short enough for periodic `greet` traffic - summaries SHOULD be a single short sentence and SHOULD target `<= 160` UTF-8 characters in v0 -Local capability authoring, file layout, and validation are runtime concerns and are documented separately in `docs/agents/capabilities.md`. +Local capability authoring, file layout, and validation are runtime concerns and are documented separately in `docs/rfcs/005_capability-catalogs-agent-directories.md`. ### 8.3 `whois` @@ -624,7 +624,7 @@ Rules: - if a full rich catalog would exceed the envelope size limit, responders SHOULD omit `agh.capability_catalog` unless a narrower filtered request is provided - receivers MUST ignore unknown AGH extension keys per the core `ext` rules -The local capability catalog model and validation rules are runtime concerns and are documented separately in `docs/agents/capabilities.md`. +The local capability catalog model and validation rules are runtime concerns and are documented separately in `docs/rfcs/005_capability-catalogs-agent-directories.md`. ### 8.4 `say` diff --git a/docs/agents/capabilities.md b/docs/rfcs/005_capability-catalogs-agent-directories.md similarity index 98% rename from docs/agents/capabilities.md rename to docs/rfcs/005_capability-catalogs-agent-directories.md index 5e19b8672..64f595dfb 100644 --- a/docs/agents/capabilities.md +++ b/docs/rfcs/005_capability-catalogs-agent-directories.md @@ -2,7 +2,7 @@ AGH uses one capability model end to end. 
A capability is authored locally in the agent directory, normalized by the runtime, advertised briefly in `greet`, returned richly through explicit `whois`, and transferred explicitly through `kind:"capability"` when a peer shares a portable capability document. -This guide covers the runtime authoring and projection surface. For the self-contained agent-directory story, see [RFC 001](../rfcs/001_agent-md-with-skills-memory.md). For the wire contract, see [RFC 003](../rfcs/003_agh-network-v0.md). +This guide covers the runtime authoring and projection surface. For the self-contained agent-directory story, see [RFC 001](001_agent-md-with-skills-memory.md). For the wire contract, see [RFC 003](003_agh-network-v0.md). ## What Belongs Here diff --git a/internal/AGENTS.md b/internal/AGENTS.md new file mode 100644 index 000000000..63bbef9d0 --- /dev/null +++ b/internal/AGENTS.md @@ -0,0 +1,134 @@ +# Internal Backend (Go) + +The Go runtime — `internal/*` packages composed by `internal/daemon`, plus the API transports under `internal/api/*`. ACP subprocess management, SQLite persistence, HTTP/SSE + UDS APIs, autonomy kernel, AGH Network. Entry binary lives in `cmd/agh`. + +Repo-wide rules (Critical Rules, Workflow, Build, Commits, Skill Dispatch, Memory & Skills RFC, CI/Release) live in the **root `CLAUDE.md`**. This file owns architecture, package boundaries, autonomy contracts, security invariants, and `internal/`-specific debugging/forensics. + +## Architecture + +### Principles + +- **Designed for incremental extension** — new capabilities arrive as new packages wired into `daemon/`, without modifying existing packages. Small interfaces + dependency injection. Every capability plan decides which extension points, hooks, capabilities, tools/resources, bundles, registries, bridge SDKs, and docs must be added, updated, or removed. +- **Pragmatic Flat with Discipline** — packages under `internal/`, API transports grouped under `api/`, no domain/infra split, no event bus. 
+- **`daemon/` is the sole composition root** — the only package that imports all others. Reconciliation logic running at boot belongs to composition root and is not "legacy support". +- **No package imports `daemon/`, `api/`, or `cli/`** — dependencies flow downward only. +- **Interfaces defined where consumed** (Go-style) — `session/` defines `AgentDriver`, `acp/` implements it. +- **Direct function calls through interfaces** — no event bus, no reflection-based routing, no NATS as inter-package coordination. NATS is permitted **only** inside `internal/network` as the embedded wire transport for the AGH Network protocol; daemon packages communicate via interfaces and the Notifier pattern, never by publishing to subjects. +- **Notifier pattern for fan-out** — typed interface for observability and SSE, not a generic bus. +- **No back-pointers between packages** — inject callbacks or interfaces. +- **Functional options for constructors** — `NewManager(opts ...Option)`. +- **Maps for <10 items** — no registry interfaces for small collections. +- **File-level organization** within packages — sub-packages only when complexity justifies it. +- **CI-enforceable boundaries** — `mage Boundaries` rules prevent import cycles. Update `magefile.go` Boundaries() in the same commit that introduces a new `internal/api/*` subpackage. +- **`internal/api/core` is the canonical handler home.** REST/UDS endpoints exist as shared `BaseHandlers` methods; HTTP and UDS only choose registration and authentication. No transport-duplicated parsing/validation. +- **Authoritative primitives are exclusive.** When a primitive owns a state transition (`task.Service.ClaimNextRun`, `Spawn`, `EnsureMigration`), no peer package may replicate it. Wake/observe/sweep are allowed; claim/own is not. The mechanical scheduler does not call `ClaimNextRun`. +- **Hooks are typed dispatch, not an event bus.** Dispatch at the call site that owns the state transition. Never tail event/log tables to fire hooks. 
Hooks may deny/narrow/annotate but cannot bypass safety primitives (claim tokens, leases, TTL, lineage, spawn caps, permission narrowing). +- **Agent-manageable by default.** User-visible runtime capabilities must expose stable machine-readable control surfaces for agents: CLI verbs with `-o json`/`-o jsonl` where relevant, HTTP/UDS parity when state crosses the daemon boundary, discoverable status/config output, and docs that describe the agent path. UI-only manageability is incomplete. +- **No partial-surface completions.** Any change touching a public surface closes the loop end-to-end in one pass: contract → HTTP handler → UDS handler → CLI client → CLI command → extension/config/docs surfaces → tests → docs. + +### Concurrency + +Generic Go concurrency patterns (goroutine ownership, channels vs mutexes, `select`/`ctx.Done()` discipline, no `time.Sleep` in orchestration) live in `agh-code-guidelines`. Architectural invariants below are load-bearing for design decisions: + +- **Goroutines spawned by `internal/session/manager_*.go` MUST be tracked by Manager-owned WaitGroup and joined in Manager shutdown.** Never put goroutine-owned channels in a struct field that another goroutine mutates — use a per-run handle. +- **Detached execution lifetime.** Any work that outlives an HTTP/UDS request — prompts, network channel sends, automation jobs — MUST detach via `context.WithoutCancel(ctx)`. Never tie execution lifetime to request lifetime. Expose explicit cancel endpoints (e.g., `POST /api/sessions/:id/prompt/cancel`). +- **`context.WithoutCancel` does NOT preserve deadlines.** Re-attach a deadline if needed. +- **Subprocess managed-stop** must respect `ctx.Done()` between Shutdown and Wait. Wrap `proc.Wait()` in `select { case <-proc.Done(): case <-ctx.Done(): }`. +- **Process-group supervision parity.** Unix uses process groups; Windows uses forced-exit fallback. Always cross-build with `GOOS=windows GOARCH=amd64 go build` before claiming subprocess work complete. 
Centralize signaling helpers in `internal/procutil`. + +### Runtime + +- Single-binary and local-first. Sidecars or external control planes require a written techspec. +- Keep execution paths deterministic and observable. +- **Daemon runs in background by default.** No daemon should require a foreground terminal. +- **`compozy exec` is headless.** `--format text` returns a single string; `--format json` returns a stream of valid JSON objects; the TUI is opt-in via `--tui`. `exec` does not persist artifacts to `.compozy/runs/` unless `--persist` is given. +- **Agent operations must not depend on the web UI.** If agents need to inspect, configure, start, stop, approve, claim, release, or repair a capability, the spec must provide a CLI/HTTP/UDS path with structured output and deterministic errors. + +### Observability + +- Every domain operation emits a canonical event with correlation keys (`workspace_id`, `session_id`, `parent_session_id`, `root_session_id`, `agent_name`, `task_id`, `run_id`, `claim_token_hash`, `lease_until`, `workflow_id`, `coordinator_session_id`, `scheduler_reason`, `hook_event`, `hook_name`, `spawn_depth`, `actor_kind`, `actor_id`, `release_reason`). +- Cover with a coverage matrix test that fails if any required lifecycle path doesn't emit its canonical event. +- Append-only event store (`runtime.db`) is the canonical operational ledger; session DBs are projections, not authority. +- Live broadcasters publish only after durable append; reconnect/replay uses `after_seq`. + +## Security Invariants + +- **`claim_token` redaction is non-negotiable.** Raw `claim_token` (`agh_claim_*`), MCP auth tokens, OAuth codes, PKCE verifiers, and secret bindings MUST NEVER appear in logs, status APIs, settings views, error payloads, channel messages, SSE, web UI, or memory. Use hash forms (`claim_token_hash`) over the wire. Network layer rejects raw `claim_token` in metadata. 
+- **Symlink escape hardening.** Skill sidecars, skill files, managed-extension dependency copies, and bundle install paths MUST verify resolved targets remain inside approved roots. Use `EvalSymlinks` + path-prefix check, not naive joins. Handle macOS `/private/var/folders` quirk (canonicalize source root before containment check). +- **Path security helpers.** Filesystem helpers resolving user-controlled or agent-controlled paths use the `sanitizePathKey` + `realpathDeepestExisting` pattern (defenses against null-byte, URL-encoded traversal, Unicode normalization, symlink-escape). +- **Identity proof-stripping defense.** In any signed-message processing path (AGH Network v1), an identity in verified format (`nickname@fingerprint`) without valid `proof` MUST classify as `rejected`, not `unverified`. +- **External-call timeouts.** Outbound HTTP/network calls MUST use a client with an explicit timeout. `http.DefaultClient` is forbidden in production code paths. +- **Load-time security scan.** Every non-bundled skill is scanned via `internal/skills.VerifyContent` on every load (not just install). Critical findings block; warning findings log; info findings log silently. Bundled skills are exempt because `go:embed` provides immutability. + +## Package Layout + +| Path | Responsibility | +| ------------------------------- | ----------------------------------------------------------------------------- | +| `cmd/agh` | Main entry point, CLI binary | +| `internal/config` | TOML loading, validation, merge, home paths, agent def parsing | +| `internal/acp` | ACP client: subprocess spawn, JSON-RPC over stdio | +| `internal/agentidentity` | Caller-identity inference from `AGH_SESSION_ID`/`AGH_AGENT` | +| `internal/automation` | Cron, webhook, and scheduled triggers; durable scheduler state | +| `internal/bridges` | External messaging adapters (Slack, Telegram, etc.) 
| +| `internal/bridgesdk` | Bridge SDK / contract types | +| `internal/bundles` | Bundle activation projector | +| `internal/cli` | Cobra commands | +| `internal/codegen` | OpenAPI → TS generator helpers | +| `internal/coordinator` | Coordinator-agent bootstrap and lifecycle | +| `internal/daemon` | Composition root, lock, boot, shutdown | +| `internal/diagnostics` | Diagnostics + health probes | +| `internal/e2elane` | E2E lane harness wiring | +| `internal/environment` | Env-profile resolution | +| `internal/extension` | Extension manifest, registry, host API, install runtime | +| `internal/extensiontest` | Extension test harness | +| `internal/filesnap` | File snapshot utilities | +| `internal/fileutil` | Shared filesystem helpers | +| `internal/frontmatter` | YAML frontmatter parsing | +| `internal/hooks` | Typed hook taxonomy + dispatch | +| `internal/logger` | Structured logging (slog) | +| `internal/mcp` | MCP server lifecycle / sidecars | +| `internal/memory` | Persistent dual-scope memory (global + workspace + agent), provenance, recall | +| `internal/memory/consolidation` | Dream consolidation runtime (Time → Sessions → Lock gate cascade) | +| `internal/network` | AGH Network channels/peers/wire, NATS profile | +| `internal/observe` | Event recording, health metrics, query engine | +| `internal/procutil` | Process utilities, process-group signaling, Windows fallback | +| `internal/registry` | Skill/agent/capability registry helpers | +| `internal/resources` | Resource projector / codec / validate | +| `internal/retry` | Retry primitives | +| `internal/scheduler` | Mechanical scheduler (idle registry, wakeups, sweep, recovery) | +| `internal/session` | Session lifecycle, Manager, state machine | +| `internal/settings` | Settings overlay/projection | +| `internal/situation` | Situation surface providers (`/agent/context`) | +| `internal/skills` | Skills catalog, loader, `VerifyContent`, MCP/hook decl, provenance | +| `internal/skills/bundled` | Bundled 
skill definitions | +| `internal/sse` | Shared SSE helpers | +| `internal/store` | SQLite shared helpers, migrations registry, validation | +| `internal/store/globaldb` | Global catalog (`agh.db`): sessions, metadata | +| `internal/store/sessiondb` | Per-session event store (`events.db`) | +| `internal/subprocess` | Subprocess signaling primitives | +| `internal/task` | Task domain, `task_runs` ownership, `ClaimNextRun` | +| `internal/testutil` | Shared test helpers | +| `internal/api/contract` | Shared daemon/CLI/HTTP contract types | +| `internal/api/core` | Shared handler types (`BaseHandlers`), error mapping, SSE helpers | +| `internal/api/httpapi` | HTTP/SSE server (Gin) for web UI | +| `internal/api/udsapi` | UDS server for CLI IPC | +| `internal/api/testutil` | Test helpers for the API layer | +| `internal/toolruntime` | Tool process registry + interrupts | +| `internal/tools` | Tool definitions and dispatch | +| `internal/transcript` | Canonical replay message assembly from persisted events | +| `internal/version` | Build metadata | +| `internal/workref` | Work reference helpers | +| `internal/workspace` | Workspace resolver and entity management | + +## Memory & Skills Runtime (RFC-backed) + +- **Five-layer skill/memory/agent precedence**: Bundled → Marketplace → User → Additional → Workspace, with agent-local overriding all. Higher precedence wins on collision; an audit trail logs every shadow. +- **Memory taxonomy**: `user | feedback | project | reference` types; scopes `agent | workspace | global`. Default write scope declared per agent in `memory.scope`. +- **Memory consolidation gates**: Time → Sessions → Lock cascade ordered by computational cost. Default gates: 24h, 5 touched sessions, file-lock. Never replace gates with naive heuristics. 
+- **Lifecycle hooks** (`on_session_created`, `on_session_stopped`) execute in hierarchy precedence then alphabetical order; configurable timeout (default 5s); fail-open semantics (errors logged, never block); JSON over stdin. + +## Forensic Bug Fixes + +- **Bug-fix plans open with confirmed reproduction** (timestamp, command, observed evidence) BEFORE listing changes. "I think" or "probably" is forbidden at the top of a fix plan. +- **Inactive metadata repair must distinguish startup-pending from crashed.** Sessions in `m.pending` are still starting, not failed. +- **Stale ACP session ids must be classified, not propagated.** Convert `Resource not found` to fresh-start fallback. diff --git a/internal/CLAUDE.md b/internal/CLAUDE.md new file mode 100644 index 000000000..63bbef9d0 --- /dev/null +++ b/internal/CLAUDE.md @@ -0,0 +1,134 @@ +# Internal Backend (Go) + +The Go runtime — `internal/*` packages composed by `internal/daemon`, plus the API transports under `internal/api/*`. ACP subprocess management, SQLite persistence, HTTP/SSE + UDS APIs, autonomy kernel, AGH Network. Entry binary lives in `cmd/agh`. + +Repo-wide rules (Critical Rules, Workflow, Build, Commits, Skill Dispatch, Memory & Skills RFC, CI/Release) live in the **root `CLAUDE.md`**. This file owns architecture, package boundaries, autonomy contracts, security invariants, and `internal/`-specific debugging/forensics. + +## Architecture + +### Principles + +- **Designed for incremental extension** — new capabilities arrive as new packages wired into `daemon/`, without modifying existing packages. Small interfaces + dependency injection. Every capability plan decides which extension points, hooks, capabilities, tools/resources, bundles, registries, bridge SDKs, and docs must be added, updated, or removed. +- **Pragmatic Flat with Discipline** — packages under `internal/`, API transports grouped under `api/`, no domain/infra split, no event bus. 
+- **`daemon/` is the sole composition root** — the only package that imports all others. Reconciliation logic running at boot belongs to composition root and is not "legacy support". +- **No package imports `daemon/`, `api/`, or `cli/`** — dependencies flow downward only. +- **Interfaces defined where consumed** (Go-style) — `session/` defines `AgentDriver`, `acp/` implements it. +- **Direct function calls through interfaces** — no event bus, no reflection-based routing, no NATS as inter-package coordination. NATS is permitted **only** inside `internal/network` as the embedded wire transport for the AGH Network protocol; daemon packages communicate via interfaces and the Notifier pattern, never by publishing to subjects. +- **Notifier pattern for fan-out** — typed interface for observability and SSE, not a generic bus. +- **No back-pointers between packages** — inject callbacks or interfaces. +- **Functional options for constructors** — `NewManager(opts ...Option)`. +- **Maps for <10 items** — no registry interfaces for small collections. +- **File-level organization** within packages — sub-packages only when complexity justifies it. +- **CI-enforceable boundaries** — `mage Boundaries` rules prevent import cycles. Update `magefile.go` Boundaries() in the same commit that introduces a new `internal/api/*` subpackage. +- **`internal/api/core` is the canonical handler home.** REST/UDS endpoints exist as shared `BaseHandlers` methods; HTTP and UDS only choose registration and authentication. No transport-duplicated parsing/validation. +- **Authoritative primitives are exclusive.** When a primitive owns a state transition (`task.Service.ClaimNextRun`, `Spawn`, `EnsureMigration`), no peer package may replicate it. Wake/observe/sweep are allowed; claim/own is not. The mechanical scheduler does not call `ClaimNextRun`. +- **Hooks are typed dispatch, not an event bus.** Dispatch at the call site that owns the state transition. Never tail event/log tables to fire hooks. 
Hooks may deny/narrow/annotate but cannot bypass safety primitives (claim tokens, leases, TTL, lineage, spawn caps, permission narrowing). +- **Agent-manageable by default.** User-visible runtime capabilities must expose stable machine-readable control surfaces for agents: CLI verbs with `-o json`/`-o jsonl` where relevant, HTTP/UDS parity when state crosses the daemon boundary, discoverable status/config output, and docs that describe the agent path. UI-only manageability is incomplete. +- **No partial-surface completions.** Any change touching a public surface closes the loop end-to-end in one pass: contract → HTTP handler → UDS handler → CLI client → CLI command → extension/config/docs surfaces → tests → docs. + +### Concurrency + +Generic Go concurrency patterns (goroutine ownership, channels vs mutexes, `select`/`ctx.Done()` discipline, no `time.Sleep` in orchestration) live in `agh-code-guidelines`. Architectural invariants below are load-bearing for design decisions: + +- **Goroutines spawned by `internal/session/manager_*.go` MUST be tracked by Manager-owned WaitGroup and joined in Manager shutdown.** Never put goroutine-owned channels in a struct field that another goroutine mutates — use a per-run handle. +- **Detached execution lifetime.** Any work that outlives an HTTP/UDS request — prompts, network channel sends, automation jobs — MUST detach via `context.WithoutCancel(ctx)`. Never tie execution lifetime to request lifetime. Expose explicit cancel endpoints (e.g., `POST /api/sessions/:id/prompt/cancel`). +- **`context.WithoutCancel` does NOT preserve deadlines.** Re-attach a deadline if needed. +- **Subprocess managed-stop** must respect `ctx.Done()` between Shutdown and Wait. Wrap `proc.Wait()` in `select { case <-proc.Done(): case <-ctx.Done(): }`. +- **Process-group supervision parity.** Unix uses process groups; Windows uses forced-exit fallback. Always cross-build with `GOOS=windows GOARCH=amd64 go build` before claiming subprocess work complete. 
Centralize signaling helpers in `internal/procutil`. + +### Runtime + +- Single-binary and local-first. Sidecars or external control planes require a written techspec. +- Keep execution paths deterministic and observable. +- **Daemon runs in background by default.** No daemon should require a foreground terminal. +- **`compozy exec` is headless.** `--format text` returns a single string; `--format json` returns a stream of valid JSON objects; the TUI is opt-in via `--tui`. `exec` does not persist artifacts to `.compozy/runs/` unless `--persist` is given. +- **Agent operations must not depend on the web UI.** If agents need to inspect, configure, start, stop, approve, claim, release, or repair a capability, the spec must provide a CLI/HTTP/UDS path with structured output and deterministic errors. + +### Observability + +- Every domain operation emits a canonical event with correlation keys (`workspace_id`, `session_id`, `parent_session_id`, `root_session_id`, `agent_name`, `task_id`, `run_id`, `claim_token_hash`, `lease_until`, `workflow_id`, `coordinator_session_id`, `scheduler_reason`, `hook_event`, `hook_name`, `spawn_depth`, `actor_kind`, `actor_id`, `release_reason`). +- Cover with a coverage matrix test that fails if any required lifecycle path doesn't emit its canonical event. +- Append-only event store (`runtime.db`) is the canonical operational ledger; session DBs are projections, not authority. +- Live broadcasters publish only after durable append; reconnect/replay uses `after_seq`. + +## Security Invariants + +- **`claim_token` redaction is non-negotiable.** Raw `claim_token` (`agh_claim_*`), MCP auth tokens, OAuth codes, PKCE verifiers, and secret bindings MUST NEVER appear in logs, status APIs, settings views, error payloads, channel messages, SSE, web UI, or memory. Use hash forms (`claim_token_hash`) over the wire. Network layer rejects raw `claim_token` in metadata. 
+- **Symlink escape hardening.** Skill sidecars, skill files, managed-extension dependency copies, and bundle install paths MUST verify resolved targets remain inside approved roots. Use `EvalSymlinks` + path-prefix check, not naive joins. Handle macOS `/private/var/folders` quirk (canonicalize source root before containment check). +- **Path security helpers.** Filesystem helpers resolving user-controlled or agent-controlled paths use the `sanitizePathKey` + `realpathDeepestExisting` pattern (defenses against null-byte, URL-encoded traversal, Unicode normalization, symlink-escape). +- **Identity proof-stripping defense.** In any signed-message processing path (AGH Network v1), an identity in verified format (`nickname@fingerprint`) without valid `proof` MUST classify as `rejected`, not `unverified`. +- **External-call timeouts.** Outbound HTTP/network calls MUST use a client with an explicit timeout. `http.DefaultClient` is forbidden in production code paths. +- **Load-time security scan.** Every non-bundled skill is scanned via `internal/skills.VerifyContent` on every load (not just install). Critical findings block; warning findings log; info findings log silently. Bundled skills are exempt because `go:embed` provides immutability. + +## Package Layout + +| Path | Responsibility | +| ------------------------------- | ----------------------------------------------------------------------------- | +| `cmd/agh` | Main entry point, CLI binary | +| `internal/config` | TOML loading, validation, merge, home paths, agent def parsing | +| `internal/acp` | ACP client: subprocess spawn, JSON-RPC over stdio | +| `internal/agentidentity` | Caller-identity inference from `AGH_SESSION_ID`/`AGH_AGENT` | +| `internal/automation` | Cron, webhook, and scheduled triggers; durable scheduler state | +| `internal/bridges` | External messaging adapters (Slack, Telegram, etc.) 
| +| `internal/bridgesdk` | Bridge SDK / contract types | +| `internal/bundles` | Bundle activation projector | +| `internal/cli` | Cobra commands | +| `internal/codegen` | OpenAPI → TS generator helpers | +| `internal/coordinator` | Coordinator-agent bootstrap and lifecycle | +| `internal/daemon` | Composition root, lock, boot, shutdown | +| `internal/diagnostics` | Diagnostics + health probes | +| `internal/e2elane` | E2E lane harness wiring | +| `internal/environment` | Env-profile resolution | +| `internal/extension` | Extension manifest, registry, host API, install runtime | +| `internal/extensiontest` | Extension test harness | +| `internal/filesnap` | File snapshot utilities | +| `internal/fileutil` | Shared filesystem helpers | +| `internal/frontmatter` | YAML frontmatter parsing | +| `internal/hooks` | Typed hook taxonomy + dispatch | +| `internal/logger` | Structured logging (slog) | +| `internal/mcp` | MCP server lifecycle / sidecars | +| `internal/memory` | Persistent dual-scope memory (global + workspace + agent), provenance, recall | +| `internal/memory/consolidation` | Dream consolidation runtime (Time → Sessions → Lock gate cascade) | +| `internal/network` | AGH Network channels/peers/wire, NATS profile | +| `internal/observe` | Event recording, health metrics, query engine | +| `internal/procutil` | Process utilities, process-group signaling, Windows fallback | +| `internal/registry` | Skill/agent/capability registry helpers | +| `internal/resources` | Resource projector / codec / validate | +| `internal/retry` | Retry primitives | +| `internal/scheduler` | Mechanical scheduler (idle registry, wakeups, sweep, recovery) | +| `internal/session` | Session lifecycle, Manager, state machine | +| `internal/settings` | Settings overlay/projection | +| `internal/situation` | Situation surface providers (`/agent/context`) | +| `internal/skills` | Skills catalog, loader, `VerifyContent`, MCP/hook decl, provenance | +| `internal/skills/bundled` | Bundled 
skill definitions | +| `internal/sse` | Shared SSE helpers | +| `internal/store` | SQLite shared helpers, migrations registry, validation | +| `internal/store/globaldb` | Global catalog (`agh.db`): sessions, metadata | +| `internal/store/sessiondb` | Per-session event store (`events.db`) | +| `internal/subprocess` | Subprocess signaling primitives | +| `internal/task` | Task domain, `task_runs` ownership, `ClaimNextRun` | +| `internal/testutil` | Shared test helpers | +| `internal/api/contract` | Shared daemon/CLI/HTTP contract types | +| `internal/api/core` | Shared handler types (`BaseHandlers`), error mapping, SSE helpers | +| `internal/api/httpapi` | HTTP/SSE server (Gin) for web UI | +| `internal/api/udsapi` | UDS server for CLI IPC | +| `internal/api/testutil` | Test helpers for the API layer | +| `internal/toolruntime` | Tool process registry + interrupts | +| `internal/tools` | Tool definitions and dispatch | +| `internal/transcript` | Canonical replay message assembly from persisted events | +| `internal/version` | Build metadata | +| `internal/workref` | Work reference helpers | +| `internal/workspace` | Workspace resolver and entity management | + +## Memory & Skills Runtime (RFC-backed) + +- **Five-layer skill/memory/agent precedence**: Bundled → Marketplace → User → Additional → Workspace, with agent-local overriding all. Higher precedence wins on collision; an audit trail logs every shadow. +- **Memory taxonomy**: `user | feedback | project | reference` types; scopes `agent | workspace | global`. Default write scope declared per agent in `memory.scope`. +- **Memory consolidation gates**: Time → Sessions → Lock cascade ordered by computational cost. Default gates: 24h, 5 touched sessions, file-lock. Never replace gates with naive heuristics. 
+- **Lifecycle hooks** (`on_session_created`, `on_session_stopped`) execute in hierarchy precedence then alphabetical order; configurable timeout (default 5s); fail-open semantics (errors logged, never block); JSON over stdin. + +## Forensic Bug Fixes + +- **Bug-fix plans open with confirmed reproduction** (timestamp, command, observed evidence) BEFORE listing changes. "I think" or "probably" is forbidden at the top of a fix plan. +- **Inactive metadata repair must distinguish startup-pending from crashed.** Sessions in `m.pending` are still starting, not failed. +- **Stale ACP session ids must be classified, not propagated.** Convert `Resource not found` to fresh-start fallback. diff --git a/internal/agentidentity/identity.go b/internal/agentidentity/identity.go index ee3a33dab..1fb373427 100644 --- a/internal/agentidentity/identity.go +++ b/internal/agentidentity/identity.go @@ -197,6 +197,16 @@ func validateResolveInputs(ctx context.Context, lookup SessionLookup, creds Cred func lookupSessionSnapshot(ctx context.Context, lookup SessionLookup, creds Credentials) (SessionSnapshot, error) { snapshot, err := lookup(ctx, creds.SessionID) if err != nil { + if errors.Is(err, ErrIdentityLookupUnavailable) || + errors.Is(err, context.Canceled) || + errors.Is(err, context.DeadlineExceeded) { + return SessionSnapshot{}, identityError( + ErrIdentityLookupUnavailable, + "identity_lookup_unavailable", + "agent identity cannot be validated", + "retry after the daemon is reachable", + ) + } return SessionSnapshot{}, identityError( ErrIdentityStale, "identity_stale", @@ -274,6 +284,7 @@ func SessionSnapshotFromInfo(info *session.Info) SessionSnapshot { Name: info.Name, AgentName: info.AgentName, Provider: info.Provider, + Model: info.Model, WorkspaceID: info.WorkspaceID, WorkspacePath: info.Workspace, Channel: info.Channel, diff --git a/internal/api/core/agent_channels.go b/internal/api/core/agent_channels.go index 99dbbe2c3..13150ce69 100644 --- 
a/internal/api/core/agent_channels.go +++ b/internal/api/core/agent_channels.go @@ -87,19 +87,33 @@ func (h *BaseHandlers) AgentChannelRecv(c *gin.Context) { h.respondError(c, http.StatusBadRequest, NewNetworkValidationError(errors.New("channel is required"))) return } + if err := network.ValidateChannel(channel); err != nil { + h.respondError(c, http.StatusBadRequest, NewNetworkValidationError(err)) + return + } + wait, err := parseBoolQuery(c, "wait") + if err != nil { + h.respondError(c, http.StatusBadRequest, NewNetworkValidationError(err)) + return + } + limit, err := parsePositiveIntQuery(c, "limit") + if err != nil { + h.respondError(c, http.StatusBadRequest, NewNetworkValidationError(err)) + return + } envelopes, err := agentChannelInbox( c.Request.Context(), service, caller.Session.ID, channel, - parseBoolQuery(c, "wait"), + wait, ) if err != nil { h.respondError(c, StatusForNetworkError(err), err) return } - messages := agentChannelMessagesFromEnvelopes(envelopes, channel, parsePositiveIntQuery(c, "limit")) + messages := agentChannelMessagesFromEnvelopes(envelopes, channel, limit) c.JSON(http.StatusOK, contract.AgentChannelMessagesResponse{Messages: messages}) } @@ -320,6 +334,9 @@ func (h *BaseHandlers) enrichAgentMePayload( if h == nil { return } + if coordinatorPayload, err := h.agentCoordinatorConfigPayload(ctx, caller.Session.WorkspaceID); err == nil { + payload.Coordinator = coordinatorPayload + } service, err := h.networkServiceRequired() if err != nil { if callerChannel := strings.TrimSpace( @@ -774,6 +791,7 @@ func sessionInfoFromAgentCaller(caller agentidentity.Caller) *session.Info { Name: strings.TrimSpace(caller.Session.Name), AgentName: strings.TrimSpace(caller.Session.AgentName), Provider: strings.TrimSpace(caller.Session.Provider), + Model: strings.TrimSpace(caller.Session.Model), WorkspaceID: strings.TrimSpace(caller.Session.WorkspaceID), Workspace: strings.TrimSpace(caller.Session.WorkspacePath), Channel: 
strings.TrimSpace(caller.Session.Channel), @@ -793,27 +811,37 @@ func sortCoordinationChannels(channels []contract.CoordinationChannelPayload) { }) } -func parseBoolQuery(c *gin.Context, key string) bool { +func parseBoolQuery(c *gin.Context, key string) (bool, error) { if c == nil { - return false + return false, nil } raw := strings.TrimSpace(c.Query(key)) if raw == "" { - return false + return false, nil } parsed, err := strconv.ParseBool(raw) - return err == nil && parsed + if err != nil { + return false, fmt.Errorf("query parameter %q must be a boolean: %w", key, err) + } + return parsed, nil } -func parsePositiveIntQuery(c *gin.Context, key string) int { +func parsePositiveIntQuery(c *gin.Context, key string) (int, error) { if c == nil { - return 0 + return 0, nil + } + raw := strings.TrimSpace(c.Query(key)) + if raw == "" { + return 0, nil + } + parsed, err := strconv.Atoi(raw) + if err != nil { + return 0, fmt.Errorf("query parameter %q must be a positive integer: %w", key, err) } - parsed, err := strconv.Atoi(strings.TrimSpace(c.Query(key))) - if err != nil || parsed <= 0 { - return 0 + if parsed <= 0 { + return 0, fmt.Errorf("query parameter %q must be a positive integer: %d", key, parsed) } - return parsed + return parsed, nil } func (h *BaseHandlers) nowUTC() time.Time { diff --git a/internal/api/core/agent_channels_internal_test.go b/internal/api/core/agent_channels_internal_test.go index 813936a6d..2d42bb2af 100644 --- a/internal/api/core/agent_channels_internal_test.go +++ b/internal/api/core/agent_channels_internal_test.go @@ -215,11 +215,43 @@ func TestAgentMeCoreHandlerEnrichesContextAndChannels(t *testing.T) { len(response.Me.Capabilities) != 1 || len(response.Me.Channels) != 1 || len(response.Me.ActiveTaskLeases) != 1 || + !response.Me.Coordinator.Enabled || + response.Me.Coordinator.AgentName != "coordinator" || + response.Me.Coordinator.WorkspaceID != "ws-1" || response.Me.Limits.MaxChildren != 2 { t.Fatalf("agent me = %#v, want context and 
channel enrichment", response.Me) } } +func TestAgentCoordinatorConfigCoreHandlerReturnsResolvedWorkspaceConfig(t *testing.T) { + t.Parallel() + + engine := newAgentCoreTestRouter(t, &agentCoreNetworkService{}) + recorder := performAgentCoreRequest( + t, + engine, + http.MethodGet, + "/agent/coordinator/config", + nil, + agentCoreHeaders(), + ) + if recorder.Code != http.StatusOK { + t.Fatalf("status = %d, want %d; body=%s", recorder.Code, http.StatusOK, recorder.Body.String()) + } + + var response contract.AgentCoordinatorConfigResponse + decodeAgentCoreResponse(t, recorder, &response) + if !response.Coordinator.Enabled || + response.Coordinator.AgentName != "coordinator" || + response.Coordinator.DefaultTTLSeconds != 2700 || + response.Coordinator.MaxChildren != 5 || + response.Coordinator.MaxActivePerWorkspace != 1 || + response.Coordinator.Source != contract.CoordinatorConfigSourceWorkspace || + response.Coordinator.WorkspaceID != "ws-1" { + t.Fatalf("coordinator config = %#v, want resolved workspace config", response.Coordinator) + } +} + func TestAgentChannelCoreHandlersRejectInvalidIdentityAndClaimToken(t *testing.T) { t.Parallel() @@ -324,6 +356,23 @@ func (f agentCoreContextService) ContextForSession( return f(ctx, info) } +type agentCoreCoordinatorConfigResolver struct{} + +func (agentCoreCoordinatorConfigResolver) ResolveCoordinatorConfig( + _ context.Context, + _ string, +) (aghconfig.CoordinatorConfig, error) { + return aghconfig.CoordinatorConfig{ + Enabled: true, + AgentName: "coordinator", + Provider: "codex", + Model: "gpt-4o", + DefaultTTL: 45 * time.Minute, + MaxChildren: 5, + MaxActivePerWorkspace: 1, + }, nil +} + func newAgentCoreTestRouter(t *testing.T, networkService *agentCoreNetworkService) *gin.Engine { t.Helper() @@ -338,6 +387,7 @@ func newAgentCoreTestRouter(t *testing.T, networkService *agentCoreNetworkServic Sessions: agentCoreSessionManager(t), Network: networkService, AgentContextService: 
agentCoreContextService(agentCoreContextPayload), + CoordinatorConfig: agentCoreCoordinatorConfigResolver{}, Config: cfg, Logger: slog.New(slog.NewTextHandler(io.Discard, nil)), StreamDone: make(chan struct{}), @@ -351,6 +401,7 @@ func newAgentCoreTestRouter(t *testing.T, networkService *agentCoreNetworkServic engine := gin.New() engine.GET("/agent/me", handlers.AgentMe) engine.GET("/agent/context", handlers.AgentContext) + engine.GET("/agent/coordinator/config", handlers.AgentCoordinatorConfig) engine.GET("/agent/channels", handlers.AgentChannels) engine.GET("/agent/channels/:channel/recv", handlers.AgentChannelRecv) engine.POST("/agent/channels/:channel/send", handlers.AgentChannelSend) diff --git a/internal/api/core/agent_identity.go b/internal/api/core/agent_identity.go index e94f37808..f4addc600 100644 --- a/internal/api/core/agent_identity.go +++ b/internal/api/core/agent_identity.go @@ -3,6 +3,7 @@ package core import ( "context" "errors" + "fmt" "net/http" "strings" @@ -11,9 +12,15 @@ import ( "github.com/pedronauck/agh/internal/api/contract" ) -const agentActionMe = "agent.me" +const ( + agentActionMe = "agent.me" + agentActionCoordinatorConfig = "agent.coordinator.config" +) -var errAgentIdentityUnavailable = errors.New("api: session service is not configured") +var ( + errAgentIdentityUnavailable = errors.New("api: session service is not configured") + errCoordinatorConfigMissing = errors.New("api: coordinator config service is not configured") +) // StatusForAgentIdentityError maps agent identity failures to transport statuses. func StatusForAgentIdentityError(err error) int { @@ -45,6 +52,20 @@ func (h *BaseHandlers) AgentMe(c *gin.Context) { c.JSON(http.StatusOK, contract.AgentMeResponse{Me: contract.NormalizeAgentMePayload(payload)}) } +// AgentCoordinatorConfig returns the resolved coordinator policy for the caller workspace. 
+func (h *BaseHandlers) AgentCoordinatorConfig(c *gin.Context) { + caller, ok := h.requireAgentCaller(c, agentActionCoordinatorConfig) + if !ok { + return + } + payload, err := h.agentCoordinatorConfigPayload(c.Request.Context(), caller.Session.WorkspaceID) + if err != nil { + h.respondError(c, statusForCoordinatorConfigError(err), err) + return + } + c.JSON(http.StatusOK, contract.AgentCoordinatorConfigResponse{Coordinator: payload}) +} + func (h *BaseHandlers) requireAgentCaller( c *gin.Context, action string, @@ -124,3 +145,29 @@ func agentMePayloadFromCaller(caller agentidentity.Caller) contract.AgentMePaylo } return contract.NormalizeAgentMePayload(payload) } + +func (h *BaseHandlers) agentCoordinatorConfigPayload( + ctx context.Context, + workspaceID string, +) (contract.CoordinatorConfigPayload, error) { + if h == nil || h.CoordinatorConfig == nil { + return contract.CoordinatorConfigPayload{}, errCoordinatorConfigMissing + } + trimmedWorkspaceID := strings.TrimSpace(workspaceID) + cfg, err := h.CoordinatorConfig.ResolveCoordinatorConfig(ctx, trimmedWorkspaceID) + if err != nil { + return contract.CoordinatorConfigPayload{}, fmt.Errorf("resolve coordinator config: %w", err) + } + source := contract.CoordinatorConfigSourceGlobal + if trimmedWorkspaceID != "" { + source = contract.CoordinatorConfigSourceWorkspace + } + return CoordinatorConfigPayloadFromConfig(cfg, source, trimmedWorkspaceID), nil +} + +func statusForCoordinatorConfigError(err error) int { + if errors.Is(err, errCoordinatorConfigMissing) { + return http.StatusServiceUnavailable + } + return http.StatusInternalServerError +} diff --git a/internal/api/core/agent_tasks.go b/internal/api/core/agent_tasks.go index 32c50a8ca..d20ea6c7a 100644 --- a/internal/api/core/agent_tasks.go +++ b/internal/api/core/agent_tasks.go @@ -280,10 +280,11 @@ func (h *BaseHandlers) agentTaskClaimCriteria( Kind: taskpkg.ActorKindAgentSession, Ref: strings.TrimSpace(caller.Session.ID), }, - AgentName: 
strings.TrimSpace(caller.Session.AgentName), - RequiredCapabilities: capabilities, - PriorityMin: req.PriorityMin, - LeaseDuration: leaseDuration, + AgentName: strings.TrimSpace(caller.Session.AgentName), + RequiredCapabilities: capabilities, + PriorityMin: req.PriorityMin, + CoordinationChannelID: strings.TrimSpace(caller.Session.Channel), + LeaseDuration: leaseDuration, }, nil } diff --git a/internal/api/core/conversions_parsers_test.go b/internal/api/core/conversions_parsers_test.go index 30ccac086..1b9fc7ceb 100644 --- a/internal/api/core/conversions_parsers_test.go +++ b/internal/api/core/conversions_parsers_test.go @@ -319,6 +319,9 @@ func TestJobPayloadFromJobCopiesNestedOptionalFields(t *testing.T) { if payload.Schedule == nil || payload.Schedule.Interval != "10m" { t.Fatalf("schedule payload = %#v", payload.Schedule) } + if payload.Schedule == &schedule { + t.Fatal("JobPayloadFromJob reused schedule input pointer") + } if payload.Task == nil || payload.Task.Owner == nil || payload.Task.Owner.Ref != "triage" { t.Fatalf("task payload = %#v", payload.Task) } diff --git a/internal/api/core/handlers.go b/internal/api/core/handlers.go index 5dc12b331..d459c4234 100644 --- a/internal/api/core/handlers.go +++ b/internal/api/core/handlers.go @@ -46,6 +46,7 @@ type BaseHandlerConfig struct { Workspaces WorkspaceService AgentCatalog AgentCatalog AgentContextService AgentContextService + CoordinatorConfig CoordinatorConfigResolver SkillsRegistry SkillsRegistry TaskActorContextResolver TaskActorContextResolver MemoryStore *memory.Store @@ -81,6 +82,7 @@ type BaseHandlers struct { Workspaces WorkspaceService AgentCatalog AgentCatalog AgentContextService AgentContextService + CoordinatorConfig CoordinatorConfigResolver SkillsRegistry SkillsRegistry TaskActorContextResolver TaskActorContextResolver MemoryStore *memory.Store @@ -156,6 +158,7 @@ func NewBaseHandlers(cfg *BaseHandlerConfig) *BaseHandlers { Workspaces: cfg.Workspaces, AgentCatalog: cfg.AgentCatalog, 
AgentContextService: cfg.AgentContextService, + CoordinatorConfig: cfg.CoordinatorConfig, SkillsRegistry: cfg.SkillsRegistry, TaskActorContextResolver: cfg.TaskActorContextResolver, MemoryStore: cfg.MemoryStore, diff --git a/internal/api/core/interfaces.go b/internal/api/core/interfaces.go index e6921d56b..60f8b9d40 100644 --- a/internal/api/core/interfaces.go +++ b/internal/api/core/interfaces.go @@ -111,6 +111,11 @@ type AgentContextService interface { ContextForSession(ctx context.Context, info *session.Info) (contract.AgentContextPayload, error) } +// CoordinatorConfigResolver resolves safe coordinator policy for agent-facing reads. +type CoordinatorConfigResolver interface { + ResolveCoordinatorConfig(ctx context.Context, workspaceID string) (aghconfig.CoordinatorConfig, error) +} + // NetworkStore exposes persisted network audit, channel metadata CRUD, and timeline queries to the API layer. type NetworkStore interface { ListNetworkAudit(ctx context.Context, query store.NetworkAuditQuery) ([]store.NetworkAuditEntry, error) diff --git a/internal/api/core/tasks_surface_integration_test.go b/internal/api/core/tasks_surface_integration_test.go index 81c2d19ae..d3eddafec 100644 --- a/internal/api/core/tasks_surface_integration_test.go +++ b/internal/api/core/tasks_surface_integration_test.go @@ -285,7 +285,17 @@ func TestExpandedTaskMutationHandlersDelegateIntegration(t *testing.T) { appendCall("publish", actor) executionRequests["publish"] = req taskRecord := taskpkg.Task{ID: id, Scope: taskpkg.ScopeWorkspace, WorkspaceID: "ws-alpha", Title: "Publish", Status: taskpkg.TaskStatusReady, CreatedBy: actor.Actor, Origin: actor.Origin, CreatedAt: now, UpdatedAt: now} - return &taskpkg.Execution{Task: taskRecord, Run: taskpkg.Run{ID: "run-publish", TaskID: id, Status: taskpkg.TaskRunStatusQueued, Attempt: 1, Origin: actor.Origin, QueuedAt: now}}, nil + return &taskpkg.Execution{Task: taskRecord, Run: taskpkg.Run{ + ID: "run-publish", + TaskID: id, + Status: 
taskpkg.TaskRunStatusQueued, + Attempt: 1, + Origin: actor.Origin, + IdempotencyKey: req.IdempotencyKey, + NetworkChannel: req.NetworkChannel, + Metadata: req.Metadata, + QueuedAt: now, + }}, nil }, StartTaskFn: func( _ context.Context, @@ -296,7 +306,17 @@ func TestExpandedTaskMutationHandlersDelegateIntegration(t *testing.T) { appendCall("start", actor) executionRequests["start"] = req taskRecord := taskpkg.Task{ID: id, Scope: taskpkg.ScopeWorkspace, WorkspaceID: "ws-alpha", Title: "Start", Status: taskpkg.TaskStatusReady, CreatedBy: actor.Actor, Origin: actor.Origin, CreatedAt: now, UpdatedAt: now} - return &taskpkg.Execution{Task: taskRecord, Run: taskpkg.Run{ID: "run-start", TaskID: id, Status: taskpkg.TaskRunStatusQueued, Attempt: 1, Origin: actor.Origin, QueuedAt: now}}, nil + return &taskpkg.Execution{Task: taskRecord, Run: taskpkg.Run{ + ID: "run-start", + TaskID: id, + Status: taskpkg.TaskRunStatusQueued, + Attempt: 1, + Origin: actor.Origin, + IdempotencyKey: req.IdempotencyKey, + NetworkChannel: req.NetworkChannel, + Metadata: req.Metadata, + QueuedAt: now, + }}, nil }, ApproveTaskFn: func( _ context.Context, @@ -307,7 +327,17 @@ func TestExpandedTaskMutationHandlersDelegateIntegration(t *testing.T) { appendCall("approve", actor) executionRequests["approve"] = req taskRecord := taskpkg.Task{ID: id, Scope: taskpkg.ScopeWorkspace, WorkspaceID: "ws-alpha", Title: "Approve", Status: taskpkg.TaskStatusReady, ApprovalPolicy: taskpkg.ApprovalPolicyManual, ApprovalState: taskpkg.ApprovalStateApproved, CreatedBy: actor.Actor, Origin: actor.Origin, CreatedAt: now, UpdatedAt: now} - return &taskpkg.Execution{Task: taskRecord, Run: taskpkg.Run{ID: "run-approve", TaskID: id, Status: taskpkg.TaskRunStatusQueued, Attempt: 1, Origin: actor.Origin, QueuedAt: now}}, nil + return &taskpkg.Execution{Task: taskRecord, Run: taskpkg.Run{ + ID: "run-approve", + TaskID: id, + Status: taskpkg.TaskRunStatusQueued, + Attempt: 1, + Origin: actor.Origin, + IdempotencyKey: 
req.IdempotencyKey, + NetworkChannel: req.NetworkChannel, + Metadata: req.Metadata, + QueuedAt: now, + }}, nil }, RejectTaskFn: func(_ context.Context, id string, actor taskpkg.ActorContext) (*taskpkg.Task, error) { appendCall("reject", actor) @@ -382,6 +412,41 @@ func TestExpandedTaskMutationHandlersDelegateIntegration(t *testing.T) { if resp.Code != tc.want { t.Fatalf("%s status = %d, want %d; body=%s", tc.path, resp.Code, tc.want, resp.Body.String()) } + switch tc.call { + case "publish", "start", "approve": + var response contract.TaskExecutionResponse + testutil.DecodeJSONResponse(t, resp, &response) + if response.Task.ID != "task-1" || + response.Run.ID != "run-"+tc.call || + response.Run.TaskID != "task-1" || + response.Run.Status != taskpkg.TaskRunStatusQueued || + response.Run.IdempotencyKey != tc.wantKey || + response.Run.NetworkChannel != tc.wantChannel || + string(response.Run.Metadata) != tc.wantMetadata { + t.Fatalf("%s response = %#v, want task/run execution payload", tc.path, response) + } + case "reject": + var response contract.TaskResponse + testutil.DecodeJSONResponse(t, resp, &response) + if response.Task.ID != "task-1" || + response.Task.ApprovalState != taskpkg.ApprovalStateRejected { + t.Fatalf("%s response = %#v, want rejected task payload", tc.path, response) + } + case "read", "archive", "dismiss": + var response contract.TaskTriageStateResponse + testutil.DecodeJSONResponse(t, resp, &response) + if response.Triage.TaskID != "task-1" || + response.Triage.Actor.Ref != "user-1" { + t.Fatalf("%s response = %#v, want actor triage payload", tc.path, response) + } + if (tc.call == "read" && !response.Triage.Read) || + (tc.call == "archive" && !response.Triage.Archived) || + (tc.call == "dismiss" && !response.Triage.Dismissed) { + t.Fatalf("%s triage response = %#v, want %s state", tc.path, response.Triage, tc.call) + } + default: + t.Fatalf("unhandled mutation case %q", tc.call) + } if tc.wantKey == "" { return } diff --git 
a/internal/api/spec/spec.go b/internal/api/spec/spec.go index ca850f553..bfe97ae6f 100644 --- a/internal/api/spec/spec.go +++ b/internal/api/spec/spec.go @@ -1219,7 +1219,13 @@ var operationRegistry = []OperationSpec{ Responses: []ResponseSpec{ {Status: 200, Description: "OK", Body: contract.AgentMeResponse{}}, {Status: 401, Description: "Agent caller identity is missing", Body: contract.ErrorPayload{}}, + {Status: 403, Description: "Forbidden - workspace or permission mismatch", Body: contract.ErrorPayload{}}, {Status: 404, Description: "Caller session not found", Body: contract.ErrorPayload{}}, + { + Status: 503, + Description: "Service unavailable - dependent service missing", + Body: contract.ErrorPayload{}, + }, {Status: 500, Description: "Internal server error", Body: contract.ErrorPayload{}}, }, }, @@ -1233,7 +1239,13 @@ var operationRegistry = []OperationSpec{ Responses: []ResponseSpec{ {Status: 200, Description: "OK", Body: contract.AgentContextResponse{}}, {Status: 401, Description: "Agent caller identity is missing", Body: contract.ErrorPayload{}}, + {Status: 403, Description: "Forbidden - workspace or permission mismatch", Body: contract.ErrorPayload{}}, {Status: 404, Description: "Caller session not found", Body: contract.ErrorPayload{}}, + { + Status: 503, + Description: "Service unavailable - dependent service missing", + Body: contract.ErrorPayload{}, + }, {Status: 500, Description: "Internal server error", Body: contract.ErrorPayload{}}, }, }, @@ -1247,6 +1259,12 @@ var operationRegistry = []OperationSpec{ Responses: []ResponseSpec{ {Status: 200, Description: "OK", Body: contract.AgentChannelsResponse{}}, {Status: 401, Description: "Agent caller identity is missing", Body: contract.ErrorPayload{}}, + {Status: 403, Description: "Forbidden - workspace or permission mismatch", Body: contract.ErrorPayload{}}, + { + Status: 503, + Description: "Service unavailable - dependent service missing", + Body: contract.ErrorPayload{}, + }, {Status: 500, 
Description: "Internal server error", Body: contract.ErrorPayload{}}, }, }, @@ -1264,9 +1282,16 @@ var operationRegistry = []OperationSpec{ }, Responses: []ResponseSpec{ {Status: 200, Description: "OK", Body: contract.AgentChannelMessagesResponse{}}, + {Status: 400, Description: "Invalid channel receive query", Body: contract.ErrorPayload{}}, {Status: 401, Description: "Agent caller identity is missing", Body: contract.ErrorPayload{}}, + {Status: 403, Description: "Forbidden - workspace or permission mismatch", Body: contract.ErrorPayload{}}, {Status: 404, Description: "Coordination channel not found", Body: contract.ErrorPayload{}}, {Status: 422, Description: "Invalid channel receive request", Body: contract.ErrorPayload{}}, + { + Status: 503, + Description: "Service unavailable - dependent service missing", + Body: contract.ErrorPayload{}, + }, {Status: 500, Description: "Internal server error", Body: contract.ErrorPayload{}}, }, }, @@ -1284,8 +1309,14 @@ var operationRegistry = []OperationSpec{ Responses: []ResponseSpec{ {Status: 202, Description: "Accepted", Body: contract.AgentChannelMessageResponse{}}, {Status: 401, Description: "Agent caller identity is missing", Body: contract.ErrorPayload{}}, + {Status: 403, Description: "Forbidden - workspace or permission mismatch", Body: contract.ErrorPayload{}}, {Status: 404, Description: "Coordination channel not found", Body: contract.ErrorPayload{}}, {Status: 422, Description: "Invalid channel send request", Body: contract.ErrorPayload{}}, + { + Status: 503, + Description: "Service unavailable - dependent service missing", + Body: contract.ErrorPayload{}, + }, {Status: 500, Description: "Internal server error", Body: contract.ErrorPayload{}}, }, }, @@ -1300,8 +1331,14 @@ var operationRegistry = []OperationSpec{ Responses: []ResponseSpec{ {Status: 202, Description: "Accepted", Body: contract.AgentChannelMessageResponse{}}, {Status: 401, Description: "Agent caller identity is missing", Body: contract.ErrorPayload{}}, 
+ {Status: 403, Description: "Forbidden - workspace or permission mismatch", Body: contract.ErrorPayload{}}, {Status: 404, Description: "Coordination message not found", Body: contract.ErrorPayload{}}, {Status: 422, Description: "Invalid channel reply request", Body: contract.ErrorPayload{}}, + { + Status: 503, + Description: "Service unavailable - dependent service missing", + Body: contract.ErrorPayload{}, + }, {Status: 500, Description: "Internal server error", Body: contract.ErrorPayload{}}, }, }, @@ -1317,8 +1354,14 @@ var operationRegistry = []OperationSpec{ {Status: 200, Description: "OK", Body: contract.AgentTaskClaimResponse{}}, {Status: 204, Description: "No matching task run is currently claimable"}, {Status: 401, Description: "Agent caller identity is missing", Body: contract.ErrorPayload{}}, + {Status: 403, Description: "Forbidden - workspace or permission mismatch", Body: contract.ErrorPayload{}}, {Status: 409, Description: "Task-run claim conflict", Body: contract.ErrorPayload{}}, {Status: 422, Description: "Invalid claim criteria", Body: contract.ErrorPayload{}}, + { + Status: 503, + Description: "Service unavailable - dependent service missing", + Body: contract.ErrorPayload{}, + }, {Status: 500, Description: "Internal server error", Body: contract.ErrorPayload{}}, }, }, @@ -1336,9 +1379,15 @@ var operationRegistry = []OperationSpec{ Responses: []ResponseSpec{ {Status: 200, Description: "OK", Body: contract.AgentTaskLeaseResponse{}}, {Status: 401, Description: "Agent caller identity is missing", Body: contract.ErrorPayload{}}, + {Status: 403, Description: "Forbidden - workspace or permission mismatch", Body: contract.ErrorPayload{}}, {Status: 404, Description: "Task run not found", Body: contract.ErrorPayload{}}, {Status: 409, Description: "Task-run lease conflict", Body: contract.ErrorPayload{}}, {Status: 422, Description: "Invalid heartbeat request", Body: contract.ErrorPayload{}}, + { + Status: 503, + Description: "Service unavailable - 
dependent service missing", + Body: contract.ErrorPayload{}, + }, {Status: 500, Description: "Internal server error", Body: contract.ErrorPayload{}}, }, }, @@ -1356,9 +1405,15 @@ var operationRegistry = []OperationSpec{ Responses: []ResponseSpec{ {Status: 200, Description: "OK", Body: contract.AgentTaskLeaseResponse{}}, {Status: 401, Description: "Agent caller identity is missing", Body: contract.ErrorPayload{}}, + {Status: 403, Description: "Forbidden - workspace or permission mismatch", Body: contract.ErrorPayload{}}, {Status: 404, Description: "Task run not found", Body: contract.ErrorPayload{}}, {Status: 409, Description: "Task-run completion conflict", Body: contract.ErrorPayload{}}, {Status: 422, Description: "Invalid completion request", Body: contract.ErrorPayload{}}, + { + Status: 503, + Description: "Service unavailable - dependent service missing", + Body: contract.ErrorPayload{}, + }, {Status: 500, Description: "Internal server error", Body: contract.ErrorPayload{}}, }, }, @@ -1376,9 +1431,15 @@ var operationRegistry = []OperationSpec{ Responses: []ResponseSpec{ {Status: 200, Description: "OK", Body: contract.AgentTaskLeaseResponse{}}, {Status: 401, Description: "Agent caller identity is missing", Body: contract.ErrorPayload{}}, + {Status: 403, Description: "Forbidden - workspace or permission mismatch", Body: contract.ErrorPayload{}}, {Status: 404, Description: "Task run not found", Body: contract.ErrorPayload{}}, {Status: 409, Description: "Task-run failure conflict", Body: contract.ErrorPayload{}}, {Status: 422, Description: "Invalid failure request", Body: contract.ErrorPayload{}}, + { + Status: 503, + Description: "Service unavailable - dependent service missing", + Body: contract.ErrorPayload{}, + }, {Status: 500, Description: "Internal server error", Body: contract.ErrorPayload{}}, }, }, @@ -1396,9 +1457,15 @@ var operationRegistry = []OperationSpec{ Responses: []ResponseSpec{ {Status: 200, Description: "OK", Body: 
contract.AgentTaskLeaseResponse{}}, {Status: 401, Description: "Agent caller identity is missing", Body: contract.ErrorPayload{}}, + {Status: 403, Description: "Forbidden - workspace or permission mismatch", Body: contract.ErrorPayload{}}, {Status: 404, Description: "Task run not found", Body: contract.ErrorPayload{}}, {Status: 409, Description: "Task-run release conflict", Body: contract.ErrorPayload{}}, {Status: 422, Description: "Invalid release request", Body: contract.ErrorPayload{}}, + { + Status: 503, + Description: "Service unavailable - dependent service missing", + Body: contract.ErrorPayload{}, + }, {Status: 500, Description: "Internal server error", Body: contract.ErrorPayload{}}, }, }, @@ -1416,6 +1483,11 @@ var operationRegistry = []OperationSpec{ {Status: 403, Description: "Spawn permission denied", Body: contract.ErrorPayload{}}, {Status: 409, Description: "Spawn limit conflict", Body: contract.ErrorPayload{}}, {Status: 422, Description: "Invalid spawn request", Body: contract.ErrorPayload{}}, + { + Status: 503, + Description: "Service unavailable - dependent service missing", + Body: contract.ErrorPayload{}, + }, {Status: 500, Description: "Internal server error", Body: contract.ErrorPayload{}}, }, }, @@ -1432,7 +1504,13 @@ var operationRegistry = []OperationSpec{ Responses: []ResponseSpec{ {Status: 200, Description: "OK", Body: contract.AgentCoordinatorConfigResponse{}}, {Status: 401, Description: "Agent caller identity is missing", Body: contract.ErrorPayload{}}, + {Status: 403, Description: "Forbidden - workspace or permission mismatch", Body: contract.ErrorPayload{}}, {Status: 404, Description: "Workspace not found", Body: contract.ErrorPayload{}}, + { + Status: 503, + Description: "Service unavailable - dependent service missing", + Body: contract.ErrorPayload{}, + }, {Status: 500, Description: "Internal server error", Body: contract.ErrorPayload{}}, }, }, diff --git a/internal/api/udsapi/agent_channels_test.go 
b/internal/api/udsapi/agent_channels_test.go index 9853e4d42..c91457038 100644 --- a/internal/api/udsapi/agent_channels_test.go +++ b/internal/api/udsapi/agent_channels_test.go @@ -11,6 +11,7 @@ import ( "github.com/pedronauck/agh/internal/agentidentity" "github.com/pedronauck/agh/internal/api/contract" + aghconfig "github.com/pedronauck/agh/internal/config" "github.com/pedronauck/agh/internal/network" "github.com/pedronauck/agh/internal/session" ) @@ -68,6 +69,56 @@ func TestAgentContextReturnsSituationPayload(t *testing.T) { } } +func TestAgentCoordinatorConfigRouteReturnsResolvedPayload(t *testing.T) { + t.Parallel() + + t.Run("Should return resolved workspace coordinator payload", func(t *testing.T) { + t.Parallel() + + manager := activeAgentSessionManager(t) + handlers := newTestHandlers(t, manager, stubObserver{}, newTestHomePaths(t)) + handlers.CoordinatorConfig = agentCoordinatorConfigResolverFunc( + func(_ context.Context, workspaceID string) (aghconfig.CoordinatorConfig, error) { + if workspaceID != "ws-1" { + t.Fatalf("ResolveCoordinatorConfig() workspaceID = %q, want ws-1", workspaceID) + } + return aghconfig.CoordinatorConfig{ + Enabled: true, + AgentName: "coordinator", + Provider: "codex", + Model: "gpt-4o", + DefaultTTL: 45 * time.Minute, + MaxChildren: 5, + MaxActivePerWorkspace: 1, + }, nil + }, + ) + engine := newTestRouter(t, handlers) + + recorder := performAgentKernelRequest( + t, + engine, + http.MethodGet, + "/api/agent/coordinator/config", + nil, + agentKernelHeaders(), + ) + if recorder.Code != http.StatusOK { + t.Fatalf("status = %d, want %d; body=%s", recorder.Code, http.StatusOK, recorder.Body.String()) + } + + var response contract.AgentCoordinatorConfigResponse + decodeJSONResponse(t, recorder, &response) + if !response.Coordinator.Enabled || + response.Coordinator.AgentName != "coordinator" || + response.Coordinator.DefaultTTLSeconds != 2700 || + response.Coordinator.Source != contract.CoordinatorConfigSourceWorkspace || + 
response.Coordinator.WorkspaceID != "ws-1" { + t.Fatalf("coordinator = %#v, want workspace coordinator payload", response.Coordinator) + } + }) +} + func TestAgentChannelSendUsesCallerIdentityAndRejectsRawClaimToken(t *testing.T) { t.Parallel() @@ -336,6 +387,15 @@ func (f agentContextServiceFunc) ContextForSession( return f(ctx, info) } +type agentCoordinatorConfigResolverFunc func(context.Context, string) (aghconfig.CoordinatorConfig, error) + +func (f agentCoordinatorConfigResolverFunc) ResolveCoordinatorConfig( + ctx context.Context, + workspaceID string, +) (aghconfig.CoordinatorConfig, error) { + return f(ctx, workspaceID) +} + func newAgentChannelHandlers(t *testing.T, networkService stubNetworkService) *Handlers { t.Helper() diff --git a/internal/api/udsapi/agent_identity_test.go b/internal/api/udsapi/agent_identity_test.go index fb0ff6cc2..986137ef3 100644 --- a/internal/api/udsapi/agent_identity_test.go +++ b/internal/api/udsapi/agent_identity_test.go @@ -31,16 +31,17 @@ func TestAgentMeRejectsInvalidCallerIdentity(t *testing.T) { name string headers map[string]string statusInfo *session.Info + statusErr error wantStatus int }{ { - name: "missing env headers", + name: "Should reject missing env headers", headers: map[string]string{}, statusInfo: active, wantStatus: http.StatusUnauthorized, }, { - name: "stopped session", + name: "Should reject stopped sessions", headers: map[string]string{ agentidentity.HeaderSessionID: "sess-1", agentidentity.HeaderAgent: "coder", @@ -53,7 +54,7 @@ func TestAgentMeRejectsInvalidCallerIdentity(t *testing.T) { wantStatus: http.StatusUnauthorized, }, { - name: "agent mismatch", + name: "Should reject agent mismatches", headers: map[string]string{ agentidentity.HeaderSessionID: "sess-1", agentidentity.HeaderAgent: "reviewer", @@ -62,7 +63,7 @@ func TestAgentMeRejectsInvalidCallerIdentity(t *testing.T) { wantStatus: http.StatusUnauthorized, }, { - name: "workspace mismatch", + name: "Should reject workspace mismatches", 
headers: map[string]string{ agentidentity.HeaderSessionID: "sess-1", agentidentity.HeaderAgent: "coder", @@ -71,6 +72,15 @@ func TestAgentMeRejectsInvalidCallerIdentity(t *testing.T) { statusInfo: active, wantStatus: http.StatusForbidden, }, + { + name: "Should preserve lookup unavailable status", + headers: map[string]string{ + agentidentity.HeaderSessionID: "sess-1", + agentidentity.HeaderAgent: "coder", + }, + statusErr: context.DeadlineExceeded, + wantStatus: http.StatusServiceUnavailable, + }, } for _, tt := range tests { @@ -79,6 +89,9 @@ func TestAgentMeRejectsInvalidCallerIdentity(t *testing.T) { manager := stubSessionManager{ StatusFn: func(_ context.Context, id string) (*session.Info, error) { + if tt.statusErr != nil { + return nil, tt.statusErr + } if tt.statusInfo == nil { return nil, session.ErrSessionNotFound } @@ -105,49 +118,56 @@ func TestAgentMeRejectsInvalidCallerIdentity(t *testing.T) { func TestAgentMeReturnsValidatedCallerIdentity(t *testing.T) { t.Parallel() - manager := stubSessionManager{ - StatusFn: func(_ context.Context, id string) (*session.Info, error) { - if id != "sess-1" { - return nil, session.ErrSessionNotFound - } - now := time.Date(2026, 4, 26, 10, 0, 0, 0, time.UTC) - return &session.Info{ - ID: "sess-1", - Name: "worker", - AgentName: "coder", - Provider: "test-provider", - WorkspaceID: "ws-1", - Workspace: "/workspace", - Channel: "coord", - Type: session.SessionTypeUser, - State: session.StateActive, - CreatedAt: now, - UpdatedAt: now, - }, nil - }, - } - engine := newTestRouter(t, newTestHandlers(t, manager, stubObserver{}, newTestHomePaths(t))) - recorder := performAgentMeRequest(t, engine, map[string]string{ - agentidentity.HeaderSessionID: "sess-1", - agentidentity.HeaderAgent: "coder", - agentidentity.HeaderWorkspaceID: "ws-1", - }) - if recorder.Code != http.StatusOK { - t.Fatalf("status = %d, want %d; body=%s", recorder.Code, http.StatusOK, recorder.Body.String()) - } + t.Run("Should return validated caller 
identity", func(t *testing.T) { + t.Parallel() - var response contract.AgentMeResponse - decodeJSONResponse(t, recorder, &response) - if response.Me.Self.SessionID != "sess-1" || response.Me.Self.AgentName != "coder" { - t.Fatalf("response.Me.Self = %#v, want validated caller", response.Me.Self) - } - if response.Me.Session.State != session.StateActive || response.Me.Workspace.ID != "ws-1" { - encoded, err := json.Marshal(response.Me) - if err != nil { - t.Fatalf("json.Marshal(AgentMePayload) error = %v", err) + manager := stubSessionManager{ + StatusFn: func(_ context.Context, id string) (*session.Info, error) { + if id != "sess-1" { + return nil, session.ErrSessionNotFound + } + now := time.Date(2026, 4, 26, 10, 0, 0, 0, time.UTC) + return &session.Info{ + ID: "sess-1", + Name: "worker", + AgentName: "coder", + Provider: "test-provider", + Model: "test-model", + WorkspaceID: "ws-1", + Workspace: "/workspace", + Channel: "coord", + Type: session.SessionTypeUser, + State: session.StateActive, + CreatedAt: now, + UpdatedAt: now, + }, nil + }, } - t.Fatalf("response.Me = %s, want active session in workspace ws-1", encoded) - } + engine := newTestRouter(t, newTestHandlers(t, manager, stubObserver{}, newTestHomePaths(t))) + recorder := performAgentMeRequest(t, engine, map[string]string{ + agentidentity.HeaderSessionID: "sess-1", + agentidentity.HeaderAgent: "coder", + agentidentity.HeaderWorkspaceID: "ws-1", + }) + if recorder.Code != http.StatusOK { + t.Fatalf("status = %d, want %d; body=%s", recorder.Code, http.StatusOK, recorder.Body.String()) + } + + var response contract.AgentMeResponse + decodeJSONResponse(t, recorder, &response) + if response.Me.Self.SessionID != "sess-1" || + response.Me.Self.AgentName != "coder" || + response.Me.Self.Model != "test-model" { + t.Fatalf("response.Me.Self = %#v, want validated caller with model", response.Me.Self) + } + if response.Me.Session.State != session.StateActive || response.Me.Workspace.ID != "ws-1" { + encoded, err := 
json.Marshal(response.Me) + if err != nil { + t.Fatalf("json.Marshal(AgentMePayload) error = %v", err) + } + t.Fatalf("response.Me = %s, want active session in workspace ws-1", encoded) + } + }) } func TestAgentMeReportsUnavailableWhenSessionServiceMissing(t *testing.T) { diff --git a/internal/api/udsapi/agent_tasks_test.go b/internal/api/udsapi/agent_tasks_test.go index 591956cc4..e159e6b3a 100644 --- a/internal/api/udsapi/agent_tasks_test.go +++ b/internal/api/udsapi/agent_tasks_test.go @@ -101,9 +101,10 @@ func TestAgentTaskClaimNextUsesCallerIdentityAndReturnsCoordinationChannel(t *te if seenCriteria.WorkspaceID != "ws-1" || seenCriteria.ClaimerSessionID != "sess-agent" || seenCriteria.AgentName != "coder" || + seenCriteria.CoordinationChannelID != "builders" || seenCriteria.PriorityMin != 2 || seenCriteria.LeaseDuration != 120*time.Second { - t.Fatalf("criteria = %#v, want caller workspace/session/agent and flags", seenCriteria) + t.Fatalf("criteria = %#v, want caller workspace/session/agent/channel and flags", seenCriteria) } if !containsString(seenCriteria.RequiredCapabilities, "manual") || !containsString(seenCriteria.RequiredCapabilities, "go") { diff --git a/internal/api/udsapi/handlers_test.go b/internal/api/udsapi/handlers_test.go index 354999a77..2a58b6012 100644 --- a/internal/api/udsapi/handlers_test.go +++ b/internal/api/udsapi/handlers_test.go @@ -18,6 +18,7 @@ import ( core "github.com/pedronauck/agh/internal/api/core" aghconfig "github.com/pedronauck/agh/internal/config" hookspkg "github.com/pedronauck/agh/internal/hooks" + "github.com/pedronauck/agh/internal/network" "github.com/pedronauck/agh/internal/observe" "github.com/pedronauck/agh/internal/session" settingspkg "github.com/pedronauck/agh/internal/settings" @@ -104,6 +105,7 @@ func TestRegisterRoutesCoversTechSpecEndpoints(t *testing.T) { "GET /api/agent/channels", "GET /api/agent/channels/:channel/recv", "GET /api/agent/context", + "GET /api/agent/coordinator/config", "GET /api/agent/me", 
"GET /api/automation/jobs", "GET /api/automation/jobs/:id", @@ -499,24 +501,34 @@ func TestRegisterTaskRoutesUseSharedHandlerBindings(t *testing.T) { engine := newTestRouter(t, newTestHandlers(t, stubSessionManager{}, stubObserver{}, homePaths)) expectedHandlers := map[string]string{ - "GET /api/observe/tasks/dashboard": "TaskDashboard", - "GET /api/observe/tasks/inbox": "TaskInbox", - "GET /api/task-runs/:id": "GetTaskRun", - "GET /api/tasks/:id/stream": "StreamTask", - "GET /api/tasks/:id/timeline": "TaskTimeline", - "GET /api/tasks/:id/tree": "TaskTree", - "POST /api/agent/channels/reply": "AgentChannelReply", - "POST /api/agent/tasks/:run_id/complete": "AgentTaskComplete", - "POST /api/agent/tasks/claim-next": "AgentTaskClaimNext", - "DELETE /api/tasks/:id": "DeleteTask", - "POST /api/sessions/:id/stop": "StopSession", - "POST /api/tasks/:id/approve": "ApproveTask", - "POST /api/tasks/:id/publish": "PublishTask", - "POST /api/tasks/:id/reject": "RejectTask", - "POST /api/tasks/:id/start": "StartTask", - "POST /api/tasks/:id/triage/archive": "ArchiveTask", - "POST /api/tasks/:id/triage/dismiss": "DismissTask", - "POST /api/tasks/:id/triage/read": "MarkTaskRead", + "GET /api/observe/tasks/dashboard": "TaskDashboard", + "GET /api/observe/tasks/inbox": "TaskInbox", + "GET /api/task-runs/:id": "GetTaskRun", + "GET /api/tasks/:id/stream": "StreamTask", + "GET /api/tasks/:id/timeline": "TaskTimeline", + "GET /api/tasks/:id/tree": "TaskTree", + "GET /api/agent/channels": "AgentChannels", + "GET /api/agent/channels/:channel/recv": "AgentChannelRecv", + "GET /api/agent/context": "AgentContext", + "GET /api/agent/coordinator/config": "AgentCoordinatorConfig", + "GET /api/agent/me": "AgentMe", + "POST /api/agent/channels/:channel/send": "AgentChannelSend", + "POST /api/agent/channels/reply": "AgentChannelReply", + "POST /api/agent/tasks/:run_id/complete": "AgentTaskComplete", + "POST /api/agent/tasks/:run_id/fail": "AgentTaskFail", + "POST 
/api/agent/tasks/:run_id/heartbeat": "AgentTaskHeartbeat", + "POST /api/agent/tasks/:run_id/release": "AgentTaskRelease", + "POST /api/agent/tasks/claim-next": "AgentTaskClaimNext", + "POST /api/agent/spawn": "AgentSpawn", + "DELETE /api/tasks/:id": "DeleteTask", + "POST /api/sessions/:id/stop": "StopSession", + "POST /api/tasks/:id/approve": "ApproveTask", + "POST /api/tasks/:id/publish": "PublishTask", + "POST /api/tasks/:id/reject": "RejectTask", + "POST /api/tasks/:id/start": "StartTask", + "POST /api/tasks/:id/triage/archive": "ArchiveTask", + "POST /api/tasks/:id/triage/dismiss": "DismissTask", + "POST /api/tasks/:id/triage/read": "MarkTaskRead", } routes := engine.Routes() @@ -538,6 +550,65 @@ func TestRegisterTaskRoutesUseSharedHandlerBindings(t *testing.T) { } } } + +func TestAgentChannelRecvRejectsInvalidPathAndQuery(t *testing.T) { + t.Parallel() + + tests := []struct { + name string + path string + }{ + { + name: "Should reject malformed channel identifiers before reading inbox", + path: "/api/agent/channels/bad.channel/recv", + }, + { + name: "Should reject malformed wait query values before reading inbox", + path: "/api/agent/channels/builders/recv?wait=maybe", + }, + { + name: "Should reject malformed limit query values before reading inbox", + path: "/api/agent/channels/builders/recv?limit=abc", + }, + { + name: "Should reject non-positive limit query values before reading inbox", + path: "/api/agent/channels/builders/recv?limit=0", + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + t.Parallel() + + handlers := newAgentChannelHandlers(t, stubNetworkService{ + InboxFn: func(context.Context, string) ([]network.Envelope, error) { + t.Fatal("Inbox should not be called for invalid receive requests") + return nil, nil + }, + WaitInboxFn: func(context.Context, string, string) ([]network.Envelope, error) { + t.Fatal("WaitInbox should not be called for invalid receive requests") + return nil, nil + }, + }) + recorder := 
performAgentKernelRequest( + t, + newTestRouter(t, handlers), + http.MethodGet, + tt.path, + nil, + agentKernelHeaders(), + ) + if recorder.Code != http.StatusBadRequest { + t.Fatalf("status = %d, want %d; body=%s", recorder.Code, http.StatusBadRequest, recorder.Body.String()) + } + var payload contract.ErrorPayload + decodeJSONResponse(t, recorder, &payload) + if payload.Error == "" { + t.Fatalf("error payload = %#v, want validation error", payload) + } + }) + } +} func TestCreateSessionHandlerReturnsSessionID(t *testing.T) { homePaths := newTestHomePaths(t) manager := stubSessionManager{ diff --git a/internal/api/udsapi/routes.go b/internal/api/udsapi/routes.go index 32195cb47..b2f2cd558 100644 --- a/internal/api/udsapi/routes.go +++ b/internal/api/udsapi/routes.go @@ -91,6 +91,7 @@ func registerAgentKernelRoutes(api gin.IRouter, handlers *Handlers) { { agent.GET("/me", handlers.AgentMe) agent.GET("/context", handlers.AgentContext) + agent.GET("/coordinator/config", handlers.AgentCoordinatorConfig) agent.POST("/spawn", handlers.AgentSpawn) agent.GET("/channels", handlers.AgentChannels) agent.GET("/channels/:channel/recv", handlers.AgentChannelRecv) diff --git a/internal/api/udsapi/server.go b/internal/api/udsapi/server.go index 7e9d88a61..605a9c34a 100644 --- a/internal/api/udsapi/server.go +++ b/internal/api/udsapi/server.go @@ -50,32 +50,33 @@ type ExtensionService interface { type Server struct { mu sync.Mutex - homePaths aghconfig.HomePaths - config aghconfig.Config - socketPath string - logger *slog.Logger - startedAt time.Time - now func() time.Time - pollInterval time.Duration - sessions core.SessionManager - tasks core.TaskService - network core.NetworkService - networkStore core.NetworkStore - observer core.Observer - resources core.ResourceService - automation core.AutomationManager - bridges core.BridgeService - bundles core.BundleService - settings core.SettingsService - settingsRestart core.SettingsRestartController - workspaces core.WorkspaceService 
- agentCatalog core.AgentCatalog - agentContext core.AgentContextService - skillsRegistry core.SkillsRegistry - memoryStore *memory.Store - dreamTrigger core.DreamTrigger - agentLoader core.AgentLoader - extensions ExtensionService + homePaths aghconfig.HomePaths + config aghconfig.Config + socketPath string + logger *slog.Logger + startedAt time.Time + now func() time.Time + pollInterval time.Duration + sessions core.SessionManager + tasks core.TaskService + network core.NetworkService + networkStore core.NetworkStore + observer core.Observer + resources core.ResourceService + automation core.AutomationManager + bridges core.BridgeService + bundles core.BundleService + settings core.SettingsService + settingsRestart core.SettingsRestartController + workspaces core.WorkspaceService + agentCatalog core.AgentCatalog + agentContext core.AgentContextService + coordinatorConfig core.CoordinatorConfigResolver + skillsRegistry core.SkillsRegistry + memoryStore *memory.Store + dreamTrigger core.DreamTrigger + agentLoader core.AgentLoader + extensions ExtensionService engine *gin.Engine handlers *Handlers @@ -88,31 +89,32 @@ type Server struct { } type handlerConfig struct { - sessions core.SessionManager - tasks core.TaskService - network core.NetworkService - networkStore core.NetworkStore - observer core.Observer - resources core.ResourceService - automation core.AutomationManager - bridges core.BridgeService - bundles core.BundleService - settings core.SettingsService - settingsRestart core.SettingsRestartController - workspaces core.WorkspaceService - agentCatalog core.AgentCatalog - agentContext core.AgentContextService - skillsRegistry core.SkillsRegistry - memoryStore *memory.Store - dreamTrigger core.DreamTrigger - homePaths aghconfig.HomePaths - config aghconfig.Config - logger *slog.Logger - startedAt time.Time - now func() time.Time - pollInterval time.Duration - agentLoader core.AgentLoader - extensions ExtensionService + sessions core.SessionManager + tasks 
core.TaskService + network core.NetworkService + networkStore core.NetworkStore + observer core.Observer + resources core.ResourceService + automation core.AutomationManager + bridges core.BridgeService + bundles core.BundleService + settings core.SettingsService + settingsRestart core.SettingsRestartController + workspaces core.WorkspaceService + agentCatalog core.AgentCatalog + agentContext core.AgentContextService + coordinatorConfig core.CoordinatorConfigResolver + skillsRegistry core.SkillsRegistry + memoryStore *memory.Store + dreamTrigger core.DreamTrigger + homePaths aghconfig.HomePaths + config aghconfig.Config + logger *slog.Logger + startedAt time.Time + now func() time.Time + pollInterval time.Duration + agentLoader core.AgentLoader + extensions ExtensionService } // Handlers expose request/response and SSE endpoints for the AGH API. @@ -285,6 +287,13 @@ func WithAgentContext(service core.AgentContextService) Option { } } +// WithCoordinatorConfig injects the resolved coordinator policy reader. +func WithCoordinatorConfig(resolver core.CoordinatorConfigResolver) Option { + return func(server *Server) { + server.coordinatorConfig = resolver + } +} + // WithDreamTrigger injects the dream-consolidation trigger surfaced by the daemon. 
func WithDreamTrigger(trigger core.DreamTrigger) Option { return func(server *Server) { @@ -421,31 +430,32 @@ func (s *Server) ensureEngine() { func (s *Server) handlerConfig() *handlerConfig { return &handlerConfig{ - sessions: s.sessions, - tasks: s.tasks, - network: s.network, - networkStore: s.networkStore, - observer: s.observer, - resources: s.resources, - automation: s.automation, - bridges: s.bridges, - bundles: s.bundles, - settings: s.settings, - settingsRestart: s.settingsRestart, - workspaces: s.workspaces, - agentCatalog: s.agentCatalog, - agentContext: s.agentContext, - skillsRegistry: s.skillsRegistry, - memoryStore: s.memoryStore, - dreamTrigger: s.dreamTrigger, - homePaths: s.homePaths, - config: s.config, - logger: s.logger, - startedAt: s.startedAt, - now: s.now, - pollInterval: s.pollInterval, - agentLoader: s.agentLoader, - extensions: s.extensions, + sessions: s.sessions, + tasks: s.tasks, + network: s.network, + networkStore: s.networkStore, + observer: s.observer, + resources: s.resources, + automation: s.automation, + bridges: s.bridges, + bundles: s.bundles, + settings: s.settings, + settingsRestart: s.settingsRestart, + workspaces: s.workspaces, + agentCatalog: s.agentCatalog, + agentContext: s.agentContext, + coordinatorConfig: s.coordinatorConfig, + skillsRegistry: s.skillsRegistry, + memoryStore: s.memoryStore, + dreamTrigger: s.dreamTrigger, + homePaths: s.homePaths, + config: s.config, + logger: s.logger, + startedAt: s.startedAt, + now: s.now, + pollInterval: s.pollInterval, + agentLoader: s.agentLoader, + extensions: s.extensions, } } @@ -663,6 +673,7 @@ func newHandlers(cfg *handlerConfig) *Handlers { Workspaces: cfg.workspaces, AgentCatalog: cfg.agentCatalog, AgentContextService: cfg.agentContext, + CoordinatorConfig: cfg.coordinatorConfig, SkillsRegistry: cfg.skillsRegistry, MemoryStore: cfg.memoryStore, DreamTrigger: cfg.dreamTrigger, diff --git a/internal/automation/dispatch.go b/internal/automation/dispatch.go index 
f64a85a76..4d3fe982a 100644 --- a/internal/automation/dispatch.go +++ b/internal/automation/dispatch.go @@ -53,7 +53,10 @@ func (e *FireLimitError) Unwrap() error { return ErrFireLimitReached } -const dispatcherSessionStopTimeout = 2 * time.Second +// defaultDispatcherSessionStopTimeout must outlive the ACP driver's own +// graceful stop budget, otherwise automation runs can be marked failed while +// the session is still finishing a normal shutdown. +const defaultDispatcherSessionStopTimeout = 10 * time.Second // DispatchKind identifies which activation path produced a dispatch request. type DispatchKind string @@ -238,6 +241,7 @@ type Dispatcher struct { sleep SleepFunc globalWorkspacePath string maxConcurrent int + sessionStopTimeout time.Duration hooks HookDispatcher taskActors SessionTaskActorRecorder @@ -255,12 +259,13 @@ func NewDispatcher(sessions SessionCreator, runs RunStore, opts ...DispatcherOpt } dispatcher := &Dispatcher{ - sessions: sessions, - runs: runs, - logger: slog.Default(), - now: func() time.Time { return time.Now().UTC() }, - sleep: sleepWithContext, - maxConcurrent: DefaultMaxConcurrentJobs, + sessions: sessions, + runs: runs, + logger: slog.Default(), + now: func() time.Time { return time.Now().UTC() }, + sleep: sleepWithContext, + maxConcurrent: DefaultMaxConcurrentJobs, + sessionStopTimeout: defaultDispatcherSessionStopTimeout, } for _, opt := range opts { @@ -284,6 +289,9 @@ func NewDispatcher(sessions SessionCreator, runs RunStore, opts ...DispatcherOpt if dispatcher.maxConcurrent <= 0 { dispatcher.maxConcurrent = DefaultMaxConcurrentJobs } + if dispatcher.sessionStopTimeout <= 0 { + dispatcher.sessionStopTimeout = defaultDispatcherSessionStopTimeout + } dispatcher.gate = make(chan struct{}, dispatcher.maxConcurrent) return dispatcher, nil @@ -324,6 +332,13 @@ func WithDispatcherMaxConcurrent(limit int) DispatcherOption { } } +// WithDispatcherSessionStopTimeout overrides the automation session stop budget. 
+func WithDispatcherSessionStopTimeout(timeout time.Duration) DispatcherOption { + return func(dispatcher *Dispatcher) { + dispatcher.sessionStopTimeout = timeout + } +} + // WithDispatcherHooks injects the automation lifecycle hook dispatcher. func WithDispatcherHooks(hooks HookDispatcher) DispatcherOption { return func(dispatcher *Dispatcher) { @@ -802,7 +817,7 @@ func (d *Dispatcher) stopAutomationSession( return nil } - stopCtx, cancel := context.WithTimeout(context.WithoutCancel(ctx), dispatcherSessionStopTimeout) + stopCtx, cancel := context.WithTimeout(context.WithoutCancel(ctx), d.sessionStopTimeout) defer cancel() cause, detail := dispatchStopCause(status, runErr) diff --git a/internal/automation/dispatch_stop_timeout_test.go b/internal/automation/dispatch_stop_timeout_test.go new file mode 100644 index 000000000..405b1d8b6 --- /dev/null +++ b/internal/automation/dispatch_stop_timeout_test.go @@ -0,0 +1,57 @@ +package automation + +import ( + "testing" + "time" + + "github.com/pedronauck/agh/internal/testutil" +) + +func TestDispatcherSessionStopTimeout(t *testing.T) { + t.Run("Should keep completed runs successful while session teardown finishes", func(t *testing.T) { + t.Parallel() + + store := newMemoryRunStore() + stopStarted := make(chan struct{}, 1) + stopRelease := make(chan struct{}) + creator := newRecordingSessionCreator(sessionAttemptPlan{ + stopStarted: stopStarted, + stopRelease: stopRelease, + }) + dispatcher := newTestDispatcher(t, creator, store) + job := testJob(AutomationScopeGlobal, "job-slow-stop", "") + + go func() { + <-stopStarted + timer := time.NewTimer(3 * time.Second) + defer timer.Stop() + <-timer.C + close(stopRelease) + }() + + run, err := dispatcher.Dispatch(testutil.Context(t), DispatchRequest{ + Kind: DispatchKindManual, + Job: &job, + }) + if err != nil { + t.Fatalf("Dispatch() error = %v", err) + } + if got, want := run.Status, RunCompleted; got != want { + t.Fatalf("run.Status = %q, want %q", got, want) + } + + 
reloadedRuns, err := store.ListRuns(testutil.Context(t), RunQuery{JobID: job.ID}) + if err != nil { + t.Fatalf("ListRuns() error = %v", err) + } + if got, want := len(reloadedRuns), 1; got != want { + t.Fatalf("len(ListRuns()) = %d, want %d", got, want) + } + if got, want := reloadedRuns[0].Status, RunCompleted; got != want { + t.Fatalf("ListRuns()[0].Status = %q, want %q", got, want) + } + if reloadedRuns[0].EndedAt == nil { + t.Fatal("ListRuns()[0].EndedAt = nil, want populated") + } + }) +} diff --git a/internal/automation/dispatch_test.go b/internal/automation/dispatch_test.go index 9026221c9..c0cb448c9 100644 --- a/internal/automation/dispatch_test.go +++ b/internal/automation/dispatch_test.go @@ -796,6 +796,49 @@ func TestNewDispatcherRejectsMissingDependenciesAndGlobalWorkspacePath(t *testin } } +func TestNewDispatcherSessionStopTimeoutOption(t *testing.T) { + t.Parallel() + + testCases := []struct { + name string + opt DispatcherOption + want time.Duration + }{ + { + name: "Should use the default session stop timeout", + want: defaultDispatcherSessionStopTimeout, + }, + { + name: "Should use the configured session stop timeout", + opt: WithDispatcherSessionStopTimeout(30 * time.Second), + want: 30 * time.Second, + }, + { + name: "Should fall back to default for invalid session stop timeout", + opt: WithDispatcherSessionStopTimeout(0), + want: defaultDispatcherSessionStopTimeout, + }, + } + + for _, tc := range testCases { + t.Run(tc.name, func(t *testing.T) { + t.Parallel() + + options := []DispatcherOption{WithDispatcherGlobalWorkspacePath(t.TempDir())} + if tc.opt != nil { + options = append(options, tc.opt) + } + dispatcher, err := NewDispatcher(newRecordingSessionCreator(), newMemoryRunStore(), options...) 
+ if err != nil { + t.Fatalf("NewDispatcher() error = %v", err) + } + if dispatcher.sessionStopTimeout != tc.want { + t.Fatalf("sessionStopTimeout = %s, want %s", dispatcher.sessionStopTimeout, tc.want) + } + }) + } +} + func TestDispatchRequestValidateRejectsInvalidShapes(t *testing.T) { t.Parallel() @@ -1218,6 +1261,9 @@ type sessionAttemptPlan struct { promptErr error promptStarted chan struct{} promptRelease chan struct{} + stopErr error + stopStarted chan struct{} + stopRelease chan struct{} events []acp.AgentEvent } @@ -1403,6 +1449,21 @@ func (c *recordingSessionCreator) StopWithCause( return err } + c.mu.Lock() + plan, ok := c.bySessionID[id] + c.mu.Unlock() + if !ok { + plan = sessionAttemptPlan{} + } + + notify(plan.stopStarted) + if err := waitForRelease(ctx, plan.stopRelease); err != nil { + return err + } + if plan.stopErr != nil { + return plan.stopErr + } + c.mu.Lock() defer c.mu.Unlock() diff --git a/internal/cli/agent_kernel_test.go b/internal/cli/agent_kernel_test.go index 2427d7a9d..48431c257 100644 --- a/internal/cli/agent_kernel_test.go +++ b/internal/cli/agent_kernel_test.go @@ -16,214 +16,226 @@ import ( func TestMeCommandJSONReturnsValidatedIdentity(t *testing.T) { t.Parallel() - client := &stubClient{} - deps := newAgentCommandTestDeps(t, client) - client.agentMeFn = func(_ context.Context, credentials agentidentity.Credentials) (AgentMeRecord, error) { - assertAgentCredentials(t, credentials) - return AgentMeRecord{ - Self: contract.AgentIdentityPayload{ - SessionID: "sess-agent", - AgentName: "coder", - Provider: "test-provider", - Model: "test-model", - }, - Workspace: contract.AgentWorkspacePayload{ - ID: "ws-1", - RootDir: "/workspace/project", - }, - Session: contract.AgentSessionPayload{ - ID: "sess-agent", - State: session.StateActive, - Channel: "builders", - CreatedAt: fixedTestNow, - UpdatedAt: fixedTestNow, - }, - }, nil - } + t.Run("Should return validated identity as JSON", func(t *testing.T) { + t.Parallel() + + client := 
&stubClient{} + deps := newAgentCommandTestDeps(t, client) + client.agentMeFn = func(_ context.Context, credentials agentidentity.Credentials) (AgentMeRecord, error) { + assertAgentCredentials(t, credentials) + return AgentMeRecord{ + Self: contract.AgentIdentityPayload{ + SessionID: "sess-agent", + AgentName: "coder", + Provider: "test-provider", + Model: "test-model", + }, + Workspace: contract.AgentWorkspacePayload{ + ID: "ws-1", + RootDir: "/workspace/project", + }, + Session: contract.AgentSessionPayload{ + ID: "sess-agent", + State: session.StateActive, + Channel: "builders", + CreatedAt: fixedTestNow, + UpdatedAt: fixedTestNow, + }, + }, nil + } - stdout, _, err := executeRootCommand(t, deps, "me", "-o", "json") - if err != nil { - t.Fatalf("agh me error = %v", err) - } + stdout, _, err := executeRootCommand(t, deps, "me", "-o", "json") + if err != nil { + t.Fatalf("agh me error = %v", err) + } - var got AgentMeRecord - if err := json.Unmarshal([]byte(stdout), &got); err != nil { - t.Fatalf("json.Unmarshal(agh me) error = %v", err) - } - if got.Self.SessionID != "sess-agent" || got.Self.AgentName != "coder" || got.Workspace.ID != "ws-1" { - t.Fatalf("agh me payload = %#v, want caller session/workspace identity", got) - } + var got AgentMeRecord + if err := json.Unmarshal([]byte(stdout), &got); err != nil { + t.Fatalf("json.Unmarshal(agh me) error = %v", err) + } + if got.Self.SessionID != "sess-agent" || got.Self.AgentName != "coder" || got.Workspace.ID != "ws-1" { + t.Fatalf("agh me payload = %#v, want caller session/workspace identity", got) + } + }) } func TestMeContextCommandJSONKeepsStableSectionOrder(t *testing.T) { t.Parallel() - client := &stubClient{} - deps := newAgentCommandTestDeps(t, client) - client.agentContextFn = func(_ context.Context, credentials agentidentity.Credentials) (AgentContextRecord, error) { - assertAgentCredentials(t, credentials) - return AgentContextRecord{ - Self: contract.AgentIdentityPayload{ - SessionID: "sess-agent", - 
AgentName: "coder", - Provider: "test-provider", - }, - Workspace: contract.AgentWorkspacePayload{ID: "ws-1", RootDir: "/workspace/project"}, - Session: contract.AgentSessionPayload{ - ID: "sess-agent", - State: session.StateActive, - CreatedAt: fixedTestNow, - UpdatedAt: fixedTestNow, - }, - Task: contract.AgentTaskContextPayload{Available: true}, - CoordinationChannel: contract.AgentCoordinationChannelContextPayload{Available: true}, - InboxSummary: contract.AgentInboxSummaryPayload{}, - PeerRoster: contract.AgentPeerRosterPayload{}, - Capabilities: contract.AgentCapabilitySectionPayload{}, - Limits: contract.AgentLimitsPayload{ContextSectionLimit: 20}, - Provenance: contract.AgentContextProvenancePayload{ - GeneratedAt: fixedTestNow, - Source: "test", - }, - }, nil - } + t.Run("Should keep stable JSON section order", func(t *testing.T) { + t.Parallel() + + client := &stubClient{} + deps := newAgentCommandTestDeps(t, client) + client.agentContextFn = func(_ context.Context, credentials agentidentity.Credentials) (AgentContextRecord, error) { + assertAgentCredentials(t, credentials) + return AgentContextRecord{ + Self: contract.AgentIdentityPayload{ + SessionID: "sess-agent", + AgentName: "coder", + Provider: "test-provider", + }, + Workspace: contract.AgentWorkspacePayload{ID: "ws-1", RootDir: "/workspace/project"}, + Session: contract.AgentSessionPayload{ + ID: "sess-agent", + State: session.StateActive, + CreatedAt: fixedTestNow, + UpdatedAt: fixedTestNow, + }, + Task: contract.AgentTaskContextPayload{Available: true}, + CoordinationChannel: contract.AgentCoordinationChannelContextPayload{Available: true}, + InboxSummary: contract.AgentInboxSummaryPayload{}, + PeerRoster: contract.AgentPeerRosterPayload{}, + Capabilities: contract.AgentCapabilitySectionPayload{}, + Limits: contract.AgentLimitsPayload{ContextSectionLimit: 20}, + Provenance: contract.AgentContextProvenancePayload{ + GeneratedAt: fixedTestNow, + Source: "test", + }, + }, nil + } - stdout, _, err 
:= executeRootCommand(t, deps, "me", "context", "-o", "json") - if err != nil { - t.Fatalf("agh me context error = %v", err) - } + stdout, _, err := executeRootCommand(t, deps, "me", "context", "-o", "json") + if err != nil { + t.Fatalf("agh me context error = %v", err) + } - assertJSONKeyOrder(t, stdout, []string{ - "self", - "workspace", - "session", - "task", - "coordination_channel", - "inbox_summary", - "peer_roster", - "capabilities", - "limits", - "provenance", + assertJSONKeyOrder(t, stdout, []string{ + "self", + "workspace", + "session", + "task", + "coordination_channel", + "inbox_summary", + "peer_roster", + "capabilities", + "limits", + "provenance", + }) }) } func TestSpawnCommandMapsBoundedChildRequest(t *testing.T) { t.Parallel() - var gotRequest AgentSpawnRequest - client := &stubClient{} - deps := newAgentCommandTestDeps(t, client) - client.agentSpawnFn = func( - _ context.Context, - request AgentSpawnRequest, - credentials agentidentity.Credentials, - ) (AgentSpawnRecord, error) { - assertAgentCredentials(t, credentials) - gotRequest = request - ttl := fixedTestNow.Add(2 * time.Minute) - return AgentSpawnRecord{ - Session: SessionRecord{ - ID: "sess-child", - Name: request.Name, - AgentName: request.AgentName, - Provider: request.Provider, - WorkspaceID: "ws-1", - WorkspacePath: "/workspace/project", - Channel: "builders", - Type: session.SessionTypeSpawned, - State: session.StateActive, - CreatedAt: fixedTestNow, - UpdatedAt: fixedTestNow, - }, - Lineage: contract.SessionLineagePayload{ - ParentSessionID: "sess-agent", - RootSessionID: "sess-agent", - SpawnDepth: 1, - SpawnRole: request.SpawnRole, - TTLExpiresAt: &ttl, - AutoStopOnParent: request.AutoStopOnParent, - SpawnBudget: contract.SpawnBudgetPayload{ - MaxChildren: 5, - MaxDepth: 1, - TTLSeconds: request.TTLSeconds, + t.Run("Should map bounded child request", func(t *testing.T) { + t.Parallel() + + var gotRequest AgentSpawnRequest + client := &stubClient{} + deps := 
newAgentCommandTestDeps(t, client) + client.agentSpawnFn = func( + _ context.Context, + request AgentSpawnRequest, + credentials agentidentity.Credentials, + ) (AgentSpawnRecord, error) { + assertAgentCredentials(t, credentials) + gotRequest = request + ttl := fixedTestNow.Add(2 * time.Minute) + return AgentSpawnRecord{ + Session: SessionRecord{ + ID: "sess-child", + Name: request.Name, + AgentName: request.AgentName, + Provider: request.Provider, + WorkspaceID: "ws-1", + WorkspacePath: "/workspace/project", + Channel: "builders", + Type: session.SessionTypeSpawned, + State: session.StateActive, + CreatedAt: fixedTestNow, + UpdatedAt: fixedTestNow, }, - PermissionPolicy: request.Permissions, - }, - Permissions: request.Permissions, - }, nil - } + Lineage: contract.SessionLineagePayload{ + ParentSessionID: "sess-agent", + RootSessionID: "sess-agent", + SpawnDepth: 1, + SpawnRole: request.SpawnRole, + TTLExpiresAt: &ttl, + AutoStopOnParent: request.AutoStopOnParent, + SpawnBudget: contract.SpawnBudgetPayload{ + MaxChildren: 5, + MaxDepth: 1, + TTLSeconds: request.TTLSeconds, + }, + PermissionPolicy: request.Permissions, + }, + Permissions: request.Permissions, + }, nil + } - stdout, _, err := executeRootCommand( - t, - deps, - "spawn", - "--agent", - "coder", - "--provider", - "codex", - "--model", - "gpt-test", - "--name", - "child", - "--prompt-overlay", - "focus", - "--role", - "worker", - "--ttl-seconds", - "120", - "--tool", - "read", - "--skill", - "go", - "--mcp-server", - "filesystem", - "--workspace-path", - "/workspace/project", - "--channel", - "builders", - "--environment-profile", - "default", - "--idempotency-key", - "spawn-1", - "-o", - "json", - ) - if err != nil { - t.Fatalf("agh spawn error = %v", err) - } - if gotRequest.AgentName != "coder" || - gotRequest.Provider != "codex" || - gotRequest.Model != "gpt-test" || - gotRequest.Name != "child" || - gotRequest.PromptOverlay != "focus" || - gotRequest.SpawnRole != "worker" || - gotRequest.TTLSeconds 
!= 120 || - !gotRequest.AutoStopOnParent || - gotRequest.IdempotencyKey != "spawn-1" { - t.Fatalf("spawn request = %#v, want parsed bounded spawn request", gotRequest) - } - if len(gotRequest.Permissions.Tools) != 1 || - gotRequest.Permissions.Tools[0] != "read" || - len(gotRequest.Permissions.Skills) != 1 || - gotRequest.Permissions.Skills[0] != "go" || - len(gotRequest.Permissions.MCPServers) != 1 || - gotRequest.Permissions.MCPServers[0] != "filesystem" || - len(gotRequest.Permissions.WorkspacePaths) != 1 || - gotRequest.Permissions.WorkspacePaths[0] != "/workspace/project" || - len(gotRequest.Permissions.NetworkChannels) != 1 || - gotRequest.Permissions.NetworkChannels[0] != "builders" || - len(gotRequest.Permissions.EnvironmentProfiles) != 1 || - gotRequest.Permissions.EnvironmentProfiles[0] != "default" { - t.Fatalf("spawn permissions = %#v, want all repeatable atom flags", gotRequest.Permissions) - } + stdout, _, err := executeRootCommand( + t, + deps, + "spawn", + "--agent", + "coder", + "--provider", + "codex", + "--model", + "gpt-test", + "--name", + "child", + "--prompt-overlay", + "focus", + "--role", + "worker", + "--ttl-seconds", + "120", + "--tool", + "read", + "--skill", + "go", + "--mcp-server", + "filesystem", + "--workspace-path", + "/workspace/project", + "--channel", + "builders", + "--environment-profile", + "default", + "--idempotency-key", + "spawn-1", + "-o", + "json", + ) + if err != nil { + t.Fatalf("agh spawn error = %v", err) + } + if gotRequest.AgentName != "coder" || + gotRequest.Provider != "codex" || + gotRequest.Model != "gpt-test" || + gotRequest.Name != "child" || + gotRequest.PromptOverlay != "focus" || + gotRequest.SpawnRole != "worker" || + gotRequest.TTLSeconds != 120 || + !gotRequest.AutoStopOnParent || + gotRequest.IdempotencyKey != "spawn-1" { + t.Fatalf("spawn request = %#v, want parsed bounded spawn request", gotRequest) + } + if len(gotRequest.Permissions.Tools) != 1 || + gotRequest.Permissions.Tools[0] != "read" || + 
len(gotRequest.Permissions.Skills) != 1 || + gotRequest.Permissions.Skills[0] != "go" || + len(gotRequest.Permissions.MCPServers) != 1 || + gotRequest.Permissions.MCPServers[0] != "filesystem" || + len(gotRequest.Permissions.WorkspacePaths) != 1 || + gotRequest.Permissions.WorkspacePaths[0] != "/workspace/project" || + len(gotRequest.Permissions.NetworkChannels) != 1 || + gotRequest.Permissions.NetworkChannels[0] != "builders" || + len(gotRequest.Permissions.EnvironmentProfiles) != 1 || + gotRequest.Permissions.EnvironmentProfiles[0] != "default" { + t.Fatalf("spawn permissions = %#v, want all repeatable atom flags", gotRequest.Permissions) + } - var output AgentSpawnRecord - if err := json.Unmarshal([]byte(stdout), &output); err != nil { - t.Fatalf("json.Unmarshal(spawn output) error = %v", err) - } - if output.Session.ID != "sess-child" || output.Lineage.ParentSessionID != "sess-agent" { - t.Fatalf("spawn output = %#v, want child session with parent lineage", output) - } + var output AgentSpawnRecord + if err := json.Unmarshal([]byte(stdout), &output); err != nil { + t.Fatalf("json.Unmarshal(spawn output) error = %v", err) + } + if output.Session.ID != "sess-child" || output.Lineage.ParentSessionID != "sess-agent" { + t.Fatalf("spawn output = %#v, want child session with parent lineage", output) + } + }) } func TestChannelSendRejectsMissingInputsAndInvalidIdentity(t *testing.T) { @@ -235,17 +247,17 @@ func TestChannelSendRejectsMissingInputsAndInvalidIdentity(t *testing.T) { args []string }{ { - name: "missing channel", + name: "Should reject missing channel", deps: newAgentCommandTestDeps, args: []string{"ch", "send", "--body", `{"text":"ok"}`, "--task-id", "task-1", "--run-id", "run-1"}, }, { - name: "missing body", + name: "Should reject missing body", deps: newAgentCommandTestDeps, args: []string{"ch", "send", "builders", "--task-id", "task-1", "--run-id", "run-1"}, }, { - name: "invalid caller identity", + name: "Should reject invalid caller identity", deps: 
newMissingAgentIdentityDeps, args: []string{ "ch", "send", "builders", @@ -286,21 +298,21 @@ func TestAgentCommandsRejectMissingIdentityBeforeAgentCalls(t *testing.T) { name string args []string }{ - {name: "me", args: []string{"me", "-o", "json"}}, - {name: "me context", args: []string{"me", "context", "-o", "json"}}, - {name: "ch list", args: []string{"ch", "list", "-o", "json"}}, - {name: "ch recv", args: []string{"ch", "recv", "builders", "-o", "json"}}, - {name: "task next", args: []string{"task", "next", "-o", "json"}}, + {name: "Should reject me without identity", args: []string{"me", "-o", "json"}}, + {name: "Should reject me context without identity", args: []string{"me", "context", "-o", "json"}}, + {name: "Should reject ch list without identity", args: []string{"ch", "list", "-o", "json"}}, + {name: "Should reject ch recv without identity", args: []string{"ch", "recv", "builders", "-o", "json"}}, + {name: "Should reject task next without identity", args: []string{"task", "next", "-o", "json"}}, { - name: "task heartbeat", + name: "Should reject task heartbeat without identity", args: []string{"task", "heartbeat", "run-1", "--claim-token", "agh_claim_token", "-o", "json"}, }, { - name: "task complete", + name: "Should reject task complete without identity", args: []string{"task", "complete", "run-1", "--claim-token", "agh_claim_token", "-o", "json"}, }, { - name: "task fail", + name: "Should reject task fail without identity", args: []string{ "task", "fail", @@ -314,7 +326,7 @@ func TestAgentCommandsRejectMissingIdentityBeforeAgentCalls(t *testing.T) { }, }, { - name: "task release", + name: "Should reject task release without identity", args: []string{"task", "release", "run-1", "--claim-token", "agh_claim_token", "-o", "json"}, }, } @@ -334,389 +346,433 @@ func TestAgentCommandsRejectMissingIdentityBeforeAgentCalls(t *testing.T) { func TestChannelListCommandJSONReturnsVisibleChannels(t *testing.T) { t.Parallel() - client := &stubClient{} - deps := 
newAgentCommandTestDeps(t, client) - client.agentChannelsFn = func(_ context.Context, credentials agentidentity.Credentials) ([]AgentChannelRecord, error) { - assertAgentCredentials(t, credentials) - return []AgentChannelRecord{{ - ID: "builders", - Channel: "builders", - DisplayName: "builders", - Purpose: "task_coordination", - WorkspaceID: "ws-1", - AllowedMessageKinds: contract.CoordinationMessageKinds(), - }}, nil - } + t.Run("Should return visible channels as JSON", func(t *testing.T) { + t.Parallel() + + client := &stubClient{} + deps := newAgentCommandTestDeps(t, client) + client.agentChannelsFn = func(_ context.Context, credentials agentidentity.Credentials) ([]AgentChannelRecord, error) { + assertAgentCredentials(t, credentials) + return []AgentChannelRecord{{ + ID: "builders", + Channel: "builders", + DisplayName: "builders", + Purpose: "task_coordination", + WorkspaceID: "ws-1", + AllowedMessageKinds: contract.CoordinationMessageKinds(), + }}, nil + } - stdout, _, err := executeRootCommand(t, deps, "ch", "list", "-o", "json") - if err != nil { - t.Fatalf("agh ch list error = %v", err) - } + stdout, _, err := executeRootCommand(t, deps, "ch", "list", "-o", "json") + if err != nil { + t.Fatalf("agh ch list error = %v", err) + } - var channels []AgentChannelRecord - if err := json.Unmarshal([]byte(stdout), &channels); err != nil { - t.Fatalf("json.Unmarshal(channels) error = %v", err) - } - if len(channels) != 1 || - channels[0].ID != "builders" || - len(channels[0].AllowedMessageKinds) != len(contract.CoordinationMessageKinds()) { - t.Fatalf("channels = %#v, want builders with MVP message kinds", channels) - } + var channels []AgentChannelRecord + if err := json.Unmarshal([]byte(stdout), &channels); err != nil { + t.Fatalf("json.Unmarshal(channels) error = %v", err) + } + if len(channels) != 1 || + channels[0].ID != "builders" || + len(channels[0].AllowedMessageKinds) != len(contract.CoordinationMessageKinds()) { + t.Fatalf("channels = %#v, want builders 
with MVP message kinds", channels) + } + }) } func TestChannelSendPreservesCoordinationMetadataAndRejectsClaimToken(t *testing.T) { t.Parallel() - for _, kind := range []contract.CoordinationMessageKind{ - contract.CoordinationMessageStatus, - contract.CoordinationMessageBlocker, - contract.CoordinationMessageResult, - } { - t.Run(string(kind), func(t *testing.T) { - t.Parallel() - - client := &stubClient{} - deps := newAgentCommandTestDeps(t, client) - client.agentChannelSendFn = func( - _ context.Context, - channel string, - request AgentChannelSendRequest, - credentials agentidentity.Credentials, - ) (AgentChannelMessageRecord, error) { - assertAgentCredentials(t, credentials) - if channel != "builders" { - t.Fatalf("channel = %q, want builders", channel) - } - if request.Metadata.TaskID != "task-1" || - request.Metadata.RunID != "run-1" || - request.Metadata.WorkflowID != "wf-1" || - request.Metadata.CoordinationChannelID != "builders" || - request.Metadata.CorrelationID != "corr-1" || - request.Metadata.MessageKind != kind { - t.Fatalf("metadata = %#v, want task/run/%s correlation", request.Metadata, kind) - } - if string(request.Metadata.Ext["note"]) != `"safe"` { - t.Fatalf("metadata.Ext = %#v, want note", request.Metadata.Ext) - } - if request.IdempotencyKey != "idem-1" { - t.Fatalf("idempotency key = %q, want idem-1", request.IdempotencyKey) + t.Run("Should preserve coordination metadata and reject raw claim tokens", func(t *testing.T) { + t.Parallel() + + for _, kind := range []contract.CoordinationMessageKind{ + contract.CoordinationMessageStatus, + contract.CoordinationMessageBlocker, + contract.CoordinationMessageResult, + } { + t.Run("Should preserve "+string(kind)+" coordination metadata", func(t *testing.T) { + t.Parallel() + + client := &stubClient{} + deps := newAgentCommandTestDeps(t, client) + client.agentChannelSendFn = func( + _ context.Context, + channel string, + request AgentChannelSendRequest, + credentials agentidentity.Credentials, + ) 
(AgentChannelMessageRecord, error) { + assertAgentCredentials(t, credentials) + if channel != "builders" { + t.Fatalf("channel = %q, want builders", channel) + } + if request.Metadata.TaskID != "task-1" || + request.Metadata.RunID != "run-1" || + request.Metadata.WorkflowID != "wf-1" || + request.Metadata.CoordinationChannelID != "builders" || + request.Metadata.CorrelationID != "corr-1" || + request.Metadata.MessageKind != kind { + t.Fatalf("metadata = %#v, want task/run/%s correlation", request.Metadata, kind) + } + if string(request.Metadata.Ext["note"]) != `"safe"` { + t.Fatalf("metadata.Ext = %#v, want note", request.Metadata.Ext) + } + if request.IdempotencyKey != "idem-1" { + t.Fatalf("idempotency key = %q, want idem-1", request.IdempotencyKey) + } + return AgentChannelMessageRecord{ + MessageID: "msg-1", + ChannelID: "builders", + Body: request.Body, + Metadata: request.Metadata, + Timestamp: fixedTestNow, + }, nil } - return AgentChannelMessageRecord{ - MessageID: "msg-1", - ChannelID: "builders", - Body: request.Body, - Metadata: request.Metadata, - Timestamp: fixedTestNow, - }, nil - } - _, _, err := executeRootCommand( - t, - deps, - "ch", "send", "builders", - "--body", `{"text":"ok"}`, - "--task-id", "task-1", - "--run-id", "run-1", - "--workflow-id", "wf-1", - "--kind", string(kind), - "--correlation-id", "corr-1", - "--metadata-ext", `{"note":"safe"}`, - "--idempotency-key", "idem-1", - "-o", "json", - ) - if err != nil { - t.Fatalf("agh ch send error = %v", err) - } - }) - } + _, _, err := executeRootCommand( + t, + deps, + "ch", "send", "builders", + "--body", `{"text":"ok"}`, + "--task-id", "task-1", + "--run-id", "run-1", + "--workflow-id", "wf-1", + "--kind", string(kind), + "--correlation-id", "corr-1", + "--metadata-ext", `{"note":"safe"}`, + "--idempotency-key", "idem-1", + "-o", "json", + ) + if err != nil { + t.Fatalf("agh ch send error = %v", err) + } + }) + } - for _, tt := range []struct { - name string - args []string - }{ - { - name: 
"body", - args: []string{ - "ch", "send", "builders", - "--body", `{"claim_token":"secret"}`, - "--task-id", "task-1", - "--run-id", "run-1", - }, - }, - { - name: "metadata ext", - args: []string{ - "ch", "send", "builders", - "--body", `{"text":"ok"}`, - "--task-id", "task-1", - "--run-id", "run-1", - "--metadata-ext", `{"claim_token":"secret"}`, + for _, tt := range []struct { + name string + args []string + }{ + { + name: "Should reject raw claim token in body", + args: []string{ + "ch", "send", "builders", + "--body", `{"claim_token":"secret"}`, + "--task-id", "task-1", + "--run-id", "run-1", + }, }, - }, - } { - t.Run("reject raw claim token in "+tt.name, func(t *testing.T) { - t.Parallel() - - client := &stubClient{ - agentChannelSendFn: func( - context.Context, - string, - AgentChannelSendRequest, - agentidentity.Credentials, - ) (AgentChannelMessageRecord, error) { - t.Fatal("AgentChannelSend should not be called when claim_token is present") - return AgentChannelMessageRecord{}, errors.New("unexpected") + { + name: "Should reject raw claim token in metadata ext", + args: []string{ + "ch", "send", "builders", + "--body", `{"text":"ok"}`, + "--task-id", "task-1", + "--run-id", "run-1", + "--metadata-ext", `{"claim_token":"secret"}`, }, - } - _, _, err := executeRootCommand(t, newAgentCommandTestDeps(t, client), tt.args...) - if !errors.Is(err, contract.ErrRawClaimTokenMetadata) { - t.Fatalf("agh ch send error = %v, want ErrRawClaimTokenMetadata", err) - } - }) - } + }, + } { + t.Run(tt.name, func(t *testing.T) { + t.Parallel() + + client := &stubClient{ + agentChannelSendFn: func( + context.Context, + string, + AgentChannelSendRequest, + agentidentity.Credentials, + ) (AgentChannelMessageRecord, error) { + t.Fatal("AgentChannelSend should not be called when claim_token is present") + return AgentChannelMessageRecord{}, errors.New("unexpected") + }, + } + _, _, err := executeRootCommand(t, newAgentCommandTestDeps(t, client), tt.args...) 
+ if !errors.Is(err, contract.ErrRawClaimTokenMetadata) { + t.Fatalf("agh ch send error = %v, want ErrRawClaimTokenMetadata", err) + } + }) + } + }) } func TestChannelReplySendsOnlyMessageIDAndBodyWhenMetadataIsResolvedServerSide(t *testing.T) { t.Parallel() - client := &stubClient{} - deps := newAgentCommandTestDeps(t, client) - client.agentChannelReplyFn = func( - _ context.Context, - request AgentChannelReplyRequest, - credentials agentidentity.Credentials, - ) (AgentChannelMessageRecord, error) { - assertAgentCredentials(t, credentials) - if request.ReplyToMessageID != "msg-source" { - t.Fatalf("reply_to_message_id = %q, want msg-source", request.ReplyToMessageID) - } - if string(request.Body) != `{"text":"ack"}` { - t.Fatalf("body = %s, want ack JSON", request.Body) - } - if !zeroCLICoordinationMetadata(request.Metadata) { - t.Fatalf("metadata = %#v, want zero metadata for server-side source resolution", request.Metadata) - } - return AgentChannelMessageRecord{ - MessageID: "msg-reply", - ChannelID: "builders", - Body: request.Body, - Metadata: contract.CoordinationMessageMetadataPayload{ - TaskID: "task-1", - RunID: "run-1", - CoordinationChannelID: "builders", - MessageKind: contract.CoordinationMessageReply, - CorrelationID: "run-1", - }, - Timestamp: fixedTestNow, - }, nil - } + t.Run("Should send only message ID and body when metadata is server-resolved", func(t *testing.T) { + t.Parallel() + + client := &stubClient{} + deps := newAgentCommandTestDeps(t, client) + client.agentChannelReplyFn = func( + _ context.Context, + request AgentChannelReplyRequest, + credentials agentidentity.Credentials, + ) (AgentChannelMessageRecord, error) { + assertAgentCredentials(t, credentials) + if request.ReplyToMessageID != "msg-source" { + t.Fatalf("reply_to_message_id = %q, want msg-source", request.ReplyToMessageID) + } + if string(request.Body) != `{"text":"ack"}` { + t.Fatalf("body = %s, want ack JSON", request.Body) + } + if 
!zeroCLICoordinationMetadata(request.Metadata) { + t.Fatalf("metadata = %#v, want zero metadata for server-side source resolution", request.Metadata) + } + return AgentChannelMessageRecord{ + MessageID: "msg-reply", + ChannelID: "builders", + Body: request.Body, + Metadata: contract.CoordinationMessageMetadataPayload{ + TaskID: "task-1", + RunID: "run-1", + CoordinationChannelID: "builders", + MessageKind: contract.CoordinationMessageReply, + CorrelationID: "run-1", + }, + Timestamp: fixedTestNow, + }, nil + } - if _, _, err := executeRootCommand( - t, - deps, - "ch", "reply", - "--to-message", "msg-source", - "--body", `{"text":"ack"}`, - "-o", "json", - ); err != nil { - t.Fatalf("agh ch reply error = %v", err) - } + if _, _, err := executeRootCommand( + t, + deps, + "ch", "reply", + "--to-message", "msg-source", + "--body", `{"text":"ack"}`, + "-o", "json", + ); err != nil { + t.Fatalf("agh ch reply error = %v", err) + } - _, _, err := executeRootCommand( - t, - deps, - "ch", "reply", - "--to-message", "msg-source", - "--body", `{"text":"ack"}`, - "--kind", "status", - ) - if err == nil || !strings.Contains(err.Error(), "--kind must be reply") { - t.Fatalf("agh ch reply --kind status error = %v, want reply-kind validation", err) - } + _, _, err := executeRootCommand( + t, + deps, + "ch", "reply", + "--to-message", "msg-source", + "--body", `{"text":"ack"}`, + "--kind", "status", + ) + if err == nil || !strings.Contains(err.Error(), "--kind must be reply") { + t.Fatalf("agh ch reply --kind status error = %v, want reply-kind validation", err) + } - _, _, err = executeRootCommand( - t, - deps, - "ch", "reply", - "--to-message", "msg-source", - "--body", `{"text":"ack"}`, - "--task-id", "task-1", - "--run-id", "run-1", - "--coordination-channel-id", "builders", - "--kind", "status", - ) - if err == nil || !strings.Contains(err.Error(), "--kind must be reply") { - t.Fatalf("agh ch reply --kind status error = %v, want reply-kind validation", err) - } + _, _, err = 
executeRootCommand( + t, + deps, + "ch", "reply", + "--to-message", "msg-source", + "--body", `{"text":"ack"}`, + "--task-id", "task-1", + "--run-id", "run-1", + "--coordination-channel-id", "builders", + "--kind", "status", + ) + if err == nil || !strings.Contains(err.Error(), "--kind must be reply") { + t.Fatalf("agh ch reply --kind status error = %v, want reply-kind validation", err) + } + }) } func TestChannelRecvJSONLOutputEmitsOneObjectPerMessage(t *testing.T) { t.Parallel() - client := &stubClient{} - deps := newAgentCommandTestDeps(t, client) - client.agentChannelRecvFn = func( - _ context.Context, - channel string, - query AgentChannelRecvQuery, - credentials agentidentity.Credentials, - ) ([]AgentChannelMessageRecord, error) { - assertAgentCredentials(t, credentials) - if channel != "builders" || !query.Wait || query.Limit != 2 { - t.Fatalf("recv channel/query = %q/%#v, want builders wait limit=2", channel, query) - } - return []AgentChannelMessageRecord{ - agentChannelTestMessage("msg-1", contract.CoordinationMessageStatus), - agentChannelTestMessage("msg-2", contract.CoordinationMessageResult), - }, nil - } + t.Run("Should emit one JSONL object per message", func(t *testing.T) { + t.Parallel() + + client := &stubClient{} + deps := newAgentCommandTestDeps(t, client) + client.agentChannelRecvFn = func( + _ context.Context, + channel string, + query AgentChannelRecvQuery, + credentials agentidentity.Credentials, + ) ([]AgentChannelMessageRecord, error) { + assertAgentCredentials(t, credentials) + if channel != "builders" || !query.Wait || query.Limit != 2 { + t.Fatalf("recv channel/query = %q/%#v, want builders wait limit=2", channel, query) + } + return []AgentChannelMessageRecord{ + agentChannelTestMessage("msg-1", contract.CoordinationMessageStatus), + agentChannelTestMessage("msg-2", contract.CoordinationMessageResult), + }, nil + } - stdout, _, err := executeRootCommand( - t, - deps, - "ch", "recv", "builders", - "--wait", - "--limit", "2", - "-o", 
"jsonl", - ) - if err != nil { - t.Fatalf("agh ch recv error = %v", err) - } + stdout, _, err := executeRootCommand( + t, + deps, + "ch", "recv", "builders", + "--wait", + "--limit", "2", + "-o", "jsonl", + ) + if err != nil { + t.Fatalf("agh ch recv error = %v", err) + } - lines := strings.Split(strings.TrimSpace(stdout), "\n") - if len(lines) != 2 { - t.Fatalf("jsonl line count = %d, want 2; output=%q", len(lines), stdout) - } - for index, line := range lines { - var message AgentChannelMessageRecord - if err := json.Unmarshal([]byte(line), &message); err != nil { - t.Fatalf("json.Unmarshal(line %d) error = %v", index, err) + lines := strings.Split(strings.TrimSpace(stdout), "\n") + if len(lines) != 2 { + t.Fatalf("jsonl line count = %d, want 2; output=%q", len(lines), stdout) } - if message.MessageID == "" || message.Metadata.MessageKind == "" { - t.Fatalf("message line %d = %#v, want populated message", index, message) + for index, line := range lines { + var message AgentChannelMessageRecord + if err := json.Unmarshal([]byte(line), &message); err != nil { + t.Fatalf("json.Unmarshal(line %d) error = %v", index, err) + } + if message.MessageID == "" || message.Metadata.MessageKind == "" { + t.Fatalf("message line %d = %#v, want populated message", index, message) + } } - } + }) } func TestAgentCommandsRenderHumanAndToonOutputs(t *testing.T) { t.Parallel() - client := &stubClient{} - deps := newAgentCommandTestDeps(t, client) - meRecord := AgentMeRecord{ - Self: contract.AgentIdentityPayload{ - SessionID: "sess-agent", - AgentName: "coder", - Provider: "test-provider", - Model: "test-model", - }, - Workspace: contract.AgentWorkspacePayload{ID: "ws-1", RootDir: "/workspace/project"}, - Session: contract.AgentSessionPayload{ - ID: "sess-agent", - State: session.StateActive, - CreatedAt: fixedTestNow, - UpdatedAt: fixedTestNow, - }, - } - contextRecord := AgentContextRecord{ - Self: meRecord.Self, - Workspace: meRecord.Workspace, - Session: meRecord.Session, - 
Provenance: contract.AgentContextProvenancePayload{ - GeneratedAt: fixedTestNow, - Source: "test", - }, - } - channelRecord := AgentChannelRecord{ - ID: "builders", - Channel: "builders", - DisplayName: "builders", - Purpose: "task_coordination", - WorkspaceID: "ws-1", - AllowedMessageKinds: contract.CoordinationMessageKinds(), - } - statusMessage := agentChannelTestMessage("msg-1", contract.CoordinationMessageStatus) - replyMessage := agentChannelTestMessage("msg-reply", contract.CoordinationMessageReply) + t.Run("Should render human and toon outputs", func(t *testing.T) { + t.Parallel() - client.agentMeFn = func(context.Context, agentidentity.Credentials) (AgentMeRecord, error) { - return meRecord, nil - } - client.agentContextFn = func(context.Context, agentidentity.Credentials) (AgentContextRecord, error) { - return contextRecord, nil - } - client.agentChannelsFn = func(context.Context, agentidentity.Credentials) ([]AgentChannelRecord, error) { - return []AgentChannelRecord{channelRecord}, nil - } - client.agentChannelRecvFn = func( - context.Context, - string, - AgentChannelRecvQuery, - agentidentity.Credentials, - ) ([]AgentChannelMessageRecord, error) { - return []AgentChannelMessageRecord{statusMessage}, nil - } - client.agentChannelSendFn = func( - context.Context, - string, - AgentChannelSendRequest, - agentidentity.Credentials, - ) (AgentChannelMessageRecord, error) { - return statusMessage, nil - } - client.agentChannelReplyFn = func( - context.Context, - AgentChannelReplyRequest, - agentidentity.Credentials, - ) (AgentChannelMessageRecord, error) { - return replyMessage, nil - } + client := &stubClient{} + deps := newAgentCommandTestDeps(t, client) + meRecord := AgentMeRecord{ + Self: contract.AgentIdentityPayload{ + SessionID: "sess-agent", + AgentName: "coder", + Provider: "test-provider", + Model: "test-model", + }, + Workspace: contract.AgentWorkspacePayload{ID: "ws-1", RootDir: "/workspace/project"}, + Session: contract.AgentSessionPayload{ + ID: 
"sess-agent", + State: session.StateActive, + CreatedAt: fixedTestNow, + UpdatedAt: fixedTestNow, + }, + } + contextRecord := AgentContextRecord{ + Self: meRecord.Self, + Workspace: meRecord.Workspace, + Session: meRecord.Session, + Provenance: contract.AgentContextProvenancePayload{ + GeneratedAt: fixedTestNow, + Source: "test", + }, + } + channelRecord := AgentChannelRecord{ + ID: "builders", + Channel: "builders", + DisplayName: "builders", + Purpose: "task_coordination", + WorkspaceID: "ws-1", + AllowedMessageKinds: contract.CoordinationMessageKinds(), + } + statusMessage := agentChannelTestMessage("msg-1", contract.CoordinationMessageStatus) + replyMessage := agentChannelTestMessage("msg-reply", contract.CoordinationMessageReply) - tests := []struct { - name string - args []string - want string - }{ - {name: "me human", args: []string{"me", "-o", "human"}, want: "Agent"}, - {name: "me toon", args: []string{"me", "-o", "toon"}, want: "agent_me{session_id"}, - {name: "context human", args: []string{"me", "context", "-o", "human"}, want: `"source": "test"`}, - {name: "context toon", args: []string{"me", "context", "-o", "toon"}, want: `"source": "test"`}, - {name: "channels human", args: []string{"ch", "list", "-o", "human"}, want: "Agent Channels"}, - {name: "channels toon", args: []string{"ch", "list", "-o", "toon"}, want: "agent_channels[1]"}, - {name: "recv human", args: []string{"ch", "recv", "builders", "-o", "human"}, want: "Agent Channel Messages"}, - {name: "recv toon", args: []string{"ch", "recv", "builders", "-o", "toon"}, want: "agent_channel_messages[1]"}, - { - name: "send human", - args: []string{ - "ch", "send", "builders", - "--body", `{"text":"ok"}`, - "--task-id", "task-1", - "--run-id", "run-1", - "-o", "human", + client.agentMeFn = func(context.Context, agentidentity.Credentials) (AgentMeRecord, error) { + return meRecord, nil + } + client.agentContextFn = func(context.Context, agentidentity.Credentials) (AgentContextRecord, error) { + return 
contextRecord, nil + } + client.agentChannelsFn = func(context.Context, agentidentity.Credentials) ([]AgentChannelRecord, error) { + return []AgentChannelRecord{channelRecord}, nil + } + client.agentChannelRecvFn = func( + context.Context, + string, + AgentChannelRecvQuery, + agentidentity.Credentials, + ) ([]AgentChannelMessageRecord, error) { + return []AgentChannelMessageRecord{statusMessage}, nil + } + client.agentChannelSendFn = func( + context.Context, + string, + AgentChannelSendRequest, + agentidentity.Credentials, + ) (AgentChannelMessageRecord, error) { + return statusMessage, nil + } + client.agentChannelReplyFn = func( + context.Context, + AgentChannelReplyRequest, + agentidentity.Credentials, + ) (AgentChannelMessageRecord, error) { + return replyMessage, nil + } + + tests := []struct { + name string + args []string + want string + }{ + {name: "Should render me human output", args: []string{"me", "-o", "human"}, want: "Agent"}, + {name: "Should render me toon output", args: []string{"me", "-o", "toon"}, want: "agent_me{session_id"}, + { + name: "Should render context human output", + args: []string{"me", "context", "-o", "human"}, + want: `"source": "test"`, }, - want: "Agent Channel Message", - }, - { - name: "reply toon", - args: []string{ - "ch", "reply", - "--to-message", "msg-1", - "--body", `{"text":"ack"}`, - "-o", "toon", + { + name: "Should render context toon output", + args: []string{"me", "context", "-o", "toon"}, + want: `"source": "test"`, }, - want: "agent_channel_message{message_id", - }, - } - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - stdout, _, err := executeRootCommand(t, deps, tt.args...) 
- if err != nil { - t.Fatalf("executeRootCommand(%v) error = %v", tt.args, err) - } - if !strings.Contains(stdout, tt.want) { - t.Fatalf("output = %q, want substring %q", stdout, tt.want) - } - }) - } + { + name: "Should render channels human output", + args: []string{"ch", "list", "-o", "human"}, + want: "Agent Channels", + }, + { + name: "Should render channels toon output", + args: []string{"ch", "list", "-o", "toon"}, + want: "agent_channels[1]", + }, + { + name: "Should render recv human output", + args: []string{"ch", "recv", "builders", "-o", "human"}, + want: "Agent Channel Messages", + }, + { + name: "Should render recv toon output", + args: []string{"ch", "recv", "builders", "-o", "toon"}, + want: "agent_channel_messages[1]", + }, + { + name: "Should render send human output", + args: []string{ + "ch", "send", "builders", + "--body", `{"text":"ok"}`, + "--task-id", "task-1", + "--run-id", "run-1", + "-o", "human", + }, + want: "Agent Channel Message", + }, + { + name: "Should render reply toon output", + args: []string{ + "ch", "reply", + "--to-message", "msg-1", + "--body", `{"text":"ack"}`, + "-o", "toon", + }, + want: "agent_channel_message{message_id", + }, + } + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + stdout, _, err := executeRootCommand(t, deps, tt.args...) 
+ if err != nil { + t.Fatalf("executeRootCommand(%v) error = %v", tt.args, err) + } + if !strings.Contains(stdout, tt.want) { + t.Fatalf("output = %q, want substring %q", stdout, tt.want) + } + }) + } + }) } func newAgentCommandTestDeps(t *testing.T, client *stubClient) commandDeps { diff --git a/internal/cli/cli_integration_test.go b/internal/cli/cli_integration_test.go index c3854b1d1..5f7944eed 100644 --- a/internal/cli/cli_integration_test.go +++ b/internal/cli/cli_integration_test.go @@ -1665,7 +1665,6 @@ func TestCLIAgentTaskLeaseLifecycleIntegration(t *testing.T) { }, }, } { - tt := tt t.Run("Should reject stale "+tt.name+" after recovery", func(t *testing.T) { _, _, err := executeRootCommand(t, agentDeps, tt.args...) if err == nil { diff --git a/internal/cli/client_test.go b/internal/cli/client_test.go index b77a6d003..e1058cefb 100644 --- a/internal/cli/client_test.go +++ b/internal/cli/client_test.go @@ -29,41 +29,45 @@ func (f roundTripperFunc) RoundTrip(req *http.Request) (*http.Response, error) { func TestUnixSocketClientAgentMeSendsIdentityHeaders(t *testing.T) { t.Parallel() - client := &unixSocketClient{ - socketPath: "/tmp/agh.sock", - httpClient: &http.Client{ - Transport: roundTripperFunc(func(req *http.Request) (*http.Response, error) { - if req.Method != http.MethodGet || req.URL.Path != "/api/agent/me" { - t.Fatalf("request = %s %s, want GET /api/agent/me", req.Method, req.URL.Path) - } - if got := req.Header.Get(agentidentity.HeaderSessionID); got != "sess-1" { - t.Fatalf("%s = %q, want sess-1", agentidentity.HeaderSessionID, got) - } - if got := req.Header.Get(agentidentity.HeaderAgent); got != "coder" { - t.Fatalf("%s = %q, want coder", agentidentity.HeaderAgent, got) - } - if got := req.Header.Get(agentidentity.HeaderWorkspaceID); got != "ws-1" { - t.Fatalf("%s = %q, want ws-1", agentidentity.HeaderWorkspaceID, got) - } - return newHTTPResponse( - http.StatusOK, - 
`{"me":{"self":{"session_id":"sess-1","agent_name":"coder","provider":"test"},"workspace":{"id":"ws-1"},"session":{"id":"sess-1","agent_name":"coder","state":"active","created_at":"2026-04-03T12:00:00Z","updated_at":"2026-04-03T12:00:00Z"},"capabilities":[],"channels":[],"active_task_leases":[]}}`, - ), nil - }), - }, - } + t.Run("Should send identity headers", func(t *testing.T) { + t.Parallel() - me, err := client.AgentMe(context.Background(), agentidentity.Credentials{ - SessionID: "sess-1", - AgentName: "coder", - WorkspaceID: "ws-1", + client := &unixSocketClient{ + socketPath: "/tmp/agh.sock", + httpClient: &http.Client{ + Transport: roundTripperFunc(func(req *http.Request) (*http.Response, error) { + if req.Method != http.MethodGet || req.URL.Path != "/api/agent/me" { + t.Fatalf("request = %s %s, want GET /api/agent/me", req.Method, req.URL.Path) + } + if got := req.Header.Get(agentidentity.HeaderSessionID); got != "sess-1" { + t.Fatalf("%s = %q, want sess-1", agentidentity.HeaderSessionID, got) + } + if got := req.Header.Get(agentidentity.HeaderAgent); got != "coder" { + t.Fatalf("%s = %q, want coder", agentidentity.HeaderAgent, got) + } + if got := req.Header.Get(agentidentity.HeaderWorkspaceID); got != "ws-1" { + t.Fatalf("%s = %q, want ws-1", agentidentity.HeaderWorkspaceID, got) + } + return newHTTPResponse( + http.StatusOK, + `{"me":{"self":{"session_id":"sess-1","agent_name":"coder","provider":"test"},"workspace":{"id":"ws-1"},"session":{"id":"sess-1","agent_name":"coder","state":"active","created_at":"2026-04-03T12:00:00Z","updated_at":"2026-04-03T12:00:00Z"},"capabilities":[],"channels":[],"active_task_leases":[]}}`, + ), nil + }), + }, + } + + me, err := client.AgentMe(context.Background(), agentidentity.Credentials{ + SessionID: "sess-1", + AgentName: "coder", + WorkspaceID: "ws-1", + }) + if err != nil { + t.Fatalf("AgentMe() error = %v", err) + } + if me.Self.SessionID != "sess-1" || me.Self.AgentName != "coder" { + t.Fatalf("AgentMe() = %#v, 
want validated response", me.Self) + } }) - if err != nil { - t.Fatalf("AgentMe() error = %v", err) - } - if me.Self.SessionID != "sess-1" || me.Self.AgentName != "coder" { - t.Fatalf("AgentMe() = %#v, want validated response", me.Self) - } } func TestUnixSocketClientAgentChannelMethodsSendIdentityHeaders(t *testing.T) { @@ -401,30 +405,34 @@ func TestUnixSocketClientAgentTaskMethods(t *testing.T) { func TestUnixSocketClientAgentTaskErrorsRedactClaimTokens(t *testing.T) { t.Parallel() - rawToken := "agh_claim_CLIENTERRORTOKEN123" - client := &unixSocketClient{ - socketPath: "/tmp/agh.sock", - httpClient: &http.Client{ - Transport: roundTripperFunc(func(*http.Request) (*http.Response, error) { - return newHTTPResponse( - http.StatusConflict, - `{"error":"task: invalid claim token: `+rawToken+`"}`, - ), nil - }), - }, - } - _, err := client.AgentTaskRelease( - context.Background(), - "run-1", - AgentTaskReleaseRequest{ClaimToken: rawToken}, - agentidentity.Credentials{SessionID: "sess-1", AgentName: "coder"}, - ) - if err == nil { - t.Fatal("AgentTaskRelease() error = nil, want redacted API error") - } - if strings.Contains(err.Error(), rawToken) || !strings.Contains(err.Error(), "agh_claim_[REDACTED]") { - t.Fatalf("error = %q, want redacted claim token", err.Error()) - } + t.Run("Should redact claim tokens from task errors", func(t *testing.T) { + t.Parallel() + + rawToken := "agh_claim_CLIENTERRORTOKEN123" + client := &unixSocketClient{ + socketPath: "/tmp/agh.sock", + httpClient: &http.Client{ + Transport: roundTripperFunc(func(*http.Request) (*http.Response, error) { + return newHTTPResponse( + http.StatusConflict, + `{"error":"task: invalid claim token: `+rawToken+`"}`, + ), nil + }), + }, + } + _, err := client.AgentTaskRelease( + context.Background(), + "run-1", + AgentTaskReleaseRequest{ClaimToken: rawToken}, + agentidentity.Credentials{SessionID: "sess-1", AgentName: "coder"}, + ) + if err == nil { + t.Fatal("AgentTaskRelease() error = nil, want redacted 
API error") + } + if strings.Contains(err.Error(), rawToken) || !strings.Contains(err.Error(), "agh_claim_[REDACTED]") { + t.Fatalf("error = %q, want redacted claim token", err.Error()) + } + }) } func agentTaskLeaseHTTPResponse(status taskpkg.RunStatus) *http.Response { diff --git a/internal/daemon/daemon.go b/internal/daemon/daemon.go index 4c00b3c30..d9601aa16 100644 --- a/internal/daemon/daemon.go +++ b/internal/daemon/daemon.go @@ -910,6 +910,7 @@ func (d *Daemon) applyServerFactoryDefaults() { udsapi.WithWorkspaceResolver(deps.WorkspaceService), udsapi.WithAgentCatalog(deps.AgentCatalog), udsapi.WithAgentContext(deps.AgentContext), + udsapi.WithCoordinatorConfig(deps.CoordinatorConfig), udsapi.WithSkillsRegistry(deps.SkillsRegistry), udsapi.WithMemoryStore(deps.MemoryStore), udsapi.WithDreamTrigger(deps.DreamTrigger), diff --git a/internal/daemon/daemon_integration_test.go b/internal/daemon/daemon_integration_test.go index 75abd02c6..ea9a0bd32 100644 --- a/internal/daemon/daemon_integration_test.go +++ b/internal/daemon/daemon_integration_test.go @@ -2374,6 +2374,116 @@ body assertLifecycleHookPayload(t, agentOutput, hookspkg.HookSessionPostStop, resolvedWorkspace) } +func TestBootRunsWorkspaceTaskRunHookWithRelativeScriptPath(t *testing.T) { + t.Run("Should run workspace task-run hook with relative script path", func(t *testing.T) { + homePaths := integrationHomePaths(t) + cfg := testConfig(t, homePaths) + cfg.Memory.Enabled = false + cfg.Skills.Enabled = false + + workspaceRoot := filepath.Join(t.TempDir(), "workspace") + if err := os.MkdirAll(filepath.Join(workspaceRoot, aghconfig.DirName, "hooks"), 0o755); err != nil { + t.Fatalf( + "os.MkdirAll(%q) error = %v", + filepath.Join(workspaceRoot, aghconfig.DirName, "hooks"), + err, + ) + } + writeDaemonFile( + t, + filepath.Join(workspaceRoot, aghconfig.DirName, "hooks", "capture-task-run.sh"), + "#!/bin/sh\ncat > \"$1\"\n", + ) + if err := os.Chmod( + filepath.Join(workspaceRoot, aghconfig.DirName, "hooks", 
"capture-task-run.sh"), + 0o755, + ); err != nil { + t.Fatalf("os.Chmod(capture-task-run.sh) error = %v", err) + } + writeDaemonFile(t, filepath.Join(workspaceRoot, aghconfig.DirName, "config.toml"), ` +[[hooks.declarations]] +name = "workspace-task-run" +event = "task.run.enqueued" +mode = "sync" +command = "/bin/sh" +args = [".agh/hooks/capture-task-run.sh", ".agh/task-run-enqueued.json"] +`) + + resolvedWorkspace := seedDaemonWorkspace(t, homePaths, workspaceRoot) + + d, err := New( + WithHomePaths(homePaths), + WithConfig(&cfg), + WithLogger(discardLogger()), + ) + if err != nil { + t.Fatalf("New() error = %v", err) + } + d.newSessionManager = func(_ context.Context, deps SessionManagerDeps) (SessionManager, error) { + return &fakeSessionManager{}, nil + } + d.newObserver = func(context.Context, RuntimeDeps) (Observer, error) { + return &fakeObserver{}, nil + } + d.httpFactory = func(context.Context, RuntimeDeps) (Server, error) { + return &fakeServer{name: "http"}, nil + } + d.udsFactory = func(context.Context, RuntimeDeps) (Server, error) { + return &fakeServer{name: "uds"}, nil + } + + if err := d.boot(testutil.Context(t)); err != nil { + t.Fatalf("boot() error = %v", err) + } + t.Cleanup(func() { + if err := d.Shutdown(testutil.Context(t)); err != nil { + t.Fatalf("Shutdown() error = %v", err) + } + }) + if d.hooks == nil { + t.Fatal("boot() did not initialize daemon hooks") + } + + payload := hookspkg.TaskRunEnqueuedPayload{ + PayloadBase: hookspkg.PayloadBase{ + Event: hookspkg.HookTaskRunEnqueued, + Timestamp: time.Date(2026, 4, 26, 19, 30, 0, 0, time.UTC), + }, + TaskRunContext: hookspkg.TaskRunContext{ + TaskID: "task-1", + RunID: "run-1", + WorkspaceID: resolvedWorkspace.ID, + CoordinationChannelID: "operations", + NetworkChannel: "operations", + AgentName: "qa", + TaskStatus: "ready", + RunStatus: "queued", + }, + IdempotencyKey: "task.start.task-1", + } + + if _, err := d.hooks.DispatchTaskRunEnqueued(testutil.Context(t), payload); err != nil { + 
t.Fatalf("DispatchTaskRunEnqueued() error = %v", err) + } + + outputPath := filepath.Join(workspaceRoot, aghconfig.DirName, "task-run-enqueued.json") + body, err := os.ReadFile(outputPath) + if err != nil { + t.Fatalf("os.ReadFile(%q) error = %v", outputPath, err) + } + + var captured hookspkg.TaskRunEnqueuedPayload + if err := json.Unmarshal(body, &captured); err != nil { + t.Fatalf("json.Unmarshal(task run hook payload) error = %v; body=%s", err, string(body)) + } + if captured.Event != hookspkg.HookTaskRunEnqueued || + captured.WorkspaceID != resolvedWorkspace.ID || + captured.RunID != "run-1" { + t.Fatalf("captured payload = %#v, want enqueued payload for the seeded workspace run", captured) + } + }) +} + func TestBootSkillsWatcherRebuildsHooksBeforeNextDispatch(t *testing.T) { homePaths := integrationHomePaths(t) cfg := testConfig(t, homePaths) diff --git a/internal/daemon/harness_context.go b/internal/daemon/harness_context.go index f4e5a4677..cd740091d 100644 --- a/internal/daemon/harness_context.go +++ b/internal/daemon/harness_context.go @@ -32,6 +32,10 @@ const ( SessionClassDream SessionClass = "dream" // SessionClassSystem identifies daemon-owned system sessions. SessionClassSystem SessionClass = "system" + // SessionClassCoordinator identifies daemon-owned workspace coordinator sessions. + SessionClassCoordinator SessionClass = "coordinator" + // SessionClassSpawned identifies bounded child worker sessions. + SessionClassSpawned SessionClass = "spawned" ) // HarnessPromptSection identifies a startup prompt section managed by harness policy. 
@@ -311,6 +315,10 @@ func normalizeHarnessSessionType(sessionType session.Type) session.Type { return session.SessionTypeDream case session.SessionTypeSystem: return session.SessionTypeSystem + case session.SessionTypeCoordinator: + return session.SessionTypeCoordinator + case session.SessionTypeSpawned: + return session.SessionTypeSpawned default: return "" } @@ -324,6 +332,10 @@ func harnessSessionClassForType(sessionType session.Type) (SessionClass, error) return SessionClassDream, nil case session.SessionTypeSystem: return SessionClassSystem, nil + case session.SessionTypeCoordinator: + return SessionClassCoordinator, nil + case session.SessionTypeSpawned: + return SessionClassSpawned, nil default: return "", fmt.Errorf("daemon: unsupported harness session type %q", sessionType) } diff --git a/internal/daemon/harness_context_test.go b/internal/daemon/harness_context_test.go index 9fbd35cd7..a2265319f 100644 --- a/internal/daemon/harness_context_test.go +++ b/internal/daemon/harness_context_test.go @@ -94,6 +94,72 @@ func TestHarnessContextResolverMatrix(t *testing.T) { "harness.diagnostic_label": "interactive.channel.network", }, }, + { + name: "Should resolve coordinator policy for coordinator startup session", + input: HarnessResolutionInput{ + Surface: ResolutionSurfaceStartup, + Session: HarnessSessionInput{ + Type: session.SessionTypeCoordinator, + Channel: "coord-run-1", + }, + Turn: HarnessTurnRequest{ + Source: session.TurnSourceUser, + }, + }, + wantSections: []HarnessPromptSection{ + HarnessPromptSectionMemory, + HarnessPromptSectionSkills, + HarnessPromptSectionNetwork, + }, + wantAugmenters: nil, + wantReentry: ReentryModeNone, + wantDetached: DetachedRunModeNone, + wantLabel: "coordinator.channel.user", + wantTags: map[string]string{ + "harness.surface": "startup", + "harness.session_type": "coordinator", + "harness.session_class": "coordinator", + "harness.turn_origin": "user", + "harness.channel_bound": "true", + "harness.diagnostic_label": 
"coordinator.channel.user", + }, + }, + { + name: "Should resolve spawned policy for spawned worker network turn", + input: HarnessResolutionInput{ + Surface: ResolutionSurfaceTurn, + Session: HarnessSessionInput{ + Type: session.SessionTypeSpawned, + Channel: "builders", + }, + Turn: HarnessTurnRequest{ + Source: session.TurnSourceNetwork, + PromptMeta: acp.PromptMeta{ + Network: &acp.PromptNetworkMeta{ + Channel: "builders", + From: "coordinator.sess", + }, + }, + }, + }, + wantSections: []HarnessPromptSection{ + HarnessPromptSectionMemory, + HarnessPromptSectionSkills, + HarnessPromptSectionNetwork, + }, + wantAugmenters: nil, + wantReentry: ReentryModeNone, + wantDetached: DetachedRunModeNone, + wantLabel: "spawned.channel.network", + wantTags: map[string]string{ + "harness.surface": "turn", + "harness.session_type": "spawned", + "harness.session_class": "spawned", + "harness.turn_origin": "network", + "harness.channel_bound": "true", + "harness.diagnostic_label": "spawned.channel.network", + }, + }, { name: "system session plus synthetic turn requires metadata and resolves reentry policy", input: HarnessResolutionInput{ @@ -348,6 +414,53 @@ func TestSectionSelectorSelectsEligibleStartupSectionsWithoutDuplicates(t *testi } } +func TestSectionSelectorAcceptsCoordinatorStartupSession(t *testing.T) { + t.Parallel() + + t.Run("Should select coordinator startup sections", func(t *testing.T) { + t.Parallel() + + resolver := NewHarnessContextResolver(HarnessRuntimeSignals{ + MemoryPromptSectionEnabled: true, + SkillsPromptSectionEnabled: true, + }) + selector := NewSectionSelector(resolver, nil) + descriptors := defaultStartupPromptSectionDescriptors( + promptSectionProviderFunc( + func(context.Context, *workspacepkg.ResolvedWorkspace) (string, error) { return "memory", nil }, + ), + promptSectionProviderFunc( + func(context.Context, *workspacepkg.ResolvedWorkspace) (string, error) { return "skills", nil }, + ), + nil, + ) + + selected, resolved, err := 
selector.Select(session.StartupPromptContext{ + SessionType: session.SessionTypeCoordinator, + Channel: "coord-run-1", + }, descriptors) + if err != nil { + t.Fatalf("Select(coordinator) error = %v", err) + } + + if resolved.Session.SessionClass != SessionClassCoordinator { + t.Fatalf("SessionClass = %q, want %q", resolved.Session.SessionClass, SessionClassCoordinator) + } + wantNames := []string{ + string(HarnessPromptSectionMemory), + string(HarnessPromptSectionSkills), + string(HarnessPromptSectionNetwork), + } + gotNames := make([]string, 0, len(selected)) + for _, descriptor := range selected { + gotNames = append(gotNames, descriptor.Name) + } + if !slices.Equal(gotNames, wantNames) { + t.Fatalf("selected section names = %#v, want %#v", gotNames, wantNames) + } + }) +} + func TestHarnessContextResolverResolvePromptUsesSessionInfo(t *testing.T) { t.Parallel() diff --git a/internal/daemon/hooks_bridge.go b/internal/daemon/hooks_bridge.go index 516718c09..bec90f1c5 100644 --- a/internal/daemon/hooks_bridge.go +++ b/internal/daemon/hooks_bridge.go @@ -1424,8 +1424,15 @@ func scopeWorkspaceHookDecls( for _, decl := range decls { cloned := cloneDaemonHookDecl(decl) if resolved != nil { - cloned.Matcher.WorkspaceID = strings.TrimSpace(resolved.ID) - cloned.Matcher.WorkspaceRoot = strings.TrimSpace(resolved.RootDir) + if strings.TrimSpace(cloned.WorkingDir) == "" { + cloned.WorkingDir = strings.TrimSpace(resolved.RootDir) + } + if hookspkg.MatcherFieldAllowedForEvent(cloned.Event, "workspace_id") { + cloned.Matcher.WorkspaceID = strings.TrimSpace(resolved.ID) + } + if hookspkg.MatcherFieldAllowedForEvent(cloned.Event, "workspace_root") { + cloned.Matcher.WorkspaceRoot = strings.TrimSpace(resolved.RootDir) + } } scoped = append(scoped, cloned) } diff --git a/internal/daemon/notifier_test.go b/internal/daemon/notifier_test.go index 6c7fbdbea..b6d4da4ec 100644 --- a/internal/daemon/notifier_test.go +++ b/internal/daemon/notifier_test.go @@ -404,26 +404,159 @@ func 
TestHooksBridgeHelperCloningAndTimestamp(t *testing.T) { t.Fatal("original matcher ToolReadOnly was mutated") } - resolved := workspaceResolvedForTest("ws-1", "/tmp/ws-1") - scoped := scopeWorkspaceHookDecls([]hookspkg.HookDecl{original}, &resolved) - if len(scoped) != 1 { - t.Fatalf("len(scoped) = %d, want 1", len(scoped)) - } - if scoped[0].Matcher.WorkspaceID != resolved.ID { - t.Fatalf("scoped WorkspaceID = %q, want %q", scoped[0].Matcher.WorkspaceID, resolved.ID) + if got := cloneStringMap(nil); got != nil { + t.Fatalf("cloneStringMap(nil) = %#v, want nil", got) } - if scoped[0].Matcher.WorkspaceRoot != resolved.RootDir { - t.Fatalf("scoped WorkspaceRoot = %q, want %q", scoped[0].Matcher.WorkspaceRoot, resolved.RootDir) +} + +func TestScopeWorkspaceHookDeclsOnlyInjectsSupportedMatcherFields(t *testing.T) { + t.Parallel() + + newDecls := func() []hookspkg.HookDecl { + return []hookspkg.HookDecl{ + { + Name: "session", + Event: hookspkg.HookSessionPostCreate, + }, + { + Name: "task-run", + Event: hookspkg.HookTaskRunEnqueued, + }, + { + Name: "message", + Event: hookspkg.HookMessageDelta, + }, + } } - if original.Matcher.WorkspaceID != "" || original.Matcher.WorkspaceRoot != "" { - t.Fatalf("original matcher workspace fields were mutated: %#v", original.Matcher) + resolved := workspaceResolvedForTest("ws-1", "/tmp/ws-1") + + testCases := []struct { + name string + assert func(t *testing.T, decls []hookspkg.HookDecl, scoped []hookspkg.HookDecl) + }{ + { + name: "Should inject workspace id and root for session hooks", + assert: func(t *testing.T, _ []hookspkg.HookDecl, scoped []hookspkg.HookDecl) { + t.Helper() + + if scoped[0].Matcher.WorkspaceID != resolved.ID { + t.Fatalf("session WorkspaceID = %q, want %q", scoped[0].Matcher.WorkspaceID, resolved.ID) + } + if scoped[0].Matcher.WorkspaceRoot != resolved.RootDir { + t.Fatalf("session WorkspaceRoot = %q, want %q", scoped[0].Matcher.WorkspaceRoot, resolved.RootDir) + } + if err := 
hookspkg.ValidateMatcherForEvent(scoped[0].Event, scoped[0].Matcher); err != nil { + t.Fatalf("session matcher validation error = %v", err) + } + }, + }, + { + name: "Should inject only workspace id for task-run hooks", + assert: func(t *testing.T, _ []hookspkg.HookDecl, scoped []hookspkg.HookDecl) { + t.Helper() + + if scoped[1].Matcher.WorkspaceID != resolved.ID { + t.Fatalf("task-run WorkspaceID = %q, want %q", scoped[1].Matcher.WorkspaceID, resolved.ID) + } + if scoped[1].Matcher.WorkspaceRoot != "" { + t.Fatalf( + "task-run WorkspaceRoot = %q, want empty because task-run hooks do not support it", + scoped[1].Matcher.WorkspaceRoot, + ) + } + if err := hookspkg.ValidateMatcherForEvent(scoped[1].Event, scoped[1].Matcher); err != nil { + t.Fatalf("task-run matcher validation error = %v", err) + } + }, + }, + { + name: "Should not inject workspace fields for message hooks", + assert: func(t *testing.T, _ []hookspkg.HookDecl, scoped []hookspkg.HookDecl) { + t.Helper() + + if scoped[2].Matcher.WorkspaceID != "" || scoped[2].Matcher.WorkspaceRoot != "" { + t.Fatalf( + "message matcher workspace fields = %#v, want no unsupported workspace scoping", + scoped[2].Matcher, + ) + } + if err := hookspkg.ValidateMatcherForEvent(scoped[2].Event, scoped[2].Matcher); err != nil { + t.Fatalf("message matcher validation error = %v", err) + } + }, + }, + { + name: "Should not mutate original declarations", + assert: func(t *testing.T, decls []hookspkg.HookDecl, _ []hookspkg.HookDecl) { + t.Helper() + + for idx, decl := range decls { + if decl.Matcher.WorkspaceID != "" || decl.Matcher.WorkspaceRoot != "" { + t.Fatalf("decls[%d] matcher workspace fields were mutated: %#v", idx, decl.Matcher) + } + } + }, + }, } - if got := cloneStringMap(nil); got != nil { - t.Fatalf("cloneStringMap(nil) = %#v, want nil", got) + for _, tc := range testCases { + t.Run(tc.name, func(t *testing.T) { + t.Parallel() + + decls := newDecls() + scoped := scopeWorkspaceHookDecls(decls, &resolved) + if 
len(scoped) != len(decls) { + t.Fatalf("len(scoped) = %d, want %d", len(scoped), len(decls)) + } + tc.assert(t, decls, scoped) + }) } } +func TestScopeWorkspaceHookDeclsInjectsWorkspaceWorkingDirWhenUnset(t *testing.T) { + t.Parallel() + + t.Run("Should inherit the workspace root as working directory for relative hooks", func(t *testing.T) { + t.Parallel() + + resolved := workspaceResolvedForTest("ws-1", "/tmp/ws-1") + decls := []hookspkg.HookDecl{ + { + Name: "task-run", + Event: hookspkg.HookTaskRunEnqueued, + }, + { + Name: "message", + Event: hookspkg.HookMessageDelta, + }, + } + + scoped := scopeWorkspaceHookDecls(decls, &resolved) + if got, want := scoped[0].WorkingDir, resolved.RootDir; got != want { + t.Fatalf("task-run WorkingDir = %q, want %q", got, want) + } + if got, want := scoped[1].WorkingDir, resolved.RootDir; got != want { + t.Fatalf("message WorkingDir = %q, want %q", got, want) + } + }) + + t.Run("Should preserve an explicit working directory", func(t *testing.T) { + t.Parallel() + + resolved := workspaceResolvedForTest("ws-1", "/tmp/ws-1") + decls := []hookspkg.HookDecl{{ + Name: "explicit", + Event: hookspkg.HookTaskRunEnqueued, + WorkingDir: "/tmp/keep-me", + }} + + scoped := scopeWorkspaceHookDecls(decls, &resolved) + if got, want := scoped[0].WorkingDir, "/tmp/keep-me"; got != want { + t.Fatalf("explicit WorkingDir = %q, want %q", got, want) + } + }) +} + func TestDispatchRuntimeAndExecutorResolvers(t *testing.T) { t.Parallel() diff --git a/internal/daemon/scheduler_runtime.go b/internal/daemon/scheduler_runtime.go index 6cdf3ccf1..c801d4850 100644 --- a/internal/daemon/scheduler_runtime.go +++ b/internal/daemon/scheduler_runtime.go @@ -233,6 +233,7 @@ func (s schedulerSessionSource) Sessions(ctx context.Context) ([]schedulerpkg.Se ID: strings.TrimSpace(info.ID), AgentName: strings.TrimSpace(info.AgentName), WorkspaceID: strings.TrimSpace(info.WorkspaceID), + Channel: strings.TrimSpace(info.Channel), State: 
strings.TrimSpace(string(info.State)), Prompting: isSchedulerSessionPrompting(s.sessions, info.ID), Capabilities: capabilities, diff --git a/internal/hooks/matcher.go b/internal/hooks/matcher.go index c25de526f..3182c14d9 100644 --- a/internal/hooks/matcher.go +++ b/internal/hooks/matcher.go @@ -142,6 +142,16 @@ func ValidateMatcherForEvent(event HookEvent, matcher HookMatcher) error { return fmt.Errorf("hooks: matcher fields [%s] are not valid for event %q", strings.Join(invalid, ", "), event) } +// MatcherFieldAllowedForEvent reports whether a matcher field is valid for the event family. +func MatcherFieldAllowedForEvent(event HookEvent, field string) bool { + if err := event.Validate(); err != nil { + return false + } + allowed := allowedMatcherFieldsByFamily[event.Family()] + _, ok := allowed[strings.TrimSpace(field)] + return ok +} + // MatchesSession matches session-family hooks. func (m HookMatcher) MatchesSession(payload SessionContext) bool { return m.matchSessionContext(payload, true) diff --git a/internal/hooks/matcher_test.go b/internal/hooks/matcher_test.go index 92fa2b89c..8e2e51c72 100644 --- a/internal/hooks/matcher_test.go +++ b/internal/hooks/matcher_test.go @@ -430,3 +430,50 @@ func TestHookMatcherMatchesAutonomyPayloads(t *testing.T) { t.Fatal("MatchesSpawn() = true, want false for spawn role mismatch") } } + +func TestMatcherFieldAllowedForEvent(t *testing.T) { + t.Parallel() + + tests := []struct { + name string + event HookEvent + field string + want bool + }{ + { + name: "Should allow workspace root for session post-create hook", + event: HookSessionPostCreate, + field: "workspace_root", + want: true, + }, + { + name: "Should allow workspace id for task-run enqueued hook", + event: HookTaskRunEnqueued, + field: "workspace_id", + want: true, + }, + { + name: "Should deny workspace root for task-run enqueued hook", + event: HookTaskRunEnqueued, + field: "workspace_root", + want: false, + }, + { + name: "Should deny workspace id for message 
delta hook", + event: HookMessageDelta, + field: "workspace_id", + want: false, + }, + {name: "Should deny invalid event", event: HookEvent("bad.event"), field: "workspace_id", want: false}, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + t.Parallel() + + if got := MatcherFieldAllowedForEvent(tt.event, tt.field); got != tt.want { + t.Fatalf("MatcherFieldAllowedForEvent(%q, %q) = %v, want %v", tt.event, tt.field, got, tt.want) + } + }) + } +} diff --git a/internal/network/manager.go b/internal/network/manager.go index a96db1a6f..82d553bfd 100644 --- a/internal/network/manager.go +++ b/internal/network/manager.go @@ -897,6 +897,9 @@ func (m *Manager) controlMessageReceivers(result RouteResult) []string { locals := m.peers.LocalPeers(envelope.Channel) receivers := make([]string, 0, len(locals)) for _, local := range locals { + if isEnvelopeSender(local, envelope) { + continue + } receivers = append(receivers, local.SessionID) } return receivers diff --git a/internal/network/manager_test.go b/internal/network/manager_test.go index 6aae30c86..6c359f3d5 100644 --- a/internal/network/manager_test.go +++ b/internal/network/manager_test.go @@ -190,13 +190,16 @@ func TestManagerJoinSendStatusAndLeave(t *testing.T) { if err := manager.JoinChannel(ctx, testJoinRequest("sess-a", "coder.sess-a", "builders")); err != nil { t.Fatalf("JoinChannel() error = %v", err) } + if err := manager.JoinChannel(ctx, testJoinRequest("sess-b", "reviewer.sess-b", "builders")); err != nil { + t.Fatalf("JoinChannel(second peer) error = %v", err) + } status, err := manager.Status(ctx) if err != nil { t.Fatalf("Status(joined) error = %v", err) } - if status.LocalPeers != 1 || status.Channels != 1 { - t.Fatalf("Status(joined) = %#v, want 1 local peer and 1 channel", status) + if status.LocalPeers != 2 || status.Channels != 1 { + t.Fatalf("Status(joined) = %#v, want 2 local peers and 1 channel", status) } id, err := manager.Send(ctx, SendRequest{ @@ -214,7 +217,7 @@ func 
TestManagerJoinSendStatusAndLeave(t *testing.T) { prompter.waitForCalls(t, 1) call := prompter.call(0) - if got, want := call.sessionID, "sess-a"; got != want { + if got, want := call.sessionID, "sess-b"; got != want { t.Fatalf("prompt session id = %q, want %q", got, want) } if !strings.Contains(call.message, "hello builders") { @@ -402,8 +405,14 @@ func TestManagerQueuesBusyDeliveriesTracksDisconnectsAndShutsDownIdempotently(t } }) + if err := manager.JoinChannel( + ctx, + testJoinRequest("sess-sender", "coder.sess-sender", "builders"), + ); err != nil { + t.Fatalf("JoinChannel(sender) error = %v", err) + } if err := manager.JoinChannel(ctx, testJoinRequest("sess-busy", "reviewer.sess-busy", "builders")); err != nil { - t.Fatalf("JoinChannel() error = %v", err) + t.Fatalf("JoinChannel(busy) error = %v", err) } return ctx, manager, prompter @@ -415,7 +424,7 @@ func TestManagerQueuesBusyDeliveriesTracksDisconnectsAndShutsDownIdempotently(t ctx, manager, prompter := newBusyManagerHarness(t) prompter.setPrompting("sess-busy", true) if _, err := manager.Send(ctx, SendRequest{ - SessionID: "sess-busy", + SessionID: "sess-sender", Channel: "builders", Kind: KindSay, Body: mustRawJSON(t, map[string]any{"text": "queued while busy"}), @@ -586,13 +595,19 @@ func TestManagerAuditsBusyQueueOverflowAsRejected(t *testing.T) { } }) + if err := manager.JoinChannel( + ctx, + testJoinRequest("sess-sender", "coder.sess-sender", "builders"), + ); err != nil { + t.Fatalf("JoinChannel(sender) error = %v", err) + } if err := manager.JoinChannel(ctx, testJoinRequest("sess-busy", "reviewer.sess-busy", "builders")); err != nil { - t.Fatalf("JoinChannel() error = %v", err) + t.Fatalf("JoinChannel(busy) error = %v", err) } prompter.setPrompting("sess-busy", true) firstID, err := manager.Send(ctx, SendRequest{ - SessionID: "sess-busy", + SessionID: "sess-sender", Channel: "builders", Kind: KindSay, Body: mustRawJSON(t, map[string]any{"text": "overflow first"}), @@ -601,7 +616,7 @@ func 
TestManagerAuditsBusyQueueOverflowAsRejected(t *testing.T) { t.Fatalf("Send(first) error = %v", err) } secondID, err := manager.Send(ctx, SendRequest{ - SessionID: "sess-busy", + SessionID: "sess-sender", Channel: "builders", Kind: KindSay, Body: mustRawJSON(t, map[string]any{"text": "overflow second"}), @@ -711,10 +726,12 @@ func TestManagerStatusTracksWorkflowMetricsAndStructuredLogs(t *testing.T) { var logs bytes.Buffer logger := slog.New(slog.NewTextHandler(&logs, &slog.HandlerOptions{Level: slog.LevelInfo})) prompter := newFakeDeliveryPrompter() + cfg := testManagerConfig() + cfg.GreetInterval = 3600 manager, err := NewManager( ctx, - testManagerConfig(), + cfg, prompter, filepath.Join(t.TempDir(), "network.audit"), nil, @@ -733,6 +750,9 @@ func TestManagerStatusTracksWorkflowMetricsAndStructuredLogs(t *testing.T) { if err := manager.JoinChannel(ctx, testJoinRequest("sess-a", "reviewer.sess-a", "builders")); err != nil { t.Fatalf("JoinChannel() error = %v", err) } + if err := manager.JoinChannel(ctx, testJoinRequest("sess-b", "patcher.sess-b", "builders")); err != nil { + t.Fatalf("JoinChannel(second peer) error = %v", err) + } _, err = manager.Send(ctx, SendRequest{ SessionID: "sess-a", @@ -760,8 +780,8 @@ func TestManagerStatusTracksWorkflowMetricsAndStructuredLogs(t *testing.T) { if err != nil { t.Fatalf("Status() error = %v", err) } - if status.MessagesSent != 2 || status.MessagesReceived != 2 || status.MessagesDelivered != 1 { - t.Fatalf("status message counts = %#v, want sent=2 received=2 delivered=1", status) + if status.MessagesSent != 3 || status.MessagesReceived != 2 || status.MessagesDelivered != 1 { + t.Fatalf("status message counts = %#v, want sent=3 received=2 delivered=1", status) } if status.WorkflowTaggedEvents != 3 || status.HandoffTaggedEvents != 3 { t.Fatalf("status tagged counts = %#v, want workflow=3 handoff=3", status) @@ -770,8 +790,8 @@ func TestManagerStatusTracksWorkflowMetricsAndStructuredLogs(t *testing.T) { for _, metric := range 
status.KindMetrics { metricsByKind[metric.Kind] = metric } - if greet := metricsByKind[KindGreet]; greet.Sent != 1 || greet.Received != 1 || greet.Delivered != 0 { - t.Fatalf("greet kind metrics = %#v, want sent=1 received=1 delivered=0", greet) + if greet := metricsByKind[KindGreet]; greet.Sent != 2 || greet.Received != 1 || greet.Delivered != 0 { + t.Fatalf("greet kind metrics = %#v, want sent=2 received=1 delivered=0", greet) } if say := metricsByKind[KindSay]; say.Sent != 1 || say.Received != 1 || say.Delivered != 1 { t.Fatalf("say kind metrics = %#v, want sent=1 received=1 delivered=1", say) @@ -920,12 +940,15 @@ func TestManagerShutdownTracksInterruptedInFlightMessages(t *testing.T) { t.Fatalf("NewManager() error = %v", err) } + if err := manager.JoinChannel(ctx, testJoinRequest("sess-sender", "coder.sess-sender", "builders")); err != nil { + t.Fatalf("JoinChannel(sender) error = %v", err) + } if err := manager.JoinChannel(ctx, testJoinRequest("sess-stop", "reviewer.sess-stop", "builders")); err != nil { - t.Fatalf("JoinChannel() error = %v", err) + t.Fatalf("JoinChannel(stop target) error = %v", err) } if _, err := manager.Send(ctx, SendRequest{ - SessionID: "sess-stop", + SessionID: "sess-sender", Channel: "builders", Kind: KindSay, Body: mustRawJSON(t, map[string]any{"text": "hello before shutdown"}), diff --git a/internal/network/router.go b/internal/network/router.go index cc11ac3cf..96f6546f0 100644 --- a/internal/network/router.go +++ b/internal/network/router.go @@ -444,7 +444,9 @@ func (r *Router) handleReceivedCapability(ctx context.Context, state receiveStat return result, nil } if state.envelope.IsDirected() { - result.Deliveries = []Delivery{deliveryFromLocalPeer(state.directedTarget, state.envelope)} + if delivery, ok := deliveryFromLocalPeer(state.directedTarget, state.envelope); ok { + result.Deliveries = []Delivery{delivery} + } return result, nil } result.Deliveries = deliveriesFromLocalPeers(r.peers.LocalPeers(state.envelope.Channel), 
state.envelope) @@ -457,7 +459,9 @@ func (r *Router) handleReceivedLifecycle(ctx context.Context, state receiveState return RouteResult{}, err } if deliver { - result.Deliveries = []Delivery{deliveryFromLocalPeer(state.directedTarget, state.envelope)} + if delivery, ok := deliveryFromLocalPeer(state.directedTarget, state.envelope); ok { + result.Deliveries = []Delivery{delivery} + } } return result, nil } @@ -584,7 +588,9 @@ func (r *Router) handleWhois( } } if hasDirectedTarget { - result.Deliveries = []Delivery{deliveryFromLocalPeer(directedTarget, envelope)} + if delivery, ok := deliveryFromLocalPeer(directedTarget, envelope); ok { + result.Deliveries = []Delivery{delivery} + } } return result, nil case WhoisTypeRequest: @@ -607,6 +613,10 @@ func (r *Router) handleWhoisRequest( now time.Time, ) (RouteResult, error) { discoveryRequest := parseWhoisCapabilityDiscoveryRequest(envelope.Ext) + if envelope.IsDirected() && hasDirectedTarget && isEnvelopeSender(directedTarget, envelope) { + result.Ignored = true + return result, nil + } responders := r.whoisRequestResponders(envelope, whois, directedTarget, hasDirectedTarget) for _, responder := range responders { @@ -631,12 +641,20 @@ func (r *Router) whoisRequestResponders( hasDirectedTarget bool, ) []LocalPeer { if envelope.IsDirected() { - if hasDirectedTarget { + if hasDirectedTarget && !isEnvelopeSender(directedTarget, envelope) { return []LocalPeer{directedTarget} } return nil } - return r.peers.MatchLocalPeers(envelope.Channel, whois.Query) + matches := r.peers.MatchLocalPeers(envelope.Channel, whois.Query) + responders := make([]LocalPeer, 0, len(matches)) + for _, peer := range matches { + if isEnvelopeSender(peer, envelope) { + continue + } + responders = append(responders, peer) + } + return responders } func (r *Router) buildWhoisResponseEnvelope( @@ -937,17 +955,29 @@ func deliveriesFromLocalPeers(peers []LocalPeer, envelope Envelope) []Delivery { deliveries := make([]Delivery, 0, len(peers)) for _, peer 
:= range peers { - deliveries = append(deliveries, deliveryFromLocalPeer(peer, envelope)) + delivery, ok := deliveryFromLocalPeer(peer, envelope) + if !ok { + continue + } + deliveries = append(deliveries, delivery) } return deliveries } -func deliveryFromLocalPeer(peer LocalPeer, envelope Envelope) Delivery { +func deliveryFromLocalPeer(peer LocalPeer, envelope Envelope) (Delivery, bool) { + if strings.TrimSpace(peer.PeerID) == "" || isEnvelopeSender(peer, envelope) { + return Delivery{}, false + } return Delivery{ SessionID: peer.SessionID, PeerID: peer.PeerID, Envelope: envelope, - } + }, true +} + +func isEnvelopeSender(peer LocalPeer, envelope Envelope) bool { + return strings.TrimSpace(peer.PeerID) != "" && + strings.TrimSpace(peer.PeerID) == strings.TrimSpace(envelope.From) } func buildDirectReceipt( diff --git a/internal/network/router_test.go b/internal/network/router_test.go index 72da1847b..ea5362ef2 100644 --- a/internal/network/router_test.go +++ b/internal/network/router_test.go @@ -138,9 +138,12 @@ func TestRouterRoutesBroadcastAndDirectToCorrectSubjectsAndTargets(t *testing.T) if err != nil { t.Fatalf("Receive(broadcast) error = %v", err) } - if got, want := len(broadcastResult.Deliveries), 2; got != want { + if got, want := len(broadcastResult.Deliveries), 1; got != want { t.Fatalf("len(broadcast deliveries) = %d, want %d", got, want) } + if got, want := broadcastResult.Deliveries[0].SessionID, "sess-b"; got != want { + t.Fatalf("broadcast delivery session = %q, want %q", got, want) + } directInbound, err := router.Receive(context.Background(), transport.Message(1).payload) if err != nil { @@ -154,6 +157,139 @@ func TestRouterRoutesBroadcastAndDirectToCorrectSubjectsAndTargets(t *testing.T) } } +func TestRouterDoesNotDeliverLocalEchoesToSender(t *testing.T) { + t.Parallel() + + setup := func(t *testing.T) (*Router, *spyRouterTransport, PeerCard) { + t.Helper() + + now := time.Date(2026, 4, 26, 12, 0, 0, 0, time.UTC) + registry, err := 
NewPeerRegistry(10*time.Second, WithPeerRegistryClock(func() time.Time { return now })) + if err != nil { + t.Fatalf("NewPeerRegistry() error = %v", err) + } + sender := mustPeerCard(t, "coordinator.sess-a") + if _, err := registry.RegisterLocal("sess-a", "marketing", sender, now); err != nil { + t.Fatalf("RegisterLocal(sender) error = %v", err) + } + + transport := &spyRouterTransport{} + router, err := NewRouter( + registry, + transport, + DefaultMaxReplayAge, + WithRouterClock(func() time.Time { return now }), + ) + if err != nil { + t.Fatalf("NewRouter() error = %v", err) + } + return router, transport, sender + } + + t.Run("Should suppress broadcast self-echo deliveries", func(t *testing.T) { + t.Parallel() + + router, transport, _ := setup(t) + if _, err := router.Send(context.Background(), SendRequest{ + SessionID: "sess-a", + Channel: "marketing", + Kind: KindSay, + Body: mustRawJSON(t, SayBody{Text: "local status"}), + }); err != nil { + t.Fatalf("Send(say self echo) error = %v", err) + } + broadcastResult, err := router.Receive(context.Background(), transport.Message(0).payload) + if err != nil { + t.Fatalf("Receive(say self echo) error = %v", err) + } + if got := len(broadcastResult.Deliveries); got != 0 { + t.Fatalf("len(self broadcast deliveries) = %d, want 0", got) + } + }) + + t.Run("Should suppress directed self-echo deliveries", func(t *testing.T) { + t.Parallel() + + router, transport, sender := setup(t) + if _, err := router.Send(context.Background(), SendRequest{ + SessionID: "sess-a", + Channel: "marketing", + Kind: KindDirect, + To: stringPtr(sender.PeerID), + InteractionID: stringPtr("int-self"), + Body: mustRawJSON(t, DirectBody{Text: "self-directed loop"}), + }); err != nil { + t.Fatalf("Send(direct self echo) error = %v", err) + } + directResult, err := router.Receive(context.Background(), transport.Message(0).payload) + if err != nil { + t.Fatalf("Receive(direct self echo) error = %v", err) + } + if got := len(directResult.Deliveries); 
got != 0 { + t.Fatalf("len(self direct deliveries) = %d, want 0", got) + } + }) +} + +func TestRouterIgnoresDirectedWhoisRequestToSender(t *testing.T) { + t.Parallel() + + t.Run("Should ignore directed self whois without generated responses", func(t *testing.T) { + t.Parallel() + + now := time.Date(2026, 4, 26, 12, 30, 0, 0, time.UTC) + registry, err := NewPeerRegistry(10*time.Second, WithPeerRegistryClock(func() time.Time { return now })) + if err != nil { + t.Fatalf("NewPeerRegistry() error = %v", err) + } + sender := mustPeerCard(t, "coordinator.sess-a") + if _, err := registry.RegisterLocal("sess-a", "marketing", sender, now); err != nil { + t.Fatalf("RegisterLocal(sender) error = %v", err) + } + + transport := &spyRouterTransport{} + router, err := NewRouter( + registry, + transport, + DefaultMaxReplayAge, + WithRouterClock(func() time.Time { return now }), + ) + if err != nil { + t.Fatalf("NewRouter() error = %v", err) + } + payload, err := json.Marshal(Envelope{ + Protocol: ProtocolV0, + ID: "msg_whois_self", + Kind: KindWhois, + Channel: "marketing", + From: sender.PeerID, + To: stringPtr(sender.PeerID), + TS: now.Unix(), + Body: mustRawJSON(t, WhoisBody{ + Type: WhoisTypeRequest, + Query: "self-directed", + }), + }) + if err != nil { + t.Fatalf("json.Marshal(self whois) error = %v", err) + } + + result, err := router.Receive(context.Background(), payload) + if err != nil { + t.Fatalf("Receive(self whois) error = %v", err) + } + if !result.Ignored || result.Rejected { + t.Fatalf("self whois result = %#v, want ignored and not rejected", result) + } + if len(result.Generated) != 0 || len(result.Deliveries) != 0 { + t.Fatalf("self whois result = %#v, want no generated responses or deliveries", result) + } + if got := transport.Count(); got != 0 { + t.Fatalf("transport publish count = %d, want 0", got) + } + }) +} + func TestRouterRejectsDuplicateBeforeReprocessingLifecycleState(t *testing.T) { t.Parallel() diff --git a/internal/observe/observer.go 
b/internal/observe/observer.go index 8f88a102e..ac8076b7e 100644 --- a/internal/observe/observer.go +++ b/internal/observe/observer.go @@ -420,28 +420,7 @@ func (o *Observer) Close(ctx context.Context) error { // OnSessionCreated registers the session in the global observability database. func (o *Observer) OnSessionCreated(ctx context.Context, sess *session.Session) { info := sess.Info() - snapshot := observedSession{ - agentName: info.AgentName, - workspaceID: info.WorkspaceID, - } - if o.resolvePermissionMode != nil { - permissionMode, err := o.resolvePermissionMode(ctx, info.AgentName, info.WorkspaceID) - if err != nil { - o.logger.Warn( - "observe: resolve permission mode failed", - "session_id", - info.ID, - "agent_name", - info.AgentName, - "workspace_id", - info.WorkspaceID, - "error", - err, - ) - } else { - snapshot.permissionMode = strings.TrimSpace(permissionMode) - } - } + snapshot := o.observedSessionSnapshot(ctx, info.ID, info.AgentName, info.WorkspaceID) o.trackSession(info.ID, snapshot) @@ -535,7 +514,7 @@ func (o *Observer) observeAgentEvent(ctx context.Context, sessionID string, payl return } - id, snapshot, ok := o.validateObservedEvent(sessionID, event) + id, snapshot, ok := o.validateObservedEvent(ctx, sessionID, event) if !ok { return } @@ -576,6 +555,7 @@ func (o *Observer) observeAgentEvent(ctx context.Context, sessionID string, payl } func (o *Observer) validateObservedEvent( + ctx context.Context, sessionID string, event acp.AgentEvent, ) (string, observedSession, bool) { @@ -587,8 +567,17 @@ func (o *Observer) validateObservedEvent( snapshot, ok := o.sessionSnapshot(id) if !ok { - o.logger.Warn("observe: skipped agent event for unknown session", "session_id", id, "event_type", event.Type) - return "", observedSession{}, false + snapshot, ok = o.recoverSessionSnapshot(ctx, id) + if !ok { + o.logger.Warn( + "observe: skipped agent event for unknown session", + "session_id", + id, + "event_type", + event.Type, + ) + return "", 
observedSession{}, false + } } if strings.TrimSpace(event.Type) == "" { o.logger.Warn( @@ -606,6 +595,86 @@ func (o *Observer) validateObservedEvent( return id, snapshot, true } +func (o *Observer) recoverSessionSnapshot(ctx context.Context, sessionID string) (observedSession, bool) { + requireObserverContext(ctx, "recoverSessionSnapshot") + + id := strings.TrimSpace(sessionID) + if id == "" { + return observedSession{}, false + } + + if o.sessionSource != nil { + for _, info := range o.sessionSource.List() { + if info == nil || strings.TrimSpace(info.ID) != id { + continue + } + snapshot := o.observedSessionSnapshot(ctx, id, info.AgentName, info.WorkspaceID) + o.trackSession(id, snapshot) + return snapshot, true + } + } + + if o.registry == nil { + return observedSession{}, false + } + sessions, err := o.registry.ListSessions(ctx, store.SessionListQuery{}) + if err != nil { + o.logger.Warn("observe: recover session snapshot failed", "session_id", id, "error", err) + return observedSession{}, false + } + for _, info := range sessions { + if strings.TrimSpace(info.ID) != id { + continue + } + snapshot := o.observedSessionSnapshot(ctx, id, info.AgentName, info.WorkspaceID) + if strings.TrimSpace(info.State) != string(session.StateStopped) { + o.trackSession(id, snapshot) + } + return snapshot, true + } + return observedSession{}, false +} + +func (o *Observer) observedSessionSnapshot( + ctx context.Context, + sessionID string, + agentName string, + workspaceID string, +) observedSession { + requireObserverContext(ctx, "observedSessionSnapshot") + + snapshot := observedSession{ + agentName: strings.TrimSpace(agentName), + workspaceID: strings.TrimSpace(workspaceID), + } + if o.resolvePermissionMode == nil { + return snapshot + } + permissionMode, err := o.resolvePermissionMode(ctx, snapshot.agentName, snapshot.workspaceID) + if err != nil { + o.logger.Warn( + "observe: resolve permission mode failed", + "session_id", + strings.TrimSpace(sessionID), + "agent_name", + 
snapshot.agentName, + "workspace_id", + snapshot.workspaceID, + "error", + err, + ) + return snapshot + } + snapshot.permissionMode = strings.TrimSpace(permissionMode) + return snapshot +} + +func requireObserverContext(ctx context.Context, caller string) { + if ctx == nil { + panic("observe: nil context passed to " + caller) + } +} + func observedEventTimestamp(event acp.AgentEvent, now func() time.Time) time.Time { if !event.Timestamp.IsZero() { return event.Timestamp diff --git a/internal/observe/observer_test.go b/internal/observe/observer_test.go index 101ec78f9..192063fa3 100644 --- a/internal/observe/observer_test.go +++ b/internal/observe/observer_test.go @@ -2,6 +2,7 @@ package observe import ( "context" + "errors" "io" "log/slog" "os" @@ -90,6 +91,145 @@ func TestOnAgentEventWritesEventSummaryToGlobalDB(t *testing.T) { } } +func TestOnAgentEventRecoversSessionSnapshot(t *testing.T) { + t.Parallel() + + testCases := []struct { + name string + sessionID string + state session.State + summary string + setup func(t *testing.T, h *harness, sess *session.Session) + wantCached bool + }{ + { + name: "Should recover session snapshot from live source", + sessionID: "sess-live-source", + state: session.StateActive, + summary: "live source event was observed", + setup: func(t *testing.T, h *harness, sess *session.Session) { + t.Helper() + + if err := h.registry.RegisterSession( + testutil.Context(t), + sessionInfoFromSession(sess.Info()), + ); err != nil { + t.Fatalf("RegisterSession(live source) error = %v", err) + } + h.observer.registry = listSessionsFailingRegistry{Registry: h.registry} + h.source.sessions = []*session.Info{sess.Info()} + }, + wantCached: true, + }, + { + name: "Should recover session snapshot from registry", + sessionID: "sess-registry-source", + state: session.StateActive, + summary: "registry event was observed", + setup: func(t *testing.T, h *harness, sess *session.Session) { + t.Helper() + + if err := h.registry.RegisterSession( + 
testutil.Context(t), + sessionInfoFromSession(sess.Info()), + ); err != nil { + t.Fatalf("RegisterSession() error = %v", err) + } + }, + wantCached: true, + }, + { + name: "Should not cache stopped sessions recovered from registry", + sessionID: "sess-stopped-registry-source", + state: session.StateStopped, + summary: "stopped registry event was observed", + setup: func(t *testing.T, h *harness, sess *session.Session) { + t.Helper() + + if err := h.registry.RegisterSession( + testutil.Context(t), + sessionInfoFromSession(sess.Info()), + ); err != nil { + t.Fatalf("RegisterSession(stopped) error = %v", err) + } + }, + wantCached: false, + }, + } + + for _, tc := range testCases { + t.Run(tc.name, func(t *testing.T) { + t.Parallel() + + h := newHarness(t) + sess := newSession(tc.sessionID, tc.state, h.workspace, h.now) + tc.setup(t, h, sess) + + h.observer.OnAgentEvent(testutil.Context(t), sess.ID, acp.AgentEvent{ + Type: "agent_message", + TurnID: "turn-" + tc.sessionID, + Timestamp: h.now.Add(time.Minute), + Text: tc.summary, + }) + + events, err := h.observer.QueryEvents(testutil.Context(t), store.EventSummaryQuery{SessionID: sess.ID}) + if err != nil { + t.Fatalf("QueryEvents() error = %v", err) + } + if got, want := len(events), 1; got != want { + t.Fatalf("len(events) = %d, want %d", got, want) + } + if events[0].AgentName != "coder" || events[0].Summary != tc.summary { + t.Fatalf("events[0] = %#v, want recovered event summary %q", events[0], tc.summary) + } + + _, cached := h.observer.sessionSnapshot(sess.ID) + if cached != tc.wantCached { + t.Fatalf("sessionSnapshot(%q) cached = %v, want %v", sess.ID, cached, tc.wantCached) + } + }) + } +} + +func TestObserverSessionSnapshotRequiresContext(t *testing.T) { + t.Parallel() + + nilContext := func() context.Context { + return nil + } + testCases := []struct { + name string + call func(observer *Observer) + }{ + { + name: "Should panic when recovering a session snapshot with nil context", + call: func(observer 
*Observer) { + observer.recoverSessionSnapshot(nilContext(), "sess-nil-context") + }, + }, + { + name: "Should panic when building an observed session snapshot with nil context", + call: func(observer *Observer) { + observer.observedSessionSnapshot(nilContext(), "sess-nil-context", "coder", observerWorkspaceID) + }, + }, + } + + for _, tc := range testCases { + t.Run(tc.name, func(t *testing.T) { + t.Parallel() + + h := newHarness(t) + defer func() { + if recovered := recover(); recovered == nil { + t.Fatal("snapshot helper panic = nil, want non-nil") + } + }() + tc.call(h.observer) + }) + } +} + func TestSweepRetentionModes(t *testing.T) { t.Parallel() @@ -719,6 +859,17 @@ type stubSessionSource struct { sessions []*session.Info } +type listSessionsFailingRegistry struct { + Registry +} + +func (r listSessionsFailingRegistry) ListSessions( + context.Context, + store.SessionListQuery, +) ([]store.SessionInfo, error) { + return nil, errors.New("registry fallback disabled") +} + type observeBridgeSource struct { *bridgepkg.Service broker *bridgepkg.Broker diff --git a/internal/scheduler/scheduler.go b/internal/scheduler/scheduler.go index 0db374c55..88ccdadd2 100644 --- a/internal/scheduler/scheduler.go +++ b/internal/scheduler/scheduler.go @@ -516,9 +516,24 @@ func isEligibleSession(work *RunSnapshot, candidate SessionSnapshot, busy map[st if !scopeMatches(work.Task, candidate) { return false } + if !coordinationChannelMatches(work, candidate) { + return false + } return capabilitiesCover(candidate.Capabilities, work.Run.RequiredCapabilities) } +func coordinationChannelMatches(work *RunSnapshot, candidate SessionSnapshot) bool { + if work == nil { + return false + } + runChannel := strings.TrimSpace(work.Run.CoordinationChannelID) + sessionChannel := strings.TrimSpace(candidate.Channel) + if runChannel == "" { + return true + } + return sessionChannel == runChannel +} + func scopeMatches(task taskpkg.Task, candidate SessionSnapshot) bool { scope := 
task.Scope.Normalize() workspaceID := strings.TrimSpace(task.WorkspaceID) diff --git a/internal/scheduler/scheduler_channel_test.go b/internal/scheduler/scheduler_channel_test.go new file mode 100644 index 000000000..718be1546 --- /dev/null +++ b/internal/scheduler/scheduler_channel_test.go @@ -0,0 +1,126 @@ +package scheduler + +import ( + "testing" + "time" + + "github.com/jonboulle/clockwork" + taskpkg "github.com/pedronauck/agh/internal/task" + "github.com/pedronauck/agh/internal/testutil" +) + +func TestRunOnceHonorsCoordinationChannel(t *testing.T) { + t.Parallel() + + t.Run("Should wake only same-channel sessions for channel-bound work", func(t *testing.T) { + t.Parallel() + + base := time.Date(2026, 4, 26, 13, 30, 0, 0, time.UTC) + work := workSnapshot("task-1", "run-1", taskpkg.ScopeWorkspace, "ws-1", []string{"go"}, base) + work.Run.CoordinationChannelID = "finance" + source := &fakeTaskSource{pending: []RunSnapshot{work}} + sessions := &fakeSessionSource{sessions: []SessionSnapshot{ + { + ID: "sess-marketing", + WorkspaceID: "ws-1", + Channel: "marketing", + State: "active", + Capabilities: []string{"go"}, + CreatedAt: base, + }, + { + ID: "sess-finance", + WorkspaceID: "ws-1", + Channel: "finance", + State: "active", + Capabilities: []string{"go"}, + CreatedAt: base.Add(time.Second), + }, + }} + waker := &fakeWaker{} + scheduler := newTestScheduler(t, source, sessions, waker, WithClock(clockwork.NewFakeClockAt(base))) + + result, err := scheduler.RunOnce(testutil.Context(t)) + if err != nil { + t.Fatalf("RunOnce() error = %v", err) + } + if result.WakeAttempts != 1 || result.WakeSucceeded != 1 { + t.Fatalf("result = %#v, want one successful wake", result) + } + + targets := waker.targetsSnapshot() + if got, want := len(targets), 1; got != want { + t.Fatalf("wake targets = %d, want %d", got, want) + } + if got, want := targets[0].Session.ID, "sess-finance"; got != want { + t.Fatalf("woken session = %q, want %q", got, want) + } + }) + + t.Run("Should 
record no match when only wrong-channel sessions are available", func(t *testing.T) { + t.Parallel() + + base := time.Date(2026, 4, 26, 13, 45, 0, 0, time.UTC) + work := workSnapshot("task-1", "run-1", taskpkg.ScopeWorkspace, "ws-1", []string{"go"}, base) + work.Run.CoordinationChannelID = "finance" + source := &fakeTaskSource{pending: []RunSnapshot{work}} + sessions := &fakeSessionSource{sessions: []SessionSnapshot{ + { + ID: "sess-marketing", + WorkspaceID: "ws-1", + Channel: "marketing", + State: "active", + Capabilities: []string{"go"}, + CreatedAt: base, + }, + }} + waker := &fakeWaker{} + scheduler := newTestScheduler(t, source, sessions, waker, WithClock(clockwork.NewFakeClockAt(base))) + + result, err := scheduler.RunOnce(testutil.Context(t)) + if err != nil { + t.Fatalf("RunOnce() error = %v", err) + } + if result.WakeAttempts != 0 || result.NoMatchRuns != 1 { + t.Fatalf("result = %#v, want one no-match and no wake attempts", result) + } + if got := len(waker.targetsSnapshot()); got != 0 { + t.Fatalf("wake targets = %d, want 0", got) + } + }) + + t.Run( + "Should record no match when only unscoped sessions are available for channel-bound work", + func(t *testing.T) { + t.Parallel() + + base := time.Date(2026, 4, 26, 14, 0, 0, 0, time.UTC) + work := workSnapshot("task-1", "run-1", taskpkg.ScopeWorkspace, "ws-1", []string{"go"}, base) + work.Run.CoordinationChannelID = "finance" + source := &fakeTaskSource{pending: []RunSnapshot{work}} + sessions := &fakeSessionSource{sessions: []SessionSnapshot{ + { + ID: "sess-unscoped", + WorkspaceID: "ws-1", + Channel: "", + State: "active", + Capabilities: []string{"go"}, + CreatedAt: base, + }, + }} + waker := &fakeWaker{} + scheduler := newTestScheduler(t, source, sessions, waker, WithClock(clockwork.NewFakeClockAt(base))) + + result, err := scheduler.RunOnce(testutil.Context(t)) + if err != nil { + t.Fatalf("RunOnce() error = %v", err) + } + if result.WakeAttempts != 0 || result.NoMatchRuns != 1 { + t.Fatalf("result 
= %#v, want one no-match and no wake attempts", result) + } + if got := len(waker.targetsSnapshot()); got != 0 { + t.Fatalf("wake targets = %d, want 0", got) + } + }, + ) +} diff --git a/internal/scheduler/types.go b/internal/scheduler/types.go index b23ab4a30..8cc650545 100644 --- a/internal/scheduler/types.go +++ b/internal/scheduler/types.go @@ -55,6 +55,7 @@ type SessionSnapshot struct { ID string AgentName string WorkspaceID string + Channel string State string Prompting bool Capabilities []string diff --git a/internal/session/manager_start.go b/internal/session/manager_start.go index b27f12797..0183d4d03 100644 --- a/internal/session/manager_start.go +++ b/internal/session/manager_start.go @@ -372,6 +372,7 @@ func (s *sessionStartSpec) newStartingSession( Name: s.sessionName, AgentName: resolved.Name, Provider: strings.TrimSpace(resolved.Provider), + Model: strings.TrimSpace(resolved.Model), WorkspaceID: s.workspace.ID, Workspace: s.workspace.RootDir, Channel: s.channel, diff --git a/internal/session/query.go b/internal/session/query.go index 8136fbf5e..cc5203bfa 100644 --- a/internal/session/query.go +++ b/internal/session/query.go @@ -276,6 +276,7 @@ func sessionInfoFromMeta(meta store.SessionMeta) *Info { Name: meta.Name, AgentName: meta.AgentName, Provider: meta.Provider, + Model: strings.TrimSpace(meta.Model), WorkspaceID: meta.WorkspaceID, Channel: meta.Channel, Type: normalizeSessionType(Type(meta.SessionType)), diff --git a/internal/session/query_test.go b/internal/session/query_test.go index 8fedd80e0..754542c07 100644 --- a/internal/session/query_test.go +++ b/internal/session/query_test.go @@ -577,6 +577,7 @@ func TestReadMetaAndQueryHelpers(t *testing.T) { Name: "stored", AgentName: "coder", Provider: "codex", + Model: " gpt-4o ", WorkspaceID: "ws-1", State: string(StateStopped), StopReason: &stopReason, @@ -591,6 +592,9 @@ func TestReadMetaAndQueryHelpers(t *testing.T) { if got := info.Provider; got != "codex" { 
t.Fatalf("sessionInfoFromMeta().Provider = %q, want %q", got, "codex") } + if got := info.Model; got != "gpt-4o" { + t.Fatalf("sessionInfoFromMeta().Model = %q, want %q", got, "gpt-4o") + } if got := info.State; got != StateStopped { t.Fatalf("sessionInfoFromMeta().State = %q, want %q", got, StateStopped) } diff --git a/internal/session/session.go b/internal/session/session.go index 724cef96b..584d609ba 100644 --- a/internal/session/session.go +++ b/internal/session/session.go @@ -51,6 +51,7 @@ type Info struct { Name string AgentName string Provider string + Model string WorkspaceID string Workspace string Channel string @@ -76,6 +77,7 @@ type Session struct { Name string AgentName string Provider string + Model string WorkspaceID string Workspace string Channel string @@ -121,6 +123,7 @@ func (s *Session) Info() *Info { Name: s.Name, AgentName: s.AgentName, Provider: s.Provider, + Model: s.Model, WorkspaceID: s.WorkspaceID, Workspace: s.Workspace, Channel: s.Channel, @@ -773,6 +776,7 @@ func (s *Session) Meta() store.SessionMeta { Name: s.Name, AgentName: s.AgentName, Provider: s.Provider, + Model: s.Model, WorkspaceID: s.WorkspaceID, Channel: s.Channel, SessionType: string(normalizeSessionType(s.Type)), diff --git a/internal/store/globaldb/global_db_task.go b/internal/store/globaldb/global_db_task.go index d002b2bb0..7f53f76f9 100644 --- a/internal/store/globaldb/global_db_task.go +++ b/internal/store/globaldb/global_db_task.go @@ -396,8 +396,11 @@ func (g *GlobalDB) UpdateTaskRun(ctx context.Context, run taskpkg.Run) error { if err != nil { return err } - if strings.TrimSpace(current.SessionID) != "" && - strings.TrimSpace(normalized.SessionID) != strings.TrimSpace(current.SessionID) { + currentSessionID := strings.TrimSpace(current.SessionID) + nextSessionID := strings.TrimSpace(normalized.SessionID) + if currentSessionID != "" && + nextSessionID != currentSessionID && + (nextSessionID != "" || normalized.Status.Normalize() != taskpkg.TaskRunStatusQueued) { 
return taskpkg.ErrSessionAlreadyBound } if normalized.QueuedAt.IsZero() { diff --git a/internal/store/globaldb/global_db_task_test.go b/internal/store/globaldb/global_db_task_test.go index d6ef3fa2b..41c41206b 100644 --- a/internal/store/globaldb/global_db_task_test.go +++ b/internal/store/globaldb/global_db_task_test.go @@ -1067,6 +1067,82 @@ func TestGlobalDBUpdateTaskRunRejectsSessionRebinding(t *testing.T) { } } +func TestGlobalDBUpdateTaskRunAllowsQueuedSessionRelease(t *testing.T) { + t.Parallel() + + t.Run("Should release queued session when requeued", func(t *testing.T) { + t.Parallel() + + globalDB := openTestGlobalDB(t) + taskRecord := taskRecordForTest("task-run-queued-release") + if err := globalDB.CreateTask(testutil.Context(t), taskRecord); err != nil { + t.Fatalf("CreateTask() error = %v", err) + } + + run := taskRunForTest("run-queued-release", taskRecord.ID) + run.Status = taskpkg.TaskRunStatusClaimed + run.ClaimedBy = &taskpkg.ActorIdentity{Kind: taskpkg.ActorKindAgentSession, Ref: "sess-queued-release"} + run.SessionID = "sess-queued-release" + run.ClaimedAt = run.QueuedAt.Add(time.Minute) + if err := globalDB.CreateTaskRun(testutil.Context(t), run); err != nil { + t.Fatalf("CreateTaskRun() error = %v", err) + } + + run.Status = taskpkg.TaskRunStatusQueued + run.ClaimedBy = nil + run.SessionID = "" + run.ClaimedAt = time.Time{} + err := globalDB.UpdateTaskRun(testutil.Context(t), run) + if err != nil { + t.Fatalf("UpdateTaskRun(requeue release) error = %v", err) + } + + stored, err := globalDB.GetTaskRun(testutil.Context(t), run.ID) + if err != nil { + t.Fatalf("GetTaskRun(requeued) error = %v", err) + } + if got, want := stored.Status, taskpkg.TaskRunStatusQueued; got != want { + t.Fatalf("stored.Status = %q, want %q", got, want) + } + if stored.SessionID != "" || stored.ClaimedBy != nil || !stored.ClaimedAt.IsZero() { + t.Fatalf( + "stored lease fields = session %q claimed_by %#v claimed_at %v, want released", + stored.SessionID, + 
stored.ClaimedBy, + stored.ClaimedAt, + ) + } + }) +} + +func TestGlobalDBUpdateTaskRunRejectsActiveSessionClear(t *testing.T) { + t.Parallel() + + t.Run("Should reject clearing session binding for active runs", func(t *testing.T) { + t.Parallel() + + globalDB := openTestGlobalDB(t) + taskRecord := taskRecordForTest("task-run-active-clear") + if err := globalDB.CreateTask(testutil.Context(t), taskRecord); err != nil { + t.Fatalf("CreateTask() error = %v", err) + } + + run := taskRunForTest("run-active-clear", taskRecord.ID) + run.Status = taskpkg.TaskRunStatusRunning + run.SessionID = "sess-active-clear" + run.StartedAt = run.QueuedAt.Add(time.Minute) + if err := globalDB.CreateTaskRun(testutil.Context(t), run); err != nil { + t.Fatalf("CreateTaskRun() error = %v", err) + } + + run.SessionID = "" + err := globalDB.UpdateTaskRun(testutil.Context(t), run) + if !errors.Is(err, taskpkg.ErrSessionAlreadyBound) { + t.Fatalf("UpdateTaskRun(active clear) error = %v, want ErrSessionAlreadyBound", err) + } + }) +} + func TestGlobalDBTaskAndRunReferenceErrors(t *testing.T) { t.Parallel() diff --git a/internal/store/types.go b/internal/store/types.go index 667266d02..c03590f3e 100644 --- a/internal/store/types.go +++ b/internal/store/types.go @@ -604,6 +604,7 @@ type SessionMeta struct { Name string `json:"name,omitempty"` AgentName string `json:"agent_name"` Provider string `json:"provider,omitempty"` + Model string `json:"model,omitempty"` WorkspaceID string `json:"workspace_id,omitempty"` Channel string `json:"channel,omitempty"` SessionType string `json:"session_type,omitempty"` diff --git a/internal/task/hooks.go b/internal/task/hooks.go index 788acc230..5c54f6be9 100644 --- a/internal/task/hooks.go +++ b/internal/task/hooks.go @@ -119,3 +119,10 @@ func defaultTaskRunHooks(hooks RunHookDispatcher) RunHookDispatcher { } return noopTaskRunHooks{} } + +func taskRunObservationHookContext(ctx context.Context) context.Context { + if ctx == nil { + return context.TODO() + } + 
return context.WithoutCancel(ctx) +} diff --git a/internal/task/hooks_test.go b/internal/task/hooks_test.go index 348d0285b..f19787113 100644 --- a/internal/task/hooks_test.go +++ b/internal/task/hooks_test.go @@ -133,6 +133,102 @@ func TestTaskRunEnqueuedHookIncludesActorAndOrigin(t *testing.T) { } } +func TestTaskRunObservationHooksDetachFromCallerCancellation(t *testing.T) { + t.Parallel() + + var enqueuedCtx context.Context + var postClaimCtx context.Context + store := newInMemoryManagerStore() + manager := newTaskManagerForTestWithOptions(t, store, WithTaskRunHooks(recordingTaskRunHooks{ + enqueued: func( + ctx context.Context, + payload hookspkg.TaskRunEnqueuedPayload, + ) (hookspkg.TaskRunEnqueuedPayload, error) { + enqueuedCtx = ctx + return payload, nil + }, + postClaim: func( + ctx context.Context, + payload hookspkg.TaskRunPostClaimPayload, + ) (hookspkg.TaskRunPostClaimPayload, error) { + postClaimCtx = ctx + return payload, nil + }, + })) + actor := validActorContext() + taskRecord, err := manager.CreateTask(context.Background(), CreateTask{ + Scope: ScopeGlobal, + Title: "Observation hook context task", + }, actor) + if err != nil { + t.Fatalf("CreateTask() error = %v", err) + } + + enqueueCtx, cancelEnqueue := context.WithCancel(context.Background()) + run, err := manager.EnqueueRun(enqueueCtx, EnqueueRun{TaskID: taskRecord.ID}, actor) + if err != nil { + t.Fatalf("EnqueueRun() error = %v", err) + } + cancelEnqueue() + t.Run("Should keep enqueued hook context active", func(t *testing.T) { + t.Parallel() + assertContextStillActive(enqueuedCtx, t, "enqueued") + }) + + claimCtx, cancelClaim := context.WithCancel(context.Background()) + if _, err := manager.ClaimRun(claimCtx, run.ID, ClaimRun{}, actor); err != nil { + t.Fatalf("ClaimRun() error = %v", err) + } + cancelClaim() + t.Run("Should keep post-claim hook context active", func(t *testing.T) { + t.Parallel() + assertContextStillActive(postClaimCtx, t, "post-claim") + }) +} + +func 
TestTaskRunPreClaimHookUsesCallerCancellation(t *testing.T) { + t.Parallel() + + var preClaimCtx context.Context + store := newInMemoryManagerStore() + manager := newTaskManagerForTestWithOptions(t, store, WithTaskRunHooks(recordingTaskRunHooks{ + preClaim: func( + ctx context.Context, + payload hookspkg.TaskRunPreClaimPayload, + ) (hookspkg.TaskRunPreClaimPayload, error) { + preClaimCtx = ctx + return payload, nil + }, + })) + actor := validActorContext() + taskRecord, err := manager.CreateTask(context.Background(), CreateTask{ + Scope: ScopeGlobal, + Title: "Pre-claim hook context task", + }, actor) + if err != nil { + t.Fatalf("CreateTask() error = %v", err) + } + run, err := manager.EnqueueRun(context.Background(), EnqueueRun{TaskID: taskRecord.ID}, actor) + if err != nil { + t.Fatalf("EnqueueRun() error = %v", err) + } + + claimCtx, cancelClaim := context.WithCancel(context.Background()) + if _, err := manager.ClaimRun(claimCtx, run.ID, ClaimRun{}, actor); err != nil { + t.Fatalf("ClaimRun() error = %v", err) + } + cancelClaim() + + if preClaimCtx == nil { + t.Fatal("pre-claim hook context was not captured") + } + select { + case <-preClaimCtx.Done(): + default: + t.Fatal("pre-claim hook context was not canceled with caller context") + } +} + func TestTokenFencedLeaseTransitionsDispatchTaskRunHooks(t *testing.T) { t.Parallel() @@ -305,6 +401,18 @@ func TestTokenFencedLeaseTransitionsDispatchTaskRunHooks(t *testing.T) { } } +func assertContextStillActive(ctx context.Context, t *testing.T, label string) { + t.Helper() + if ctx == nil { + t.Fatalf("%s hook context was not captured", label) + } + select { + case <-ctx.Done(): + t.Fatalf("%s hook context canceled after caller returned: %v", label, ctx.Err()) + default: + } +} + type recordingTaskRunHooks struct { enqueued func(context.Context, hookspkg.TaskRunEnqueuedPayload) (hookspkg.TaskRunEnqueuedPayload, error) preClaim func(context.Context, hookspkg.TaskRunPreClaimPayload) (hookspkg.TaskRunPreClaimPayload, 
error) diff --git a/internal/task/lease_hooks.go b/internal/task/lease_hooks.go index c047ad14c..1f17b0db3 100644 --- a/internal/task/lease_hooks.go +++ b/internal/task/lease_hooks.go @@ -20,7 +20,7 @@ func (m *Service) dispatchTaskRunLeaseExtended( }, TaskRunContext: m.taskRunHookContext(run, taskRecord, actor), } - _, err := m.taskHooks.DispatchTaskRunLeaseExtended(ctx, payload) + _, err := m.taskHooks.DispatchTaskRunLeaseExtended(taskRunObservationHookContext(ctx), payload) return err } @@ -47,7 +47,7 @@ func (m *Service) dispatchTaskRunLeaseExpired( PreviousSessionID: strings.TrimSpace(recovery.PreviousSessionID), RecoveryReason: strings.TrimSpace(recovery.Reason), } - _, err := m.taskHooks.DispatchTaskRunLeaseExpired(ctx, payload) + _, err := m.taskHooks.DispatchTaskRunLeaseExpired(taskRunObservationHookContext(ctx), payload) return err } @@ -72,7 +72,7 @@ func (m *Service) dispatchTaskRunLeaseRecoveredFromExpiration( RecoveryAction: string(RunBootRecoveryRequeue), RecoveryReason: strings.TrimSpace(recovery.Reason), } - _, err := m.taskHooks.DispatchTaskRunLeaseRecovered(ctx, payload) + _, err := m.taskHooks.DispatchTaskRunLeaseRecovered(taskRunObservationHookContext(ctx), payload) return err } @@ -96,7 +96,7 @@ func (m *Service) dispatchTaskRunReleased( PreviousSessionID: strings.TrimSpace(previous.SessionID), RecoveryReason: strings.TrimSpace(reason), } - _, err := m.taskHooks.DispatchTaskRunReleased(ctx, payload) + _, err := m.taskHooks.DispatchTaskRunReleased(taskRunObservationHookContext(ctx), payload) return err } @@ -113,7 +113,7 @@ func (m *Service) dispatchTaskRunCompleted( }, TaskRunContext: m.taskRunHookContext(run, taskRecord, actor), } - _, err := m.taskHooks.DispatchTaskRunCompleted(ctx, payload) + _, err := m.taskHooks.DispatchTaskRunCompleted(taskRunObservationHookContext(ctx), payload) return err } @@ -130,6 +130,6 @@ func (m *Service) dispatchTaskRunFailed( }, TaskRunContext: m.taskRunHookContext(run, taskRecord, actor), } - _, err := 
m.taskHooks.DispatchTaskRunFailed(ctx, payload) + _, err := m.taskHooks.DispatchTaskRunFailed(taskRunObservationHookContext(ctx), payload) return err } diff --git a/internal/task/manager.go b/internal/task/manager.go index 6d3dd5c18..fd5fb5186 100644 --- a/internal/task/manager.go +++ b/internal/task/manager.go @@ -1966,7 +1966,7 @@ func (m *Service) dispatchTaskRunEnqueued( TaskRunContext: m.taskRunHookContext(run, taskRecord, actor), IdempotencyKey: strings.TrimSpace(idempotencyKey), } - _, err := m.taskHooks.DispatchTaskRunEnqueued(ctx, payload) + _, err := m.taskHooks.DispatchTaskRunEnqueued(taskRunObservationHookContext(ctx), payload) return err } @@ -2020,7 +2020,7 @@ func (m *Service) dispatchTaskRunPostClaim( TaskRunContext: m.taskRunHookContext(run, taskRecord, actor), ClaimedAt: run.ClaimedAt, } - _, err := m.taskHooks.DispatchTaskRunPostClaim(ctx, payload) + _, err := m.taskHooks.DispatchTaskRunPostClaim(taskRunObservationHookContext(ctx), payload) return err } @@ -2044,7 +2044,7 @@ func (m *Service) dispatchTaskRunLeaseRecovered( RecoveryAction: string(recovery.Action.Normalize()), RecoveryReason: strings.TrimSpace(recovery.Reason), } - _, err := m.taskHooks.DispatchTaskRunLeaseRecovered(ctx, payload) + _, err := m.taskHooks.DispatchTaskRunLeaseRecovered(taskRunObservationHookContext(ctx), payload) return err } diff --git a/internal/task/manager_integration_test.go b/internal/task/manager_integration_test.go index 50e12291c..bbb8835cf 100644 --- a/internal/task/manager_integration_test.go +++ b/internal/task/manager_integration_test.go @@ -1088,6 +1088,74 @@ func TestTaskManagerRunLifecyclePersistsAndReconcilesAgainstStorage(t *testing.T } } +func TestTaskManagerRecoverRunOnBootRequeuesBoundRunWithGlobalDB(t *testing.T) { + t.Parallel() + + t.Run("Should requeue bound run on boot and release session binding", func(t *testing.T) { + t.Parallel() + + ctx := testutil.Context(t) + db := openTaskManagerGlobalDB(t) + manager := newTaskManagerIntegration(t, 
db) + operator, err := taskpkg.DeriveHumanActorContext("operator", taskpkg.OriginKindCLI, "agh task run") + if err != nil { + t.Fatalf("DeriveHumanActorContext() error = %v", err) + } + agent, err := taskpkg.DeriveAgentSessionActorContext("sess-stale-boot") + if err != nil { + t.Fatalf("DeriveAgentSessionActorContext() error = %v", err) + } + daemon, err := taskpkg.DeriveDaemonActorContext("boot-recovery", "daemon.boot") + if err != nil { + t.Fatalf("DeriveDaemonActorContext() error = %v", err) + } + + taskRecord, err := manager.CreateTask(ctx, taskpkg.CreateTask{ + Scope: taskpkg.ScopeGlobal, + Title: "Boot recovery integration", + }, operator) + if err != nil { + t.Fatalf("CreateTask() error = %v", err) + } + run, err := manager.EnqueueRun(ctx, taskpkg.EnqueueRun{TaskID: taskRecord.ID}, operator) + if err != nil { + t.Fatalf("EnqueueRun() error = %v", err) + } + claim, err := manager.ClaimNextRun(ctx, taskpkg.ClaimCriteria{ + Scope: taskpkg.ScopeGlobal, + ClaimerSessionID: "sess-stale-boot", + LeaseDuration: time.Hour, + Now: time.Date(2026, 4, 26, 12, 0, 0, 0, time.UTC), + }, agent) + if err != nil { + t.Fatalf("ClaimNextRun() error = %v", err) + } + if claim.Run.ID != run.ID || claim.Run.SessionID != "sess-stale-boot" { + t.Fatalf("claim.Run = %#v, want run %q bound to sess-stale-boot", claim.Run, run.ID) + } + + recovered, err := manager.RecoverRunOnBoot(ctx, run.ID, taskpkg.RunBootRecovery{ + Action: taskpkg.RunBootRecoveryRequeue, + Reason: "orphaned_on_boot", + SessionState: "stopped", + }, daemon) + if err != nil { + t.Fatalf("RecoverRunOnBoot(requeue) error = %v", err) + } + if recovered.Status != taskpkg.TaskRunStatusQueued || recovered.SessionID != "" || recovered.ClaimedBy != nil { + t.Fatalf("recovered = %#v, want queued run with released session binding", recovered) + } + + stored, err := db.GetTaskRun(ctx, run.ID) + if err != nil { + t.Fatalf("GetTaskRun(recovered) error = %v", err) + } + if stored.Status != taskpkg.TaskRunStatusQueued || 
stored.SessionID != "" || stored.ClaimedBy != nil {
+			t.Fatalf("stored = %#v, want queued run with released session binding", stored)
+		}
+	})
+}
+
 func TestTaskManagerCancelTaskTreePersistsCancellationAudit(t *testing.T) {
 	t.Parallel()
diff --git a/internal/workspace/clone.go b/internal/workspace/clone.go
index 4d8e81fcc..adfe36596 100644
--- a/internal/workspace/clone.go
+++ b/internal/workspace/clone.go
@@ -73,6 +73,7 @@ func cloneConfig(src *aghconfig.Config) aghconfig.Config {
 		},
 		Extensions: src.Extensions,
 		Automation: src.Automation,
+		Autonomy: src.Autonomy,
 		Hooks: aghconfig.HooksConfig{
 			Declarations: cloneHookDecls(src.Hooks.Declarations),
 		},
diff --git a/internal/workspace/resolver_test.go b/internal/workspace/resolver_test.go
index 94d29ac09..ff23667d5 100644
--- a/internal/workspace/resolver_test.go
+++ b/internal/workspace/resolver_test.go
@@ -851,73 +851,100 @@ func TestListReturnsClonedWorkspaces(t *testing.T) {
 func TestCloneConfigProducesDeepCopy(t *testing.T) {
 	t.Parallel()
-	toolReadOnly := true
-	original := aghconfig.Config{
-		Session: aghconfig.SessionConfig{
-			Limits: aghconfig.SessionLimitsConfig{
-				Timeout: time.Minute,
-			},
-		},
-		Providers: map[string]aghconfig.ProviderConfig{
-			"claude": {
-				Command: "claude",
-				DefaultModel: "sonnet",
-				APIKeyEnv: "ANTHROPIC_API_KEY",
-				MCPServers: []aghconfig.MCPServer{
-					{
-						Name: "github",
-						Command: "npx",
-						Args: []string{"-y"},
-						Env: map[string]string{"TOKEN": "one"},
-					},
+	t.Run("Should produce an independent deep copy", func(t *testing.T) {
+		t.Parallel()
+
+		toolReadOnly := true
+		original := aghconfig.Config{
+			Session: aghconfig.SessionConfig{
+				Limits: aghconfig.SessionLimitsConfig{
+					Timeout: time.Minute,
 				},
 			},
-		},
-		Skills: aghconfig.SkillsConfig{
-			Enabled: true,
-			DisabledSkills: []string{"alpha"},
-			PollInterval: time.Second,
-		},
-		Hooks: aghconfig.HooksConfig{
-			Declarations: []hookspkg.HookDecl{{
-				Name: "test-hook",
-				Args: []string{"one"},
-				Env: map[string]string{"TOKEN":
"one"},
-				Metadata: map[string]string{
-					"origin": "test",
+			Autonomy: aghconfig.AutonomyConfig{
+				Coordinator: aghconfig.CoordinatorConfig{
+					Enabled: true,
+					AgentName: "coordinator",
+					Provider: "codex",
+					Model: "gpt-4o",
+					DefaultTTL: 45 * time.Minute,
+					MaxChildren: 5,
+					MaxActivePerWorkspace: 1,
 				},
-				Matcher: hookspkg.HookMatcher{
-					ToolReadOnly: &toolReadOnly,
+			},
+			Providers: map[string]aghconfig.ProviderConfig{
+				"claude": {
+					Command: "claude",
+					DefaultModel: "sonnet",
+					APIKeyEnv: "ANTHROPIC_API_KEY",
+					MCPServers: []aghconfig.MCPServer{
+						{
+							Name: "github",
+							Command: "npx",
+							Args: []string{"-y"},
+							Env: map[string]string{"TOKEN": "one"},
+						},
+					},
 				},
-			}},
-		},
-	}
-
-	cloned := cloneConfig(&original)
-	cloned.Session.Limits.Timeout = 2 * time.Minute
-	cloned.Providers["claude"] = aghconfig.ProviderConfig{}
-	cloned.Skills.DisabledSkills[0] = "beta"
-	cloned.Hooks.Declarations[0].Args[0] = "two"
-	cloned.Hooks.Declarations[0].Env["TOKEN"] = "two"
-	cloned.Hooks.Declarations[0].Metadata["origin"] = "mutated"
-	*cloned.Hooks.Declarations[0].Matcher.ToolReadOnly = false
+			},
+			Skills: aghconfig.SkillsConfig{
+				Enabled: true,
+				DisabledSkills: []string{"alpha"},
+				PollInterval: time.Second,
+			},
+			Hooks: aghconfig.HooksConfig{
+				Declarations: []hookspkg.HookDecl{{
+					Name: "test-hook",
+					Args: []string{"one"},
+					Env: map[string]string{"TOKEN": "one"},
+					Metadata: map[string]string{
+						"origin": "test",
+					},
+					Matcher: hookspkg.HookMatcher{
+						ToolReadOnly: &toolReadOnly,
+					},
+				}},
+			},
+		}
-	if got, want := original.Session.Limits.Timeout, time.Minute; got != want {
-		t.Fatalf("original Session.Limits.Timeout = %s, want %s", got, want)
-	}
-	provider := original.Providers["claude"]
-	if provider.Command != "claude" || provider.MCPServers[0].Env["TOKEN"] != "one" {
-		t.Fatalf("original provider mutated: %#v", provider)
-	}
-	if got, want := original.Skills.DisabledSkills, []string{"alpha"}; !slices.Equal(got, want) {
-		t.Fatalf("original
Skills.DisabledSkills = %#v, want %#v", got, want)
-	}
-	hook := original.Hooks.Declarations[0]
-	if hook.Args[0] != "one" || hook.Env["TOKEN"] != "one" ||
-		hook.Metadata["origin"] != "test" || hook.Matcher.ToolReadOnly == nil ||
-		!*hook.Matcher.ToolReadOnly {
-		t.Fatalf("original hook mutated: %#v", hook)
-	}
+		cloned := cloneConfig(&original)
+		cloned.Session.Limits.Timeout = 2 * time.Minute
+		cloned.Autonomy.Coordinator.Enabled = false
+		cloned.Autonomy.Coordinator.AgentName = "mutated-coordinator"
+		cloned.Autonomy.Coordinator.DefaultTTL = 2 * time.Hour
+		cloned.Providers["claude"] = aghconfig.ProviderConfig{}
+		cloned.Skills.DisabledSkills[0] = "beta"
+		cloned.Hooks.Declarations[0].Args[0] = "two"
+		cloned.Hooks.Declarations[0].Env["TOKEN"] = "two"
+		cloned.Hooks.Declarations[0].Metadata["origin"] = "mutated"
+		*cloned.Hooks.Declarations[0].Matcher.ToolReadOnly = false
+
+		if got, want := original.Session.Limits.Timeout, time.Minute; got != want {
+			t.Fatalf("original Session.Limits.Timeout = %s, want %s", got, want)
+		}
+		if got, want := original.Autonomy.Coordinator.DefaultTTL, 45*time.Minute; got != want {
+			t.Fatalf("original Autonomy.Coordinator.DefaultTTL = %s, want %s", got, want)
+		}
+		if !original.Autonomy.Coordinator.Enabled {
+			t.Fatal("original Autonomy.Coordinator.Enabled = false, want true")
+		}
+		if got, want := original.Autonomy.Coordinator.AgentName, "coordinator"; got != want {
+			t.Fatalf("original Autonomy.Coordinator.AgentName = %q, want %q", got, want)
+		}
+		provider := original.Providers["claude"]
+		if provider.Command != "claude" || provider.MCPServers[0].Env["TOKEN"] != "one" {
+			t.Fatalf("original provider mutated: %#v", provider)
+		}
+		if got, want := original.Skills.DisabledSkills, []string{"alpha"}; !slices.Equal(got, want) {
+			t.Fatalf("original Skills.DisabledSkills = %#v, want %#v", got, want)
+		}
+		hook := original.Hooks.Declarations[0]
+		if hook.Args[0] != "one" || hook.Env["TOKEN"] != "one" ||
+			hook.Metadata["origin"] !=
"test" || hook.Matcher.ToolReadOnly == nil || + !*hook.Matcher.ToolReadOnly { + t.Fatalf("original hook mutated: %#v", hook) + } + }) } func TestWorkspaceHelperFunctions(t *testing.T) { diff --git a/openapi/agh.json b/openapi/agh.json index 36b44f2cb..7d8f9d075 100644 --- a/openapi/agh.json +++ b/openapi/agh.json @@ -96,6 +96,22 @@ }, "description": "Agent caller identity is missing" }, + "403": { + "content": { + "application/json": { + "schema": { + "properties": { + "error": { + "type": "string" + } + }, + "required": ["error"], + "type": "object" + } + } + }, + "description": "Forbidden - workspace or permission mismatch" + }, "500": { "content": { "application/json": { @@ -112,6 +128,22 @@ }, "description": "Internal server error" }, + "503": { + "content": { + "application/json": { + "schema": { + "properties": { + "error": { + "type": "string" + } + }, + "required": ["error"], + "type": "object" + } + } + }, + "description": "Service unavailable - dependent service missing" + }, "default": { "description": "" } @@ -291,6 +323,22 @@ }, "description": "Agent caller identity is missing" }, + "403": { + "content": { + "application/json": { + "schema": { + "properties": { + "error": { + "type": "string" + } + }, + "required": ["error"], + "type": "object" + } + } + }, + "description": "Forbidden - workspace or permission mismatch" + }, "404": { "content": { "application/json": { @@ -339,6 +387,22 @@ }, "description": "Internal server error" }, + "503": { + "content": { + "application/json": { + "schema": { + "properties": { + "error": { + "type": "string" + } + }, + "required": ["error"], + "type": "object" + } + } + }, + "description": "Service unavailable - dependent service missing" + }, "default": { "description": "" } @@ -469,6 +533,22 @@ }, "description": "OK" }, + "400": { + "content": { + "application/json": { + "schema": { + "properties": { + "error": { + "type": "string" + } + }, + "required": ["error"], + "type": "object" + } + } + }, + "description": 
"Invalid channel receive query" + }, "401": { "content": { "application/json": { @@ -485,6 +565,22 @@ }, "description": "Agent caller identity is missing" }, + "403": { + "content": { + "application/json": { + "schema": { + "properties": { + "error": { + "type": "string" + } + }, + "required": ["error"], + "type": "object" + } + } + }, + "description": "Forbidden - workspace or permission mismatch" + }, "404": { "content": { "application/json": { @@ -533,6 +629,22 @@ }, "description": "Internal server error" }, + "503": { + "content": { + "application/json": { + "schema": { + "properties": { + "error": { + "type": "string" + } + }, + "required": ["error"], + "type": "object" + } + } + }, + "description": "Service unavailable - dependent service missing" + }, "default": { "description": "" } @@ -720,6 +832,22 @@ }, "description": "Agent caller identity is missing" }, + "403": { + "content": { + "application/json": { + "schema": { + "properties": { + "error": { + "type": "string" + } + }, + "required": ["error"], + "type": "object" + } + } + }, + "description": "Forbidden - workspace or permission mismatch" + }, "404": { "content": { "application/json": { @@ -768,6 +896,22 @@ }, "description": "Internal server error" }, + "503": { + "content": { + "application/json": { + "schema": { + "properties": { + "error": { + "type": "string" + } + }, + "required": ["error"], + "type": "object" + } + } + }, + "description": "Service unavailable - dependent service missing" + }, "default": { "description": "" } @@ -1524,6 +1668,22 @@ }, "description": "Agent caller identity is missing" }, + "403": { + "content": { + "application/json": { + "schema": { + "properties": { + "error": { + "type": "string" + } + }, + "required": ["error"], + "type": "object" + } + } + }, + "description": "Forbidden - workspace or permission mismatch" + }, "404": { "content": { "application/json": { @@ -1556,6 +1716,22 @@ }, "description": "Internal server error" }, + "503": { + "content": { + 
"application/json": { + "schema": { + "properties": { + "error": { + "type": "string" + } + }, + "required": ["error"], + "type": "object" + } + } + }, + "description": "Service unavailable - dependent service missing" + }, "default": { "description": "" } @@ -1650,6 +1826,22 @@ }, "description": "Agent caller identity is missing" }, + "403": { + "content": { + "application/json": { + "schema": { + "properties": { + "error": { + "type": "string" + } + }, + "required": ["error"], + "type": "object" + } + } + }, + "description": "Forbidden - workspace or permission mismatch" + }, "404": { "content": { "application/json": { @@ -1682,6 +1874,22 @@ }, "description": "Internal server error" }, + "503": { + "content": { + "application/json": { + "schema": { + "properties": { + "error": { + "type": "string" + } + }, + "required": ["error"], + "type": "object" + } + } + }, + "description": "Service unavailable - dependent service missing" + }, "default": { "description": "" } @@ -2185,6 +2393,22 @@ }, "description": "Agent caller identity is missing" }, + "403": { + "content": { + "application/json": { + "schema": { + "properties": { + "error": { + "type": "string" + } + }, + "required": ["error"], + "type": "object" + } + } + }, + "description": "Forbidden - workspace or permission mismatch" + }, "404": { "content": { "application/json": { @@ -2217,6 +2441,22 @@ }, "description": "Internal server error" }, + "503": { + "content": { + "application/json": { + "schema": { + "properties": { + "error": { + "type": "string" + } + }, + "required": ["error"], + "type": "object" + } + } + }, + "description": "Service unavailable - dependent service missing" + }, "default": { "description": "" } @@ -2879,6 +3119,22 @@ }, "description": "Internal server error" }, + "503": { + "content": { + "application/json": { + "schema": { + "properties": { + "error": { + "type": "string" + } + }, + "required": ["error"], + "type": "object" + } + } + }, + "description": "Service unavailable - 
dependent service missing" + }, "default": { "description": "" } @@ -3389,6 +3645,22 @@ }, "description": "Agent caller identity is missing" }, + "403": { + "content": { + "application/json": { + "schema": { + "properties": { + "error": { + "type": "string" + } + }, + "required": ["error"], + "type": "object" + } + } + }, + "description": "Forbidden - workspace or permission mismatch" + }, "409": { "content": { "application/json": { @@ -3437,6 +3709,22 @@ }, "description": "Internal server error" }, + "503": { + "content": { + "application/json": { + "schema": { + "properties": { + "error": { + "type": "string" + } + }, + "required": ["error"], + "type": "object" + } + } + }, + "description": "Service unavailable - dependent service missing" + }, "default": { "description": "" } @@ -3627,6 +3915,22 @@ }, "description": "Agent caller identity is missing" }, + "403": { + "content": { + "application/json": { + "schema": { + "properties": { + "error": { + "type": "string" + } + }, + "required": ["error"], + "type": "object" + } + } + }, + "description": "Forbidden - workspace or permission mismatch" + }, "404": { "content": { "application/json": { @@ -3691,6 +3995,22 @@ }, "description": "Internal server error" }, + "503": { + "content": { + "application/json": { + "schema": { + "properties": { + "error": { + "type": "string" + } + }, + "required": ["error"], + "type": "object" + } + } + }, + "description": "Service unavailable - dependent service missing" + }, "default": { "description": "" } @@ -3884,6 +4204,22 @@ }, "description": "Agent caller identity is missing" }, + "403": { + "content": { + "application/json": { + "schema": { + "properties": { + "error": { + "type": "string" + } + }, + "required": ["error"], + "type": "object" + } + } + }, + "description": "Forbidden - workspace or permission mismatch" + }, "404": { "content": { "application/json": { @@ -3948,6 +4284,22 @@ }, "description": "Internal server error" }, + "503": { + "content": { + 
"application/json": { + "schema": { + "properties": { + "error": { + "type": "string" + } + }, + "required": ["error"], + "type": "object" + } + } + }, + "description": "Service unavailable - dependent service missing" + }, "default": { "description": "" } @@ -4141,6 +4493,22 @@ }, "description": "Agent caller identity is missing" }, + "403": { + "content": { + "application/json": { + "schema": { + "properties": { + "error": { + "type": "string" + } + }, + "required": ["error"], + "type": "object" + } + } + }, + "description": "Forbidden - workspace or permission mismatch" + }, "404": { "content": { "application/json": { @@ -4205,6 +4573,22 @@ }, "description": "Internal server error" }, + "503": { + "content": { + "application/json": { + "schema": { + "properties": { + "error": { + "type": "string" + } + }, + "required": ["error"], + "type": "object" + } + } + }, + "description": "Service unavailable - dependent service missing" + }, "default": { "description": "" } @@ -4397,6 +4781,22 @@ }, "description": "Agent caller identity is missing" }, + "403": { + "content": { + "application/json": { + "schema": { + "properties": { + "error": { + "type": "string" + } + }, + "required": ["error"], + "type": "object" + } + } + }, + "description": "Forbidden - workspace or permission mismatch" + }, "404": { "content": { "application/json": { @@ -4461,6 +4861,22 @@ }, "description": "Internal server error" }, + "503": { + "content": { + "application/json": { + "schema": { + "properties": { + "error": { + "type": "string" + } + }, + "required": ["error"], + "type": "object" + } + } + }, + "description": "Service unavailable - dependent service missing" + }, "default": { "description": "" } diff --git a/packages/site/AGENTS.md b/packages/site/AGENTS.md index 99506c21c..bda7ab001 100644 --- a/packages/site/AGENTS.md +++ b/packages/site/AGENTS.md @@ -1,6 +1,6 @@ # CLAUDE.md (packages/site) -Fumadocs documentation site at `agh.compozy.com`. 
Built with Next.js 16, Fumadocs 16, Remotion (for protocol illustrations). Bun-managed. +Fumadocs documentation site at `agh.network`. Built with Next.js 16, Fumadocs 16, Remotion (for protocol illustrations). Bun-managed. ## Critical Rules diff --git a/packages/site/CLAUDE.md b/packages/site/CLAUDE.md index 99506c21c..bda7ab001 100644 --- a/packages/site/CLAUDE.md +++ b/packages/site/CLAUDE.md @@ -1,6 +1,6 @@ # CLAUDE.md (packages/site) -Fumadocs documentation site at `agh.compozy.com`. Built with Next.js 16, Fumadocs 16, Remotion (for protocol illustrations). Bun-managed. +Fumadocs documentation site at `agh.network`. Built with Next.js 16, Fumadocs 16, Remotion (for protocol illustrations). Bun-managed. ## Critical Rules diff --git a/packages/ui/README.md b/packages/ui/README.md index b278626ad..83e4a2836 100644 --- a/packages/ui/README.md +++ b/packages/ui/README.md @@ -51,27 +51,27 @@ Provider plumbing, typography atoms, and the `cn` class-merger. Layout shells, overlays, containers, and navigation primitives. -| Export | Story | Notes | -| --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- | -| `Sidebar` · `SidebarProps` · `SIDEBAR_PANEL_WIDTH_DEFAULT` · `SIDEBAR_RAIL_WIDTH` | [`sidebar.stories.tsx`](./src/components/stories/sidebar.stories.tsx) | Rail + header + nav + footer slots. Collapse trigger is built in; host must not add its own. 
| -| `SplitPane` · `SplitPaneProps` · `SPLIT_LIST_WIDTH_DEFAULT` | [`split-pane.stories.tsx`](./src/components/stories/split-pane.stories.tsx) | List + detail + `detailEmpty` slots with narrow-breakpoint back-button. | -| `PageHeader` · `PageHeaderProps` | [`page-header.stories.tsx`](./src/components/stories/page-header.stories.tsx) | Eyebrow + title + actions. | -| `Section` · `SectionProps` | [`section.stories.tsx`](./src/components/stories/section.stories.tsx) | Titled content region with eyebrow + optional actions. | -| `Toolbar` · `ToolbarProps` | [`toolbar.stories.tsx`](./src/components/stories/toolbar.stories.tsx) | Inline toolbar row. | -| `Card` · `CardHeader` · `CardFooter` · `CardTitle` · `CardAction` · `CardDescription` · `CardContent` · `CardProps` · `CardSize` | [`card.stories.tsx`](./src/components/stories/card.stories.tsx) | Flat `--color-surface` panel. `activeRail` prop renders a 2px accent left-rail for in-flight signal. | -| `Table` · `TableHeader` · `TableBody` · `TableFooter` · `TableHead` · `TableRow` · `TableCell` · `TableCaption` | [`table.stories.tsx`](./src/components/stories/table.stories.tsx) | Dense data table. | -| `Item` · `ItemActions` · `ItemContent` · `ItemDescription` · `ItemFooter` · `ItemGroup` · `ItemHeader` · `ItemMedia` · `ItemSeparator` · `ItemTitle` | [`item.stories.tsx`](./src/components/stories/item.stories.tsx) | List row with leading/trailing media + metadata slots. | -| `Tabs` · `TabsContent` · `TabsList` · `TabsTrigger` · `tabsListVariants` | [`tabs.stories.tsx`](./src/components/stories/tabs.stories.tsx) | Base UI Tabs. | -| `Accordion` · `AccordionContent` · `AccordionItem` · `AccordionTrigger` | [`accordion.stories.tsx`](./src/components/stories/accordion.stories.tsx) | Base UI Accordion (`multiple` boolean, **not** Radix `type`). | -| `Collapsible` · `CollapsibleContent` · `CollapsibleTrigger` | [`collapsible.stories.tsx`](./src/components/stories/collapsible.stories.tsx) | CSS-animated disclosure. 
| -| `Breadcrumb` · `BreadcrumbEllipsis` · `BreadcrumbItem` · `BreadcrumbLink` · `BreadcrumbList` · `BreadcrumbPage` · `BreadcrumbSeparator` | [`breadcrumb.stories.tsx`](./src/components/stories/breadcrumb.stories.tsx) | Route breadcrumb. | -| `ScrollArea` · `ScrollBar` | [`scroll-area.stories.tsx`](./src/components/stories/scroll-area.stories.tsx) | Styled scrollable region. | -| `Empty` · `EmptyProps` | [`empty.stories.tsx`](./src/components/stories/empty.stories.tsx) | Empty-state scaffold. | -| `Dialog` · `DialogClose` · `DialogContent` · `DialogDescription` · `DialogFooter` · `DialogHeader` · `DialogOverlay` · `DialogPortal` · `DialogTitle` · `DialogTrigger` | [`dialog.stories.tsx`](./src/components/stories/dialog.stories.tsx) | `motion`-animated modal. | -| `Sheet` · `SheetClose` · `SheetContent` · `SheetDescription` · `SheetFooter` · `SheetHeader` · `SheetTitle` · `SheetTrigger` | [`sheet.stories.tsx`](./src/components/stories/sheet.stories.tsx) | `motion`-animated side panel. | -| `Popover` · `PopoverContent` · `PopoverDescription` · `PopoverHeader` · `PopoverTitle` · `PopoverTrigger` | [`popover.stories.tsx`](./src/components/stories/popover.stories.tsx) | `motion`-animated floating panel. | -| `Tooltip` · `TooltipContent` · `TooltipProvider` · `TooltipTrigger` | [`tooltip.stories.tsx`](./src/components/stories/tooltip.stories.tsx) | `motion`-animated hover hint. | -| `DropdownMenu` · `DropdownMenuCheckboxItem` · `DropdownMenuContent` · `DropdownMenuGroup` · `DropdownMenuItem` · `DropdownMenuLabel` · `DropdownMenuPortal` · `DropdownMenuRadioGroup` · `DropdownMenuRadioItem` · `DropdownMenuSeparator` · `DropdownMenuShortcut` · `DropdownMenuSub` · `DropdownMenuSubContent` · `DropdownMenuSubTrigger` · `DropdownMenuTrigger` | [`dropdown-menu.stories.tsx`](./src/components/stories/dropdown-menu.stories.tsx) | Base UI Menu — `DropdownMenuLabel` must be inside `DropdownMenuGroup`. 
| +| Export | Story | Notes | +| --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `Sidebar` · `SidebarSectionLabel` · `SidebarProps` · `SIDEBAR_PANEL_WIDTH_DEFAULT` · `SIDEBAR_RAIL_WIDTH` | [`sidebar.stories.tsx`](./src/components/stories/sidebar.stories.tsx) | Rail + header + nav + footer slots. Collapse trigger is built in; host must not add its own. `SidebarSectionLabel` is the canonical uppercase heading utility. | +| `SplitPane` · `SplitPaneProps` · `SPLIT_LIST_WIDTH_DEFAULT` | [`split-pane.stories.tsx`](./src/components/stories/split-pane.stories.tsx) | List + detail + `detailEmpty` slots with narrow-breakpoint back-button. | +| `PageHeader` · `PageHeaderProps` | [`page-header.stories.tsx`](./src/components/stories/page-header.stories.tsx) | Eyebrow + title + actions. | +| `Section` · `SectionProps` | [`section.stories.tsx`](./src/components/stories/section.stories.tsx) | Titled content region with eyebrow + optional actions. | +| `Toolbar` · `ToolbarProps` | [`toolbar.stories.tsx`](./src/components/stories/toolbar.stories.tsx) | Inline toolbar row. | +| `Card` · `CardHeader` · `CardFooter` · `CardTitle` · `CardAction` · `CardDescription` · `CardContent` · `CardProps` · `CardSize` | [`card.stories.tsx`](./src/components/stories/card.stories.tsx) | Flat `--color-surface` panel. `activeRail` prop renders a 2px accent left-rail for in-flight signal. 
| +| `Table` · `TableHeader` · `TableBody` · `TableFooter` · `TableHead` · `TableRow` · `TableCell` · `TableCaption` | [`table.stories.tsx`](./src/components/stories/table.stories.tsx) | Dense data table. | +| `Item` · `ItemActions` · `ItemContent` · `ItemDescription` · `ItemFooter` · `ItemGroup` · `ItemHeader` · `ItemMedia` · `ItemSeparator` · `ItemTitle` | [`item.stories.tsx`](./src/components/stories/item.stories.tsx) | List row with leading/trailing media + metadata slots. | +| `Tabs` · `TabsContent` · `TabsList` · `TabsTrigger` · `tabsListVariants` | [`tabs.stories.tsx`](./src/components/stories/tabs.stories.tsx) | Base UI Tabs. | +| `Accordion` · `AccordionContent` · `AccordionItem` · `AccordionTrigger` | [`accordion.stories.tsx`](./src/components/stories/accordion.stories.tsx) | Base UI Accordion (`multiple` boolean, **not** Radix `type`). | +| `Collapsible` · `CollapsibleContent` · `CollapsibleTrigger` | [`collapsible.stories.tsx`](./src/components/stories/collapsible.stories.tsx) | CSS-animated disclosure. | +| `Breadcrumb` · `BreadcrumbEllipsis` · `BreadcrumbItem` · `BreadcrumbLink` · `BreadcrumbList` · `BreadcrumbPage` · `BreadcrumbSeparator` | [`breadcrumb.stories.tsx`](./src/components/stories/breadcrumb.stories.tsx) | Route breadcrumb. | +| `ScrollArea` · `ScrollBar` | [`scroll-area.stories.tsx`](./src/components/stories/scroll-area.stories.tsx) | Styled scrollable region. | +| `Empty` · `EmptyProps` | [`empty.stories.tsx`](./src/components/stories/empty.stories.tsx) | Empty-state scaffold. | +| `Dialog` · `DialogClose` · `DialogContent` · `DialogDescription` · `DialogFooter` · `DialogHeader` · `DialogOverlay` · `DialogPortal` · `DialogTitle` · `DialogTrigger` | [`dialog.stories.tsx`](./src/components/stories/dialog.stories.tsx) | `motion`-animated modal. 
| +| `Sheet` · `SheetClose` · `SheetContent` · `SheetDescription` · `SheetFooter` · `SheetHeader` · `SheetTitle` · `SheetTrigger` | [`sheet.stories.tsx`](./src/components/stories/sheet.stories.tsx) | `motion`-animated side panel. | +| `Popover` · `PopoverContent` · `PopoverDescription` · `PopoverHeader` · `PopoverTitle` · `PopoverTrigger` | [`popover.stories.tsx`](./src/components/stories/popover.stories.tsx) | `motion`-animated floating panel. | +| `Tooltip` · `TooltipContent` · `TooltipProvider` · `TooltipTrigger` | [`tooltip.stories.tsx`](./src/components/stories/tooltip.stories.tsx) | `motion`-animated hover hint. | +| `DropdownMenu` · `DropdownMenuCheckboxItem` · `DropdownMenuContent` · `DropdownMenuGroup` · `DropdownMenuItem` · `DropdownMenuLabel` · `DropdownMenuPortal` · `DropdownMenuRadioGroup` · `DropdownMenuRadioItem` · `DropdownMenuSeparator` · `DropdownMenuShortcut` · `DropdownMenuSub` · `DropdownMenuSubContent` · `DropdownMenuSubTrigger` · `DropdownMenuTrigger` | [`dropdown-menu.stories.tsx`](./src/components/stories/dropdown-menu.stories.tsx) | Base UI Menu — `DropdownMenuLabel` must be inside `DropdownMenuGroup`. | ### Form @@ -99,31 +99,35 @@ Controls, selection, and the input scaffolding primitives. Status, alerting, progress, and density-sensitive signal primitives. -| Export | Story | Notes | -| -------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------- | -| `Alert` · `AlertTitle` · `AlertDescription` · `AlertAction` · `alertVariants` · `AlertProps` | [`alert.stories.tsx`](./src/components/stories/alert.stories.tsx) | Inline alerts (default/destructive/warning/success/info/accent). 
|
-| `Progress` · `ProgressTrack` · `ProgressIndicator` · `ProgressLabel` · `ProgressValue` | [`progress.stories.tsx`](./src/components/stories/progress.stories.tsx) | Linear progress. |
-| `Badge` · `badgeVariants` | [`badge.stories.tsx`](./src/components/stories/badge.stories.tsx) | Tinted badges — use `MonoBadge` for status, `KindChip` for kind labels. |
-| `Skeleton` | [`skeleton.stories.tsx`](./src/components/stories/skeleton.stories.tsx) | Shimmer placeholder. |
-| `Spinner` | [`spinner.stories.tsx`](./src/components/stories/spinner.stories.tsx) | Spinner atom. |
-| `Toaster` · `toast` · `ToasterProps` | [`sonner.stories.tsx`](./src/components/stories/sonner.stories.tsx) | `sonner` re-export. Mount `<Toaster />` once at the app root. Default `theme="system"`. |
-| `StatusDot` · `StatusDotProps` · `StatusDotTone` · `StatusDotSize` | [`status-dot.stories.tsx`](./src/components/stories/status-dot.stories.tsx) | Live-status dot. Tone vocabulary: `success \| warning \| danger \| info \| accent \| neutral`. |
-| `MonoBadge` · `monoBadgeVariants` · `MonoBadgeProps` · `MonoBadgeTone` | [`mono-badge.stories.tsx`](./src/components/stories/mono-badge.stories.tsx) | 11px mono status badge (`RUNNING`, `DONE`, `ERROR`, …). |
-| `KindChip` · `KindChipProps` | [`kind-chip.stories.tsx`](./src/components/stories/kind-chip.stories.tsx) | 5px-radius kind label — protocol kinds, scopes, categories. |
-| `ConnectionIndicator` · `ConnectionIndicatorProps` · `ConnectionStatus` | [`connection-indicator.stories.tsx`](./src/components/stories/connection-indicator.stories.tsx) | Live-connection composite (`StatusDot` + label). Default labels `Connected` / `Disconnected` / `Reconnecting`. |
-| `Metric` · `MetricProps` · `MetricTone` | [`metric.stories.tsx`](./src/components/stories/metric.stories.tsx) | Dashboard metric with `detail` (inline mono unit) + `subtext` (secondary line) slots.
|
-| `Pill` · `Pills` · `pillVariants` · `pillToggleVariants` · `PillProps` · `PillsProps` · `PillsItem` · `PillVariant` · `PillSize` | [`pills.stories.tsx`](./src/components/stories/pills.stories.tsx) | `Pill` standalone + `Pills` tablist (`role="tab"`, `aria-selected`). |
-| `Avatar` · `AvatarBadge` · `AvatarFallback` · `AvatarGroup` · `AvatarGroupCount` · `AvatarImage` | [`avatar.stories.tsx`](./src/components/stories/avatar.stories.tsx) | Identity avatar with grouping. |
+| Export | Story | Notes |
+| -------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| `Alert` · `AlertTitle` · `AlertDescription` · `AlertAction` · `alertVariants` · `AlertProps` | [`alert.stories.tsx`](./src/components/stories/alert.stories.tsx) | Inline alerts (default/destructive/warning/success/info/accent). |
+| `Progress` · `ProgressTrack` · `ProgressIndicator` · `ProgressLabel` · `ProgressValue` | [`progress.stories.tsx`](./src/components/stories/progress.stories.tsx) | Linear progress. |
+| `Badge` · `badgeVariants` | [`badge.stories.tsx`](./src/components/stories/badge.stories.tsx) | Tinted badges — use `MonoBadge` for status, `KindChip` for kind labels. |
+| `Skeleton` | [`skeleton.stories.tsx`](./src/components/stories/skeleton.stories.tsx) | Shimmer placeholder. |
+| `Spinner` | [`spinner.stories.tsx`](./src/components/stories/spinner.stories.tsx) | Spinner atom. |
+| `Toaster` · `toast` · `ToasterProps` | [`sonner.stories.tsx`](./src/components/stories/sonner.stories.tsx) | `sonner` re-export. Mount `<Toaster />` once at the app root. Default `theme="system"`.
| +| `StatusDot` · `StatusDotProps` · `StatusDotTone` · `StatusDotSize` | [`status-dot.stories.tsx`](./src/components/stories/status-dot.stories.tsx) | Live-status dot. Tone vocabulary: `success \| warning \| danger \| info \| accent \| neutral`. | +| `MonoBadge` · `monoBadgeVariants` · `MonoBadgeProps` · `MonoBadgeTone` | [`mono-badge.stories.tsx`](./src/components/stories/mono-badge.stories.tsx) | 11px mono status badge (`RUNNING`, `DONE`, `ERROR`, …). `tone="solid-accent"` is reserved for unread pills. | +| `MonoChip` · `MonoChipProps` | [`mono-chip.stories.tsx`](./src/components/stories/mono-chip.stories.tsx) | Neutral inline chip — capability descriptors, tag rows. For tinted semantic variants use `MonoBadge`. | +| `KindChip` · `KindChipProps` · `KIND_DOT_COLORS` | [`kind-chip.stories.tsx`](./src/components/stories/kind-chip.stories.tsx) | Wire-dot kind marker (`say`, `greet`, `direct`, `receipt`, `recipe`, `trace`, `whois`). Unknown kinds render without a dot. `KIND_DOT_COLORS` is the canonical kind→color map. | +| `WireChip` · `WireChipProps` | [`wire-chip.stories.tsx`](./src/components/stories/wire-chip.stories.tsx) | Free-floating filter chip with optional leading wire-dot. For a contained segmented toggle use `Pills` instead. | +| `ConnectionIndicator` · `ConnectionIndicatorProps` · `ConnectionStatus` | [`connection-indicator.stories.tsx`](./src/components/stories/connection-indicator.stories.tsx) | Live-connection composite (`StatusDot` + label). Default labels `Connected` / `Disconnected` / `Reconnecting`. | +| `Metric` · `MetricProps` · `MetricTone` | [`metric.stories.tsx`](./src/components/stories/metric.stories.tsx) | Dashboard metric with `detail` (inline mono unit) + `subtext` (secondary line) slots. 
|
+| `Pill` · `Pills` · `pillVariants` · `pillToggleVariants` · `PillProps` · `PillsProps` · `PillsItem` · `PillVariant` · `PillSize` | [`pills.stories.tsx`](./src/components/stories/pills.stories.tsx) | `Pill` standalone + `Pills` tablist (`role="tab"`, `aria-selected`). |
+| `Avatar` · `AvatarBadge` · `AvatarFallback` · `AvatarGroup` · `AvatarGroupCount` · `AvatarImage` | [`avatar.stories.tsx`](./src/components/stories/avatar.stories.tsx) | Identity avatar with grouping. |

 ### Chat

 Style-only shells for agent conversations. They accept children (body, meta, status) from domain code — they do not know about session state, streaming, or tool IDs.

-| Export | Story | Notes |
-| --------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------- |
-| `ChatMessageBubble` · `ChatMessageBubbleProps` · `ChatMessageRole` · `ChatMessageAlign` | [`chat-message-bubble.stories.tsx`](./src/components/stories/chat-message-bubble.stories.tsx) | Role-driven layout shell (`user`, `agent`, `system`, `tool`, `diff`). `role` prop shadows the native ARIA `role`. |
-| `ToolCallCard` · `ToolCallCardProps` · `ToolCallStatus` | [`tool-call-card.stories.tsx`](./src/components/stories/tool-call-card.stories.tsx) | Tool-call framing card. Status → tone: `running → accent`, `done → success`, `error → danger`. |
-| `CodeBlock` · `CodeBlockProps` | [`code-block.stories.tsx`](./src/components/stories/code-block.stories.tsx) | Canvas-deep container with accent `$ ` prompt and ghost copy → 1.5s check swap.
|
+| Export | Story | Notes |
+| --------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------- |
+| `ChatMessageBubble` · `ChatMessageBubbleProps` · `ChatMessageRole` · `ChatMessageAlign` | [`chat-message-bubble.stories.tsx`](./src/components/stories/chat-message-bubble.stories.tsx) | Role-driven layout shell (`user`, `agent`, `system`, `tool`, `diff`). `role` prop shadows the native ARIA `role`. |
+| `ToolCallCard` · `ToolCallCardProps` · `ToolCallStatus` | [`tool-call-card.stories.tsx`](./src/components/stories/tool-call-card.stories.tsx) | Tool-call framing card. Status → tone: `running → accent`, `done → success`, `error → danger`. |
+| `WireCard` · `WireCardHead` · `WireCardBody` · `WireCardFoot` · `WireCardProps` | [`wire-card.stories.tsx`](./src/components/stories/wire-card.stories.tsx) | Bordered wire-protocol card (recipes, receipts, capabilities) with head/body/foot regions. `inline` prop renders an inline strip. |
+| `TypingDots` · `TypingDotsProps` | [`typing-dots.stories.tsx`](./src/components/stories/typing-dots.stories.tsx) | Three-dot typing indicator. Relies on the `typing-bounce` keyframes in `tokens.css`. |
+| `CodeBlock` · `CodeBlockProps` | [`code-block.stories.tsx`](./src/components/stories/code-block.stories.tsx) | Canvas-deep container with accent `$ ` prompt and ghost copy → 1.5s check swap.
|

 ## UIProvider wiring

diff --git a/packages/ui/src/components/kind-chip.test.tsx b/packages/ui/src/components/kind-chip.test.tsx
index 5ea8ccd55..bfe6b6cb2 100644
--- a/packages/ui/src/components/kind-chip.test.tsx
+++ b/packages/ui/src/components/kind-chip.test.tsx
@@ -1,30 +1,48 @@
 import { render } from "@testing-library/react";
 import { describe, expect, it } from "vitest";

-import { KindChip } from "./kind-chip";
+import { KindChip, KIND_DOT_COLORS } from "./kind-chip";

 describe("KindChip", () => {
-  it("Should render the original kind text with lowercase mono accent styling", () => {
-    const { container } = render();
+  it("Should render the kind label uppercase with the wire-dot chrome", () => {
+    const { container } = render();
     const chip = container.querySelector('[data-slot="kind-chip"]');
     expect(chip).not.toBeNull();
-    expect(chip?.textContent).toBe("Greet");
+    expect(chip?.textContent).toBe("greet");
     expect(chip?.className).toContain("font-mono");
-    expect(chip?.className).toContain("lowercase");
-    expect(chip?.className).toContain("rounded-[var(--radius-chip)]");
-    expect(chip?.className).toContain("bg-[color:var(--color-accent-tint)]");
-    expect(chip?.className).toContain("text-[color:var(--color-accent)]");
-    expect(chip?.getAttribute("data-kind")).toBe("Greet");
+    expect(chip?.className).toContain("uppercase");
+    expect(chip?.className).toContain("border-[color:var(--color-divider)]");
+    expect(chip?.className).toContain("bg-transparent");
+    expect(chip?.className).toContain("text-[color:var(--color-text-tertiary)]");
+    expect(chip?.getAttribute("data-kind")).toBe("greet");
   });

-  it("Should forward the provided className alongside the defaults", () => {
+  it("Should render a colored 7px dot for known protocol kinds", () => {
+    const { container } = render();
+    const dot = container.querySelector('[data-slot="kind-chip-dot"]');
+    expect(dot).not.toBeNull();
+    expect(dot).toHaveStyle({ background: KIND_DOT_COLORS.receipt });
+  });
+
+  it("Should omit
the dot for unknown kinds (platforms, event ids)", () => {
+    const { container } = render();
+    expect(container.querySelector('[data-slot="kind-chip-dot"]')).toBeNull();
+  });
+
+  it("Should display the explicit label when provided", () => {
+    const { container } = render();
+    const chip = container.querySelector('[data-slot="kind-chip"]');
+    expect(chip?.textContent).toBe("presence");
+  });
+
+  it("Should forward className alongside the defaults", () => {
     const { container } = render();
     const chip = container.querySelector('[data-slot="kind-chip"]');
     expect(chip?.className).toContain("custom-class");
-    expect(chip?.className).toContain("bg-[color:var(--color-accent-tint)]");
+    expect(chip?.className).toContain("border-[color:var(--color-divider)]");
   });

-  it("Should preserve internal data markers when conflicting data attributes are passed", () => {
+  it("Should preserve internal data markers when conflicting attributes are passed", () => {
     const { container } = render(
     );
diff --git a/packages/ui/src/components/kind-chip.tsx b/packages/ui/src/components/kind-chip.tsx
index 15a05938a..feb2db259 100644
--- a/packages/ui/src/components/kind-chip.tsx
+++ b/packages/ui/src/components/kind-chip.tsx
@@ -6,28 +6,51 @@ import { cn } from "../lib/utils";
 export interface KindChipProps extends Omit, "children"> {
   kind: string;
+  /** Optional explicit label; defaults to `kind`. */
+  label?: React.ReactNode;
 }

 /**
- * Protocol kind marker (e.g. `greet`, `whois`, `say`, `direct`, `capability`).
- * 5px radius, lowercase mono, accent-tint background with accent text — per DESIGN.md §4 "Kind Chip".
+ * Protocol kind marker — mirrors `.intent-badge` + `.wire-dot` in
+ * `docs/design/web-inspiration/styles/app.css`. Transparent surface, neutral
+ * border + tertiary label, leading 7px colored dot keyed off the protocol
+ * kind. Unknown kinds (platform names, event ids) render without a dot.
  */
-function KindChip({ kind, className, ...props }: KindChipProps) {
+const KIND_DOT_COLORS: Record = {
+  say: "#8E8E93",
+  greet: "#5BA6FF",
+  direct: "var(--color-accent)",
+  receipt: "var(--color-success)",
+  recipe: "var(--color-warning)",
+  trace: "#B892FF",
+  whois: "#4FD1C5",
+};
+
+function KindChip({ kind, label, className, ...props }: KindChipProps) {
+  const dotColor = KIND_DOT_COLORS[kind.toLowerCase()];
+
   return (
-      {kind}
+      {dotColor ? (
+  );
 }

-export { KindChip };
+export { KindChip, KIND_DOT_COLORS };
diff --git a/packages/ui/src/components/mono-badge.test.tsx b/packages/ui/src/components/mono-badge.test.tsx
index b2469f7a1..a61cc88f3 100644
--- a/packages/ui/src/components/mono-badge.test.tsx
+++ b/packages/ui/src/components/mono-badge.test.tsx
@@ -53,6 +53,11 @@ describe("MonoBadge", () => {
       background: "bg-[color:var(--color-neutral-tint)]",
       text: "text-[color:var(--color-text-label)]",
     },
+    {
+      tone: "solid-accent",
+      background: "bg-[color:var(--color-accent)]",
+      text: "text-[color:var(--color-accent-ink)]",
+    },
   ])("Should apply the $tone tint tokens", ({ tone, background, text }) => {
     const { container } = render(token);
     const badge = container.querySelector('[data-slot="mono-badge"]');
diff --git a/packages/ui/src/components/mono-badge.tsx b/packages/ui/src/components/mono-badge.tsx
index 805662247..d7b9be7b5 100644
--- a/packages/ui/src/components/mono-badge.tsx
+++ b/packages/ui/src/components/mono-badge.tsx
@@ -17,6 +17,7 @@ const monoBadgeVariants = cva(
         "border border-[color:var(--color-divider)] bg-transparent text-[color:var(--color-text-label)]",
       neutral: "bg-[color:var(--color-neutral-tint)] text-[color:var(--color-text-label)]",
       accent: "bg-[color:var(--color-accent-tint)] text-[color:var(--color-accent)]",
+      "solid-accent": "bg-[color:var(--color-accent)] text-[color:var(--color-accent-ink)]",
       success: "bg-[color:var(--color-success-tint)] text-[color:var(--color-success)]",
       warning: "bg-[color:var(--color-warning-tint)]
text-[color:var(--color-warning)]",
       danger: "bg-[color:var(--color-danger-tint)] text-[color:var(--color-danger)]",
@@ -43,7 +44,8 @@ export interface MonoBadgeProps

 /**
- * Inline mono pill for identifiers (agent IDs, versions, protocol names) and
- * status badges. Uppercase by default, tinted via the DESIGN.md §4 tint formula.
+ * status badges. Uppercase by default, with semantic tones using the DESIGN.md
+ * §4 tint formula and `solid-accent` reserved for accent-filled emphasis.
  */
 function MonoBadge({ tone, uppercase, className, ...props }: MonoBadgeProps) {
   const dataSlot = props["data-slot"] ?? "mono-badge";
diff --git a/packages/ui/src/components/mono-chip.test.tsx b/packages/ui/src/components/mono-chip.test.tsx
new file mode 100644
index 000000000..43d52b970
--- /dev/null
+++ b/packages/ui/src/components/mono-chip.test.tsx
@@ -0,0 +1,22 @@
+import { render } from "@testing-library/react";
+import { describe, expect, it } from "vitest";
+
+import { MonoChip } from "./mono-chip";
+
+describe("MonoChip", () => {
+  it("Should render a neutral mono chip with elevated surface", () => {
+    const { container } = render(code);
+    const chip = container.querySelector('[data-slot="mono-chip"]');
+    expect(chip).not.toBeNull();
+    expect(chip?.textContent).toBe("code");
+    expect(chip?.className).toContain("font-mono");
+    expect(chip?.className).toContain("bg-[color:var(--color-surface-elevated)]");
+    expect(chip?.className).toContain("text-[color:var(--color-text-secondary)]");
+  });
+
+  it("Should forward className", () => {
+    const { container } = render(tag);
+    const chip = container.querySelector('[data-slot="mono-chip"]');
+    expect(chip?.className).toContain("custom-class");
+  });
+});
diff --git a/packages/ui/src/components/mono-chip.tsx b/packages/ui/src/components/mono-chip.tsx
new file mode 100644
index 000000000..0bbd3908c
--- /dev/null
+++ b/packages/ui/src/components/mono-chip.tsx
@@ -0,0 +1,29 @@
+"use client";
+
+import * as React from "react";
+
+import { cn
} from "../lib/utils";
+
+export interface MonoChipProps extends React.ComponentProps<"span"> {}
+
+/**
+ * Neutral inline chip — mirrors `.mono-chip` (default tone) in
+ * `docs/design/web-inspiration/styles/app.css`. Used for capability
+ * descriptors, tag rows, and other identifier strings rendered alongside
+ * message bodies. For tinted semantic variants use {@link MonoBadge}.
+ */
+function MonoChip({ className, ...props }: MonoChipProps) {
+  return (
+  );
+}
+
+export { MonoChip };
diff --git a/packages/ui/src/components/pills.tsx b/packages/ui/src/components/pills.tsx
index 80a2bcc21..6d663e66f 100644
--- a/packages/ui/src/components/pills.tsx
+++ b/packages/ui/src/components/pills.tsx
@@ -7,9 +7,9 @@ import { cn } from "../lib/utils";

 /**
  * Pill = static semantic tag rendered as a span.
- * Pills = segmented toggle group — follows the mock at `docs/design/web-inspiration/src/primitives.jsx`.
- * Both live in the same file because Pills renders pill-styled buttons, and keeping them
- * colocated avoids duplicating the variant table.
+ * Pills = segmented toggle group — mirrors `.pills` + `.pill` in
+ * `docs/design/web-inspiration/styles/app.css`. The segments live inside a
+ * contained track (panel surface, 1px divider border, 3px inner padding).
  */

 const pillBase =
@@ -60,17 +60,17 @@ function Pill({ className, variant, size, ...props }: PillProps) {
 }

 const pillToggleVariants = cva(
-  `${pillBase} cursor-pointer border bg-transparent text-[color:var(--color-text-secondary)] hover:text-[color:var(--color-text-primary)] disabled:cursor-not-allowed disabled:opacity-50`,
+  "inline-flex items-center justify-center gap-1.5 whitespace-nowrap font-mono uppercase font-semibold tracking-[0.08em] transition-colors duration-150 ease-out cursor-pointer focus-visible:outline-none focus-visible:ring-1 focus-visible:ring-[color:var(--color-accent)] focus-visible:ring-offset-0 disabled:cursor-not-allowed disabled:opacity-50",
   {
     variants: {
       active: {
-        true: "border-[color:var(--color-accent)] bg-[color:var(--color-accent)] text-white hover:text-white",
+        true: "bg-[color:var(--color-surface-elevated)] text-[color:var(--color-text-primary)]",
         false:
-          "border-[color:var(--color-divider)] hover:border-[color:var(--color-text-label)] hover:bg-[color:var(--color-hover)]",
+          "bg-transparent text-[color:var(--color-text-tertiary)] hover:text-[color:var(--color-text-secondary)]",
       },
       size: {
-        sm: "h-[22px] rounded-[var(--radius-mono-badge)] px-2 text-[10px] font-semibold tracking-[0.08em]",
-        md: "h-8 rounded-[var(--radius-xl,20px)] px-3.5 text-[11px] font-semibold tracking-[0.12em]",
+        sm: "h-[20px] rounded-[5px] px-2 text-[10px]",
+        md: "h-[22px] rounded-[5px] px-2.5 text-[10px]",
       },
     },
     defaultVariants: {
@@ -111,7 +111,10 @@ function Pills({
     {items.map(item => {
@@ -136,12 +139,7 @@ function Pills({
       {typeof item.badge === "number" && item.badge > 0 ? (
         {item.badge}
diff --git a/packages/ui/src/components/search-input.tsx b/packages/ui/src/components/search-input.tsx
index 029151b3d..8ef5cf094 100644
--- a/packages/ui/src/components/search-input.tsx
+++ b/packages/ui/src/components/search-input.tsx
@@ -17,8 +17,9 @@ export interface SearchInputProps extends Omit<
 }

 /**
- * Search field — matches `SearchInput` in `docs/design/web-inspiration/src/primitives.jsx`.
- * Standard 36px row, search glyph on the left, optional kbd hint on the right.
+ * Search field — mirrors `.search-input` in
+ * `docs/design/web-inspiration/styles/app.css`. Compact 28px row, panel-tone
+ * surface, soft tertiary focus border (no accent ring), bordered kbd hint.
  */
 function SearchInput({
   value,
@@ -37,14 +38,14 @@ function SearchInput({
       data-slot="search-input"
       data-disabled={disabled ? "true" : undefined}
       className={cn(
-        "flex h-9 min-w-0 items-center gap-2 rounded-lg border border-[color:var(--color-divider)] bg-[color:var(--color-surface-elevated)] px-3 text-[13px] text-[color:var(--color-text-primary)] transition-colors focus-within:border-[color:var(--color-accent)] focus-within:ring-1 focus-within:ring-[color:var(--color-accent)]",
+        "flex h-[28px] min-w-0 items-center gap-2 rounded-[7px] border border-[color:var(--color-divider)] bg-[color:var(--color-surface-panel)] px-2 text-[13px] text-[color:var(--color-text-primary)] transition-colors focus-within:border-[color:var(--color-text-tertiary)]",
         "data-[disabled=true]:cursor-not-allowed data-[disabled=true]:opacity-60",
         containerClassName
       )}
     >