Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
95 changes: 95 additions & 0 deletions .agents/skills/agh-code-guidelines/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
---
name: agh-code-guidelines
description: >-
Enforces AGH Go code style and concurrency patterns before writing or editing
any production Go file: error wrapping with %w, errors.Is/As only (no
strings.Contains on err.Error), no underscore-discarded errors, slog over
log/fmt, context.Context as first arg, compile-time interface assertions, no
hardcoded config, CLI flag presence detection, whitespace normalization at
CLI boundary, no comments restating WHAT, goroutine ownership and shutdown
via context, no fire-and-forget, no time.Sleep in orchestration. Use whenever
creating or modifying any *.go file under cmd/ or internal/ that is not a
test file. Do not use for *_test.go (use agh-test-conventions), schema
migrations (use agh-schema-migration), or contract changes (use
agh-contract-codegen-coship).
trigger: implicit
---

# AGH Code Guidelines

These are the AGH-specific Go style and concurrency rules. They exist because reviewers will block PRs that violate them, and most violations are caught by lint/CI only after the fact. Activate this skill before writing or editing production Go code so the patterns land correctly the first time.

Companion skills cover narrower domains: `agh-test-conventions` for tests, `agh-cleanup-failure-paths` for multi-step error returns, `agh-schema-migration` for SQLite changes, `agh-contract-codegen-coship` for contract/OpenAPI edits, `golang-pro` for general Go idiom guidance. Activate those alongside when their domain applies.

## Procedures

**Step 1: Identify the Edit Surface**

1. Confirm the target is a production Go file (`cmd/**` or `internal/**`, not `*_test.go`).
2. Read `references/coding-style.md` for the canonical style rules.
3. Read `references/concurrency-patterns.md` for the canonical concurrency rules.

**Step 2: Apply Error Discipline**

1. Wrap every error with context: `fmt.Errorf("operation: %w", err)`. The `%w` verb is mandatory when the caller may need to match the cause.
2. Match errors with `errors.Is` and `errors.As` exclusively. `strings.Contains(err.Error(), ...)` is a blocking violation — replace with sentinel errors or typed errors.
3. Never ignore an error with `_`. Either handle it or write a one-line justification comment explaining why the error is impossible or irrelevant.
4. No `panic()` or `log.Fatal()` in production paths. The only legitimate use is unrecoverable startup failure in `main`.

**Step 3: Apply Logging and Context Discipline**

1. Use `log/slog` for every operational log line. `log.Printf`, `fmt.Println`, `fmt.Printf` are forbidden in production paths.
2. Pass `context.Context` as the first argument to any function that crosses a runtime boundary (HTTP handler, UDS handler, DB call, subprocess spawn, network call).
3. Never call `context.Background()` outside `main` or a focused test. Caller-supplied context is the rule.
4. External HTTP calls require an explicit timeout. `http.DefaultClient` is forbidden (also enforced by `agh-cleanup-failure-paths`).

**Step 4: Apply Type Discipline**

1. Every new exported type that satisfies an interface gets a compile-time assertion: `var _ Interface = (*Type)(nil)` adjacent to the type definition. Reviewers will block missing assertions.
2. Replace `interface{}` / `any` with the concrete type whenever the type is known statically.
3. No reflection without a written performance justification.
4. No defensive nil-checks after `make(...)`. Lint flags `if x == nil` after `make` as unreachable.

**Step 5: Apply Configuration Discipline**

1. Never hardcode operational values. Pull from TOML config (`internal/config`) or expose via functional options (`NewManager(opts ...Option)`).
2. Disable / zero-value semantics must be explicit — document whether `0` means "off" or "use default".
3. Resolution chains (e.g., env → flag → config → default) are documented in code as ordered fallbacks ending in an actionable error.
4. Config lifecycle is part of feature lifecycle: any feature that adds/changes/removes config updates the struct, defaults, validation, examples, `config.toml` docs, and tests in the same change.

**Step 6: Apply CLI Boundary Discipline**

1. Distinguish "flag not set" from "flag set to zero value" via `cmd.Flags().Changed(name)` (Cobra) or equivalent. Silently ignoring an explicit flag is a bug.
2. Trim and drop empty entries from string-slice CLI inputs (capabilities, IDs, tags, paths) before sending to the daemon. Whitespace-only strings must not surface as "validation problems".
3. Stable `-o json` / `-o jsonl` are compatibility contracts — do not change their shape without a contract update.

**Step 7: Apply Concurrency Discipline**

1. Every goroutine has explicit ownership and shutdown via `context.Context` cancellation.
2. No fire-and-forget goroutines. Track with `sync.WaitGroup` (or equivalent owner-side primitive) and join on shutdown.
3. Long-running loops use `select { case <-ctx.Done(): return; case ... }`.
4. Prefer channels over shared memory with mutexes when practical. `sync.RWMutex` for read-heavy state, `sync.Mutex` for write-heavy.
5. No `time.Sleep()` in orchestration paths — use timers, tickers, or context deadlines.
6. Goroutines spawned by `internal/session/manager_*.go` MUST be tracked by a Manager-owned WaitGroup and joined in Manager shutdown. Never put goroutine-owned channels in a struct field that another goroutine mutates — use a per-run handle.
7. Subprocess managed-stop respects `ctx.Done()` between Shutdown and Wait. Wrap `proc.Wait()` in `select { case <-proc.Done(): case <-ctx.Done(): }`. Process-group signaling helpers live in `internal/procutil`.

**Step 8: Apply Comment Discipline**

1. Default to writing no comments. Identifiers carry the WHAT.
2. Comments capture WHY when non-obvious: hidden constraints, invariants, workarounds for a specific bug, behavior that would surprise a reader.
3. Never reference the current task, fix, callers, or issue number in a comment ("used by X", "added for the Y flow", "handles the case from issue #123"). Those rot — they belong in the PR description.
4. No multi-paragraph docstrings. One short line max.

**Step 9: Pre-Commit Validation**

1. Run `make lint` for the affected package — zero tolerance for golangci-lint findings.
2. Run `make verify` (fmt → lint → test → boundaries → build) before declaring the edit complete.
3. For race-sensitive packages (`internal/session`, `internal/acp`, `internal/hooks`, `internal/subprocess`, `internal/resources`), reproduce CI locally with `act workflow_dispatch -W .github/workflows/ci.yml -j verify --container-architecture linux/amd64` before claiming success.

## Error Handling

- **Existing file already violates the rules:** fix what the current edit touches; flag the rest as pre-existing tech debt in the task body. Do not silently expand scope.
- **`errors.Is` / `errors.As` is impossible because the dependency returns a string:** wrap once at the boundary in a typed error of yours; downstream code matches on your typed error.
- **Reflection genuinely required (codegen, decoder):** keep a written justification adjacent to the reflection call. Lint exception requires a `//nolint:` directive with a reason.
- **`panic` shows up in seemingly-production code:** confirm whether the path is reachable post-`main`. If it is, replace with explicit error return; if it is genuinely unreachable, mark with `// unreachable: ...` and prefer `panic("invariant: ...")` over `log.Fatal`.
- **CLI command silently ignores a flag:** verify with `cmd.Flags().Changed(name)`; if the flag is meaningfully optional, document the resolution chain and emit an explicit `slog` debug line when the default is taken.
71 changes: 71 additions & 0 deletions .agents/skills/agh-code-guidelines/references/coding-style.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
# AGH Coding Style — Canonical Rules

Verbatim canonical rules. Reviewers will quote these.

## Errors

- Wrap with context using `%w`: `fmt.Errorf("operation: %w", err)`.
- Match with `errors.Is` and `errors.As`. **`strings.Contains(err.Error(), …)` is forbidden.**
- Never ignore an error with `_`. Every error is handled or has a written justification on the line.
- No `panic()` or `log.Fatal()` in production paths — only for unrecoverable startup failures inside `main`.

## Cleanup

- Pair `defer cancel()` immediately after `WithCancel` / `WithTimeout` / `WithDeadline`.
- Every error-return path that previously created or extended a context, registered a resource, opened a connection, or spawned a subprocess MUST `cancel()`, `Close()`, `Stop()`, or release its lease before returning. See `agh-cleanup-failure-paths` for the full audit pattern.

## Logging

- `log/slog` for structured logging. `log.Printf`, `fmt.Println`, `fmt.Printf` are forbidden in operational paths.
- Include correlation keys when relevant: `workspace_id`, `session_id`, `parent_session_id`, `root_session_id`, `agent_name`, `task_id`, `run_id`, `claim_token_hash`, `lease_until`, `workflow_id`, `coordinator_session_id`, `scheduler_reason`, `hook_event`, `hook_name`, `spawn_depth`, `actor_kind`, `actor_id`, `release_reason`.

## Context

- `context.Context` is the first argument of any function that crosses a runtime boundary.
- `context.Background()` is forbidden outside `main` and focused tests.
- Detached execution (work that outlives the request) uses `context.WithoutCancel(ctx)`. `WithoutCancel` does NOT preserve deadlines — re-attach with `WithDeadline` if needed.

## Types and Interfaces

- New exported types implementing an interface get `var _ Interface = (*Type)(nil)` adjacent to the type. **Mandatory.**
- No `interface{}` / `any` when a concrete type is known.
- No reflection without a written performance justification.
- No defensive `if x == nil` checks after `make(...)`. Lint flags this as unreachable.

## Configuration

- Never hardcode operational values. Use TOML config (`internal/config`) or functional options (`NewManager(opts ...Option)`).
- Disable / zero-value semantics must be explicit. Document whether `0` means "off" or "use default".
- Resolution chains (env → flag → config → default) are documented as ordered fallbacks ending in actionable errors.
- Config lifecycle is part of the feature lifecycle: structs, defaults, merge/overlay behavior, validation, examples, `config.toml` docs, generated CLI/site docs, and tests update in the same change. If no config change is needed, the TechSpec says why explicitly.

## CLI Boundary

- Distinguish "flag not set" from "flag set to zero value" via `cmd.Flags().Changed(name)` (Cobra) or equivalent. Silently ignoring an explicit flag is a bug.
- String-slice inputs (capabilities, IDs, tags, paths) trim and drop empty entries before sending to the daemon. Whitespace-only strings must not be pushed as "validation problems".
- `-o json` and `-o jsonl` are compatibility contracts. No command aliases (no `done`, no `pass`).
- Operator endpoints MUST NOT infer agent identity from environment variables — that path belongs to `internal/agentidentity` for agent-facing CLI.

## Comments

- Default: write no comments. Well-named identifiers carry the WHAT.
- Comments capture WHY when non-obvious: hidden constraints, invariants, workarounds for a specific bug, surprising behavior.
- Never reference the current task, fix, callers, or issue number ("used by X", "added for Y flow", "handles the case from issue #123"). Those rot.
- No multi-paragraph docstrings or multi-line comment blocks. One short line max.

## Outbound Calls

- `http.DefaultClient` is forbidden in production paths.
- Every outbound HTTP/network call uses a client with an explicit timeout.
- Drain response bodies (`io.Copy(io.Discard, resp.Body)` then `resp.Body.Close()`) — do not skip the drain.

## Architecture Discipline (cross-package)

- Interfaces defined where consumed (Go-style): `session/` defines `AgentDriver`, `acp/` implements it.
- Direct function calls through interfaces. No event bus, no NATS, no reflection-based routing.
- No back-pointers between packages — inject callbacks or interfaces.
- Functional options for constructors: `NewManager(opts ...Option)`.
- Maps for <10 items — no registry interfaces for small collections.
- File-level organization within packages — sub-packages only when complexity justifies it.
- `internal/api/core` is the canonical handler home. REST/UDS endpoints exist as shared `BaseHandlers` methods; HTTP and UDS only choose registration and authentication. No transport-duplicated parsing/validation.
- New `internal/api/*` subpackage requires updating `magefile.go` `Boundaries()` in the same commit (CI-enforceable boundaries prevent import cycles).
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
# AGH Concurrency Patterns — Canonical Rules

Verbatim canonical rules. Reviewers will quote these. Companion skills cover deeper analysis: `deadlock-finder-and-fixer` for race/deadlock investigation, `agh-cleanup-failure-paths` for error-path cancellation discipline.

## Goroutine Ownership

- Every goroutine has explicit ownership and shutdown via `context.Context` cancellation.
- No fire-and-forget goroutines. Track with `sync.WaitGroup` or equivalent owner-side primitive and join on shutdown.
- Long-running loops use `select { case <-ctx.Done(): return; case ... }`. Never busy-wait.
- Goroutines spawned by `internal/session/manager_*.go` MUST be tracked by a Manager-owned WaitGroup and joined in Manager shutdown.
- Never put goroutine-owned channels in a struct field that another goroutine mutates — use a per-run handle.

## Synchronization

- Prefer channels over shared memory with mutexes when practical.
- `sync.RWMutex` for read-heavy shared state, `sync.Mutex` for write-heavy.
- No `time.Sleep()` in orchestration. Use timers, tickers, or context deadlines.

## Detached Execution

- Any work that outlives an HTTP/UDS request — prompts, network channel sends, automation jobs — MUST detach via `context.WithoutCancel(ctx)`.
- Never tie execution lifetime to request lifetime.
- Expose explicit cancel endpoints (e.g., `POST /api/sessions/:id/prompt/cancel`).
- `context.WithoutCancel` does NOT preserve deadlines. Re-attach with `WithDeadline` if needed.
- The writer loop stays bound to the request context — detach the *execution*, not the *response*.

## Subprocess Supervision

- Subprocess managed-stop respects `ctx.Done()` between Shutdown and Wait. Wrap `proc.Wait()` in `select { case <-proc.Done(): case <-ctx.Done(): }`.
- Process-group supervision parity: Unix uses process groups, Windows uses forced-exit fallback. Always cross-build with `GOOS=windows GOARCH=amd64 go build` before claiming subprocess work complete.
- Centralize signaling helpers in `internal/procutil`. Do not reinvent process-group signaling per package.

## Race / cgo

- `make verify` runs `-race`. Race-enabled tests need `CGO_ENABLED=1`.
- `runRaceEnabledGoCommand` (or equivalent) clones caller env and forces `CGO_ENABLED=1` for race subprocesses. Do not trust ambient env.
- Before claiming `make verify` complete on race-sensitive packages (`internal/session`, `internal/acp`, `internal/hooks`, `internal/subprocess`, `internal/resources`), reproduce locally with `act workflow_dispatch -W .github/workflows/ci.yml -j verify --container-architecture linux/amd64`.

## Authoritative Primitives (do not replicate)

- `task.Service.ClaimNextRun` is the canonical claim primitive — no peer package may replicate it.
- Wake / observe / sweep are allowed; claim / own is not.
- The mechanical scheduler does NOT call `ClaimNextRun` directly in MVP.
- Hooks dispatch at the call site that owns the state transition. Never tail event/log tables to fire hooks.

## Common Failure Modes

- Goroutine leak on error path: every error return that ran above a `go func()` spawn must signal that goroutine to exit (via `cancel()` or close of an owner-controlled channel).
- Deadlock on shutdown: a goroutine reading from a channel that the owner stopped writing to without closing — close channels you own when shutting down.
- Race on map / slice mutation: take the appropriate mutex, or use `sync.Map` for genuinely concurrent maps. Concurrent slice append without sync is always a bug.
- Lost cancellation: storing `context.Background()` instead of caller-supplied `ctx` breaks deadline propagation. Always thread `ctx` through.
14 changes: 13 additions & 1 deletion .agents/skills/agh-test-conventions/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -50,11 +50,23 @@ trigger: implicit
3. Deterministic time: replace `time.Now()` with injected clocks; deterministic IDs use injected ID generators.
4. For new types satisfying interfaces, ensure `var _ Interface = (*Type)(nil)` exists in production code (not in the test file).

**Step 6: Pre-Commit Validation**
**Step 6: Apply Integration / E2E Discipline**

1. Integration tests live in `*_integration_test.go` with `//go:build integration` at the top. Co-locate them with the package they test — never in a separate `test/` directory.
2. `make test` runs unit only. `make test-integration` adds the `+integration` build tag. `make test-e2e-runtime` is the daemon-side Go harness; `make test-e2e-web` is the browser-side Playwright harness.
3. Use `TestMain` for expensive one-time setup/teardown.
4. Use real dependencies — real SQLite via `t.TempDir()`, mock ACP server as a subprocess (`acpmock`). Avoid in-process fakes when a real subprocess can be wired.
5. Keep integration tests fast enough for CI: ~30s max per package.
6. **E2E tests are part of the runtime contract.** When a runtime contract changes (prompt augmenter, situation context, fixture format), the E2E mock and matchers ship in the same PR. Otherwise tests pass against a stale prompt and fail later.
7. Read `references/test-shape-rules.md` "Integration / E2E" section for additional patterns.

**Step 7: Pre-Commit Validation**

1. Run `python3 scripts/check-test-conventions.py <file_path>` to scan the test file for violations. The script is a regex-based fast check; it complements `make verify`.
2. If the script reports violations, fix them before running `make verify`.
3. After edits, run `go test ./<package> -count=1 -race` for the affected package, then `make verify`.
4. **`make verify` is the commit gate.** If verification is blocked by an external/branch-side asset issue (missing test fixture, etc.), do NOT commit — report the verified blocker and hold.
5. **Test failures are production bugs.** Fix production code; do not weaken assertions. The only legitimate exception is documenting an INVALID review item with concrete evidence.

## Error Handling

Expand Down
Loading
Loading