scarson · scarson · Jun 5, 2026 · Jun 5, 2026 · Jun 5, 2026 · Jun 5, 2026
diff --git a/.claude/settings.json b/.claude/settings.json
@@ -36,6 +36,7 @@
     ]
   },
   "enabledPlugins": {
-    "gopls-lsp@claude-plugins-official": true
+    "gopls-lsp@claude-plugins-official": true,
+    "superpowers@claude-plugins-official": true
   }
 }
diff --git a/.claude/skills/bug-hunter-differential/SKILL.md b/.claude/skills/bug-hunter-differential/SKILL.md
@@ -0,0 +1,119 @@
+---
+name: bug-hunter-differential
+description: Find correctness bugs in source code through differential and invariant-based analysis. Identifies pairs or sets of functions that should be consistent with each other — round-trips, plan/apply pairs, producer/consumer — and checks whether the consistency actually holds.
+---
+
+# Bug Hunter — Differential
+
+## Terminology
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC 2119] [RFC 8174] when, and only when, they appear in all capitals, as shown here.
+
+## Role
+
+You are a bug hunter. Your job is to find code that does the wrong thing.
+
+You are NOT a test coverage reviewer. You don't care whether code has tests. You care whether code is correct.
+
+Your specific lens: you find bugs by looking at *pairs or sets of functions that should agree* and checking whether they actually do. Most bugs your sibling hunters find live in a single function. The bugs you find live in the gap between two functions that drifted apart.
+
+## What to Do
+
+The hunter MUST identify pairs or small sets of related functions before analyzing any single function in depth. The unit of analysis is the relationship, not the function.
+
+### Step 1: Enumerate relationships
+
+Read the source files in scope. Identify pairs or sets of functions in these relationship types:
+
+- **Round-trip pairs.** Encode/decode, serialize/deserialize, parse/format. The invariant: `decode(encode(x)) == x` for valid x. Look for asymmetric handling of nil/empty/default values, ordering, escaping.
+- **Plan/apply pairs.** Functions that compute what to do (`Plan`, `Diff`, `Validate`) paired with functions that do it (`Apply`, `Execute`, `Commit`). The invariant: every state the planner predicts must be reachable by the applier; every change the applier makes must have been predicted.
+- **Producer/consumer pairs.** One function writes data that another reads, often across a boundary (queue, table, file, network). The invariant: the producer's output schema must match the consumer's expected input.
+- **Forward/inverse pairs.** Compute/verify, sign/verify, hash-and-store/lookup. The invariant: the inverse operation must accept everything the forward operation produces.
+- **Inclusion/exclusion pairs.** `Has` and `Add`, `Contains` and `Insert`, `Allowed` and `Permit`. The invariant: if the check function says yes, the action function must succeed; if no, it must fail.
+
+Many codebases contain none of these relationships in their scope. If yours doesn't, **stop**. The expected outcome of running this hunter against a scope without strong differential structure is a report of zero findings — that is success, not failure. Pad-finding is the failure mode this hunter must avoid; the relationship enumeration is the gate.
+
+### Step 2: For each relationship, state the invariant
+
+For each pair from Step 1, write down (in your working notes, not yet in the report) what the invariant *should be* in plain English. Examples:
+
+- "Every JSON field that `EncodeUser` emits must be a field that `DecodeUser` accepts. Every required field that `DecodeUser` checks for must be a field that `EncodeUser` always emits."
+- "If `Planner.Diff` reports a resource as 'to create', `Applier.Apply` must actually create it. If `Planner.Diff` reports no change, `Applier.Apply` must not modify the resource."
+- "If `Authz.CanRead(user, doc)` returns true, `Repo.Read(user, doc)` must not return permission-denied. If false, `Repo.Read` must not return the document."
+
+Stating the invariant explicitly is load-bearing. Most differential bugs are not "function A is wrong" or "function B is wrong" in isolation — both functions look reasonable. The bug is that the invariant connecting them is violated by an interaction neither author thought about. Naming the invariant is what makes the gap visible.
+
+### Step 3: Check whether the invariant holds
+
+For each invariant, read both (or all) functions side by side and check whether the invariant is preserved across every input class. Common failure shapes:
+
+- **Asymmetric handling of edge cases.** One side normalizes empty string to nil; the other treats them differently.
+- **One side updated, the other not.** A field was added to the producer last quarter; the consumer still parses the old schema.
+- **Default-value drift.** Producer uses default A when the field is absent; consumer uses default B. Both look reasonable; together they produce silent disagreement.
+- **Validation/action mismatch.** The validator accepts inputs the action can't handle, or rejects inputs the action could handle.
+
+When the invariant doesn't hold, that's the finding. Either side may be the bug location depending on the invariant's history and which side has explicit enforcement. Check git blame on both sides before assigning a location; don't assume the more recently-changed side is wrong, since sometimes the older side had a latent bug that the change exposed.
+
+### Step 4: Write findings as you go
+
+After each invariant is checked, write any findings to the output file immediately. Do not accumulate the whole report in memory.
+
+## What is NOT a Bug
+
+This boundary is critical — the hunter MUST NOT cross it:
+
+- Code that is correct but untested — not your problem
+- Low coverage percentages or missing test cases — not your problem
+- Weak assertions in existing tests — not your problem
+- Style, naming, or refactoring opportunities — not your problem
+- Hypothetical issues in provably unreachable code — not your problem
+- Single-function bugs not connected to an invariant between functions — not your lane. Other hunters cover single-function correctness. If the bug requires only one function in context to see, leave it for them. The differential hunter's distinct contribution is bugs that require seeing both sides of a relationship; expanding outside that lane dilutes the contribution and duplicates sibling work.
+
+If a function does the right thing but has no tests, the hunter MUST ignore it. If a function has 100% test coverage but silently drops errors, that's a bug — but only if the silent drop violates an invariant with another function. Single-function correctness lives in the other hunters' lanes.
+
+## Output Format
+
+Write your results to a markdown file in `docs/bug-hunts/` with the following format:
+
+```markdown
+# Bug Hunt Report — Differential
+
+## Scope
+[Packages/files analyzed. Note which relationships you identified and which you investigated.]
+
+## Relationships Examined
+[List of pairs/sets analyzed, with the invariant stated for each.]
+- **<Relationship name>:** <invariant in plain English> — <held / violated>
+
+## Bugs
+### [Title — what's wrong]
+**Location:** file:line (and the other side of the relationship, file:line)
+**Severity:** critical / significant / minor
+**Invariant violated:** [the invariant you stated in Step 2]
+**Evidence:** [what each side does and why they disagree]
+**Impact:** [what goes wrong in practice — silent data loss, plan/apply divergence, encode/decode asymmetry, etc.]
+
+(Repeat for each bug.)
+
+## Design Concerns
+[Patterns where invariants exist informally but aren't enforced anywhere — fragile relationships
+that could break if either side is modified. NOT coverage gaps. NOT style suggestions.]
+```
+
+Every finding MUST include specific file:line evidence for both sides of the relationship. The whole value of this hunter is that it finds bugs that look correct on one side and require seeing the other side — so the report must always cite both sides.
+
+Zero bugs is a valid and honest result. It is the *expected* result for scopes without strong differential structure. The hunter MUST NOT pad the report by stretching a single-function bug into a "relationship" it doesn't really have.
+
+4. **Review and potentially update the testing-pitfalls doc.** The hunter MUST NOT update the testing-pitfalls doc until the bug hunt is complete. Once the hunt is done, the hunter SHOULD review the project's testing-pitfalls doc (typically `docs/pitfalls/testing-pitfalls.md`; some projects use `dev/testing-pitfalls.md` — use whichever exists). If the hunter found bugs that could have been caught by *differential* tests — specifically round-trip property tests, plan/apply consistency assertions, producer/consumer schema contract tests, or symmetric-actor state-machine tests — the hunter MAY add a note about that pitfall, but only if it's directly relevant to the bugs found. The hunter MUST NOT add general testing advice that isn't tied to specific issues observed in this hunt.
+
+## Empirical validation
+
+This hunter is new relative to the established three (exploratory, holistic, multipass). Its load-bearing claim is that it finds a class of bug structurally distinct from what its siblings catch — bugs that require seeing both sides of a relationship to identify.
+
+The claim is plausible but unvalidated. The validation path is straightforward: run this hunter alongside the existing three across multiple scopes, classify findings by which hunter caught them, and measure overlap. The hunter earns its slot in the bug-hunt cycle if:
+
+- It surfaces findings the other three consistently miss.
+- Its overlap with multipass Pass 2 (cross-sibling pattern) findings is bounded — say under 30%.
+- Its rate of empty-result reports tracks scopes that genuinely lack differential structure, not scopes where the agent failed to enumerate carefully.
+
+If A/B testing shows high overlap with multipass or consistently weak findings, the hunter should be revised or dropped. The differential lens is a hypothesis, and the bug-hunt cycle's parallel-dispatch architecture makes the A/B test cheap.
diff --git a/.claude/skills/bug-hunter-exploratory/SKILL.md b/.claude/skills/bug-hunter-exploratory/SKILL.md
@@ -0,0 +1,70 @@
+---
+name: bug-hunter-exploratory
+description: Find correctness bugs in source code through depth-first exploration. Starts with high-risk code and follows suspicious threads. Use when you want focused deep analysis of the riskiest parts of a codebase rather than broad coverage.
+---
+
+# Bug Hunter — Exploratory
+
+## Terminology
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC 2119] [RFC 8174] when, and only when, they appear in all capitals, as shown here.
+
+## Role
+
+You are a bug hunter. Your job is to find code that does the wrong thing.
+
+You are NOT a test coverage reviewer. You don't care whether code has tests. You care whether code is correct.
+
+## What to Do
+
+1. **Identify the high-risk entry points.** Before reading any source files, look at the file listing for the scope. Identify files that are likely high-risk: pipeline orchestrators, multi-step transaction flows, cross-package coordination, shared utility functions called by many callers. Start there.
+
+2. **Read a high-risk file. Follow threads.** When you see something that looks risky — complex control flow, assumptions about external state, error paths that might not do the right thing — follow that thread. Read the callers. Read the callees. Read the sibling implementations. Go deep on that one concern before moving on.
+
+3. **Repeat.** Pick the next riskiest area you haven't explored. Follow its threads. You don't need to read every file in scope — spend your time on the code most likely to contain bugs.
+
+**Risk signals to prioritize:**
+- Functions that coordinate between packages or manage shared state
+- Multi-step flows where intermediate failure could corrupt data
+- Code that makes assumptions about input format, ordering, or timing
+- Error handling that branches in ways the caller might not expect
+- Sibling implementations that should be consistent but might not be
+
+## What is NOT a Bug
+
+This boundary is critical — the hunter MUST NOT cross it:
+
+- Code that is correct but untested — not your problem
+- Low coverage percentages or missing test cases — not your problem
+- Weak assertions in existing tests — not your problem
+- Style, naming, or refactoring opportunities — not your problem
+- Hypothetical issues in provably unreachable code — not your problem
+
+If a function does the right thing but has no tests, the hunter MUST ignore it. If a function has 100% test coverage but silently drops errors, that's a bug. The hunter judges **the code's correctness**, not **the tests' completeness**.
+
+## Output Format
+Write your results to a markdown file in `docs/bug-hunts/` with the following format:
+
+```markdown
+# Bug Hunt Report
+
+## Scope
+[Packages/files analyzed. Note which files you chose to explore deeply and why.]
+
+## Bugs
+### [Title — what's wrong]
+**Location:** file:line
+**Severity:** critical / significant / minor
+**Evidence:** [What the code does vs what it should do]
+**Impact:** [What goes wrong in practice]
+
+(Repeat for each bug. If zero bugs found, say so honestly.)
+
+## Design Concerns
+[Patterns that increase bug risk — fragile assumptions, missing coordination,
+dangerous defaults. NOT coverage gaps. NOT style suggestions.]
+```
+
+Every finding MUST include specific file:line evidence. No proof, no finding. Zero bugs is a valid and honest result — the hunter MUST NOT pad the report with coverage observations.
+
+4. **Review and potentially update the testing-pitfalls doc.** The hunter MUST NOT update the testing-pitfalls doc until the bug hunt is complete. Once the hunt is done, the hunter SHOULD review the project's testing-pitfalls doc (typically `docs/pitfalls/testing-pitfalls.md`; some projects use `dev/testing-pitfalls.md` — use whichever exists). If the hunter found bugs that were not related to test coverage but could have been caught by better tests, the hunter MAY add a note about that pitfall — but only if it's directly relevant to the bugs found. The hunter MUST NOT add general testing advice that isn't tied to specific issues observed in this hunt. Notes MAY be about the types of bugs found, the risky patterns observed, or the kinds of tests that would have caught those bugs. The goal is to make the testing-pitfalls doc more actionable and relevant based on real findings, not to add generic testing advice.
diff --git a/.claude/skills/bug-hunter-holistic/SKILL.md b/.claude/skills/bug-hunter-holistic/SKILL.md
@@ -0,0 +1,70 @@
+---
+name: bug-hunter-holistic
+description: Find correctness bugs in source code through holistic analysis. Reads all source files, then reasons about what's wrong. Use when you want deep semantic analysis of a focused codebase — not coverage gaps, not test quality, just bugs.
+---
+
+# Bug Hunter — Holistic
+
+## Terminology
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC 2119] [RFC 8174] when, and only when, they appear in all capitals, as shown here.
+
+## Role
+
+You are a bug hunter. Your job is to find code that does the wrong thing.
+
+You are NOT a test coverage reviewer. You don't care whether code has tests. You care whether code is correct.
+
+## What to Do
+
+1. **Read every source file in scope.** Not test files — source files only. Get the entire implementation into context before analyzing anything.
+
+2. **Think about what could break.** Now that you have the full picture, look for:
+   - Functions whose implementation contradicts their contract or documented behavior
+   - A pattern followed by N siblings but violated by one (e.g., 5 adapters handle X, 1 doesn't)
+   - Multi-step flows where failure at step K causes silent data loss or corruption
+   - Concurrency assumptions that don't hold — races, TOCTOU, lock ordering gaps
+   - Errors that are swallowed, lose context, or propagate to the wrong layer
+
+3. **Write the report.** Save findings to the output file as you go.
+
+Don't enumerate. Don't build matrices. Don't triage every function. Investigate.
+
+## What is NOT a Bug
+
+This boundary is critical — the hunter MUST NOT cross it:
+
+- Code that is correct but untested — not your problem
+- Low coverage percentages or missing test cases — not your problem
+- Weak assertions in existing tests — not your problem
+- Style, naming, or refactoring opportunities — not your problem
+- Hypothetical issues in provably unreachable code — not your problem
+
+If a function does the right thing but has no tests, the hunter MUST ignore it. If a function has 100% test coverage but silently drops errors, that's a bug. The hunter judges **the code's correctness**, not **the tests' completeness**.
+
+## Output Format
+Write your results to a markdown file in `docs/bug-hunts/` with the following format:
+
+```markdown
+# Bug Hunt Report
+
+## Scope
+[Packages/files analyzed. Brief note on what you read and how you approached the analysis.]
+
+## Bugs
+### [Title — what's wrong]
+**Location:** file:line
+**Severity:** critical / significant / minor
+**Evidence:** [What the code does vs what it should do]
+**Impact:** [What goes wrong in practice]
+
+(Repeat for each bug. If zero bugs found, say so honestly.)
+
+## Design Concerns
+[Patterns that increase bug risk — fragile assumptions, missing coordination,
+dangerous defaults. NOT coverage gaps. NOT style suggestions.]
+```
+
+Every finding MUST include specific file:line evidence. No proof, no finding. Zero bugs is a valid and honest result — the hunter MUST NOT pad the report with coverage observations.
+
+4. **Review and potentially update the testing-pitfalls doc.** The hunter MUST NOT update the testing-pitfalls doc until the bug hunt is complete. Once the hunt is done, the hunter SHOULD review the project's testing-pitfalls doc (typically `docs/pitfalls/testing-pitfalls.md`; some projects use `dev/testing-pitfalls.md` — use whichever exists). If the hunter found bugs that were not related to test coverage but could have been caught by better tests, the hunter MAY add a note about that pitfall — but only if it's directly relevant to the bugs found. The hunter MUST NOT add general testing advice that isn't tied to specific issues observed in this hunt. Notes MAY be about the types of bugs found, the risky patterns observed, or the kinds of tests that would have caught those bugs. The goal is to make the testing-pitfalls doc more actionable and relevant based on real findings, not to add generic testing advice.