From c3aa1c9eaeb13239c93f999c3379a9f9751e58ad Mon Sep 17 00:00:00 2001 From: Ralf Anton Beier Date: Thu, 30 Apr 2026 05:52:09 +0200 Subject: [PATCH] =?UTF-8?q?docs:=20agent-fleet=20wave-1=20synthesis=20?= =?UTF-8?q?=E2=80=94=20quickstart=20+=20prioritized=20bug=20list?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Output of a 14-agent parallel review of Temper. Each agent had a distinct persona (junior dev, safety-critical architect, QA, security, DevOps, docs reviewer, performance, day-2 ops, Probot/Octokit expert, supply-chain, LLM prompt engineer, rivet integration, test quality, senior reviewer) and a non-overlapping scope. 13 of 14 reported successfully. Aggregated into two artifacts: ## docs/agent-fleet/quickstart.md Hand-off doc for future agents: what Temper is, what's verified working (don't redesign), what's confirmed broken, what's deliberately deferred, and how to pick up a fix without repeating wave-1's investigation. ## docs/agent-fleet/bugs.md Prioritized list of 35 findings, severity-tagged (5 critical, 10 high, 15 medium, 5 low). Each carries a file:line citation and at least one wave-1 agent flag. Ends with action priorities ("if only one PR this week", "if a sprint"). ## Why these matter Wave-1 surfaced bugs that 9 months of iterative dev hadn't: - `synchronizeIssueLabels` silently deletes user-created labels on every webhook. Destructive across the org. - `WEBHOOK_SECRET` defaults to literal "development" when env unset. Trust boundary fail-open. - Dashboard POST routes are unauthenticated. - Test mock retrofit fakes the broken `octokit.issues.X` namespace — exactly why PRs #22, #29, #34 each had to fix it again at a new call site. - `success: true` returned when configuration silently degraded. - `/health` always returns 200. The cross-cutting theme: "looks like it ran" failure pattern across ruleset fallback, signed-signatures swallow, scheduler tick errors, and review-save IO. Fix by class, not case-by-case. ## Not in this commit Wave-2 work — picking up specific fixes, multi-lens AI review, GitHub Check Run integration, splitting `src/app.js`. Documented as open questions in `quickstart.md`. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/agent-fleet/bugs.md | 345 +++++++++++++++++++++++++++++++++ docs/agent-fleet/quickstart.md | 133 +++++++++++++ 2 files changed, 478 insertions(+) create mode 100644 docs/agent-fleet/bugs.md create mode 100644 docs/agent-fleet/quickstart.md diff --git a/docs/agent-fleet/bugs.md b/docs/agent-fleet/bugs.md new file mode 100644 index 0000000..d9cbd32 --- /dev/null +++ b/docs/agent-fleet/bugs.md @@ -0,0 +1,345 @@ +# Temper — prioritized bug list (wave-1 fleet output) + +Findings from 13 specialist agents (one pending). Each has a file:line citation +and at least one wave-1 agent flag. Severity tiers: + +- 🔴 **Critical** — destructive, exploitable, or trust-boundary +- 🟠 **High** — silent wrongness, lost work, audit-trail gap +- 🟡 **Medium** — quality, performance, hygiene +- 🟢 **Low** — polish + +Sort: severity, then independent-flagger count, then ease-of-fix. + +--- + +## 🔴 Critical + +### 1. `synchronizeIssueLabels` silently deletes user-created labels (`src/labels.js:69-78`) +Runs on every `repository.created`, every `/sync-all-repos`, every +`/configure-repo`, every `reconcile-repo` task. Any label not in `config.yml +issue_labels` is destroyed. Open issues lose their tags. **There is no opt-out +and no preserve-list.** Single `/sync-all-repos` invocation triggers this +across every repo in the org. +- **Flagged by:** QA bug-hunter (#1) +- **Fix:** add `synchronize_mode: 'merge' | 'replace'` (default `'merge'` — + only create/update; never delete). Behind a config flag for explicit + opt-in to destructive sync. + +### 2. `WEBHOOK_SECRET` fail-open default +Probot defaults the webhook secret to literal `"development"` when the env var +is unset. `index.js` and `mapLegacyEnvVars` (`src/app.js:1136-1149`) don't +fail-fast on missing secret. A misconfigured deploy silently accepts forged +webhooks signed with `"development"`. +- **Flagged by:** Security auditor (#1 high) +- **Fix:** in `index.js` before `await run(registerApp)`: + ```js + if (!process.env.WEBHOOK_SECRET || process.env.WEBHOOK_SECRET === 'development') { + throw new Error('WEBHOOK_SECRET is required and cannot equal "development"'); + } + ``` + +### 3. Dashboard POST routes are unauthenticated (`src/dashboard.js:1287-1495`) +`POST /dashboard/actions/sync`, `/actions/review`, `/actions/analyze-org`, and +JSON APIs `/api/org/repos`, `/api/org/health` are reachable without auth. Any +caller able to hit the bot's port can trigger org-wide reconfiguration or +exfiltrate repo data. +- **Flagged by:** Security auditor (#3 high) +- **Fix:** Bearer / shared-secret middleware for all `/dashboard/*` and + `/api/*` routes; or bind to `127.0.0.1` only behind an authenticating + reverse proxy. + +### 4. `issues.opened` provisioning lacks the auth gate (`src/app.js:327-417`) +Comment-driven commands enforce `requireOrgMember` + `allowed_command_users`. +The issue-form provisioning path (controller_repo) doesn't. Anyone able to +file a labelled issue in the controller repo provisions a real org repo. +- **Flagged by:** Security auditor (#2 high) +- **Fix:** add the same allowlist + org-member check before + `enqueueTask('provision-repo', ...)`. + +### 5. Two SQLite connections to the same `dedup.db` (`src/app.js:1039-1047`) +`initKVStore` is called twice (idempotency + ai_rate_limits) against the same +file. WAL tolerates multi-connection reads, but writes from two connections +can `SQLITE_BUSY` outside a single transaction. When `markProcessed` and +`recordReview` race, one throws → handler crashes mid-flight → `markProcessed` +skipped → **same delivery is reprocessed and the AI review runs twice.** +- **Flagged by:** QA bug-hunter (#3 critical) +- **Fix:** share one `Database` connection between the two KV instances. + +--- + +## 🟠 High + +### 6. Chatops handler doesn't `markProcessed` on early returns or exceptions +`handleChatopsIssue` early-returns on auth failure / missing fields without +`markProcessed`. Exception during `closeIssue()` after `commentBack()` +succeeded leaves the wrapping handler unable to mark the delivery — Probot +retries → **duplicate analysis-report issues, duplicate auth-rejection +comments**. Same pattern for `/check-config`, `/sync-all-repos`, +`/allow-merge-commit`. +- **Flagged by:** QA bug-hunter (#2 critical) +- **Fix:** `try/finally` wrapping with `markProcessed` always, plus + idempotency via dedup key on the action itself. + +### 7. `success: true` returned when configuration silently degraded +- `applyBranchEnforcement` falls back to legacy on rulesets 422/403/404 + silently (`src/repository.js:30-58`). +- `setRequiredSignatures` swallows 404/422 with a `warn` + (`src/branch-protection.js:55-62`). +- Result: `configureRepository` returns `{success: true}` even when signed + commits aren't enforced and no rulesets are applied. +- **Flagged by:** Safety-critical architect (#1, structural), QA, DevOps +- **Fix:** read-back-and-assert pattern. After applying, fetch the actual + state and assert it matches the desired set. Surface mismatches as + `{success: false, missing: [...]}`. Emit a GitHub Check Run on the repo's + default branch with conclusion `failure` so an org-required check refuses + merges until the bot succeeds. + +### 8. `/health` always returns 200 (`src/app.js:984+`) +Returns `status: 'healthy'` unconditionally. Scheduler stuck, SQLite locked, +AI endpoint dead, disk full: all return 200. PM2 never restarts on hang. +- **Flagged by:** DevOps/SRE (#1 critical), Safety-critical architect +- **Fix:** Real liveness signal: scheduler last-tick-age threshold, DB ping, + free-disk threshold, last-error counter. Return 503 on any. Wire to PM2 + `--max-memory-restart` and external uptime monitor. + +### 9. Inline AI review on `pull_request.opened` is not durable (`src/app.js:840-929`) +Auto AI review runs in the webhook handler with no task-store enqueue. PM2 +restart mid-`reviewPullRequest` = silent loss. Same for the `enableAutoMerge` +mutation. Compare to `provision-repo`, `generate-dependabot`, +`revert-merge-settings`, `reconcile-repo` which ARE durable. +- **Flagged by:** DevOps/SRE, Performance, Safety architect +- **Fix:** enqueue `auto-review` and `auto-enable-merge` tasks; webhook + handler returns within ~50 ms. + +### 10. `reviews.json` non-atomic full-rewrite (`src/ai-review.js:622-651`) +`fs.writeFileSync(REVIEWS_PATH, JSON.stringify(_reviews))` on every review. +If process killed mid-write the file is truncated → next `loadReviews()` +swallows via bare `catch {}` (line 631) → **all review history silently +resets to `[]`.** +- **Flagged by:** QA, Performance, DevOps +- **Fix:** replace with a SQLite table next to `tasks.db`. Append, never + rewrite. Bonus: add tamper-evidence (SHA chain) for the audit-trail claim + the safety architect flagged. + +### 11. Test mock retrofit hides production bugs (`__tests__/integration/{ai-review,app}.test.js`) +`createMockOctokit` retrofits `octokit.issues.X` jest.fn()s over the real +`octokit.request` route table. **Production code that calls `.issues.X` passes +tests and crashes in prod.** This is exactly the PR #22 / #29 / #34 regression +class still active. The smoke `webhook.test.js` accepts `[200, 202, 401, 500]` +for `issue_comment.created`, swallowing the very 500 a `.issues` call would +produce. +- **Flagged by:** Test-quality engineer, Senior reviewer (pending) +- **Fix:** replace with a "real-shape" octokit fake exposing only `.request()` + and `.paginate()` — Probot v14's actual surface. Tighten e2e to `[200, 202]` + only. + +### 12. Slop regex too greedy (`src/ai-review-prompt.js:65+`) +`\b(could|should|may|consider)\b` drops legitimate findings: +- "this **could** panic on null deref at line 42" +- "function **should** not be called from async context" +- "**Consider**[ed]" — `\bconsider\s+/` kills "the constructor doesn't consider the empty-list case" +Quote-or-die `String.includes` is whitespace-fragile (CRLF/tab normalisation). +Schema accepts `quoted_line: "+"` (1-char minimum) — every diff has `+`. +- **Flagged by:** LLM prompt engineer +- **Fix:** convert single-word patterns to phrase patterns ("we should + consider", "might be wise to"). Normalise whitespace before quote check. + Tighten schema: `quoted_line` `pattern: "^[+-].+"`, `claim minLength: 30`. + +### 13. `pull_request.opened` has no idempotency guard (`src/app.js:840+`) +Only the 5-min AI rate limit prevents duplicate auto-reviews on webhook +redelivery. `enablePullRequestAutoMerge` GraphQL is called every time without +guard. +- **Flagged by:** QA bug-hunter +- **Fix:** standard `isProcessed`/`markProcessed` pattern at handler entry. + +### 14. Self-update: no signature verification, no rollback +`src/self-update.js:6-28` spawns `tools/self-update/target/release/temper-self-update` +inheriting full env (line 21, `...process.env` leaks all secrets to child). +No SHA / signature check on the binary. `git reset --hard origin/main` is not +atomic with `npm ci`; if `npm ci` fails after reset, you're on new code with +old/partial deps and no rollback. +- **Flagged by:** Safety architect, DevOps +- **Fix:** snapshot SHA before reset, smoke `node -e "require('./index.js')"` + before `pm2 restart`, auto-revert on failure. Verify a signed release tag + before exec'ing the binary. Whitelist env vars passed to the child. + +### 15. Tarball extraction lacks symlink/path-traversal safety (`src/rivet-fetch.js:48`) +`tar -xz --strip-components=1 -C destDir` — no `--no-same-owner`, no `-P` +rejection, no symlink guard. Malicious PR head ref can include a symlink +pointing outside `destDir`; subsequent `runRivetOracle` reads `rivet.yaml` +from that target. +- **Flagged by:** Security auditor (medium) +- **Fix:** drop symlinks via JS `tar` lib with `filter`, or `--no-overwrite-dir` + + pre-validate entries. + +--- + +## 🟡 Medium + +### 16. Multiple `.repos.X` / `.pulls.X` calls in dashboard.js (PR #22 risk class) +`octokit.repos.get` (`dashboard.js:79`), `getCommitSignatureProtection` (:83), +`pulls.list` (:87), `getBranch` (:432), `compareCommits` (:440), `listBranches` +(:457), `actions.listWorkflowRunsForRepo` (:74), `checks.listForRef` (:103). +Plus `app.js:268` `context.octokit.repos.getBranch` — exact same shape as the +PR #22/#29/#34 saga. +- **Flagged by:** Probot/Octokit expert +- **Fix:** sweep these to `.request()` form. dashboard.js builds its own + octokit (line 31), so the namespace IS reliable there — but app.js:268 is + on `context.octokit` and shares the failure mode. + +### 17. Documentation references npm scripts that don't exist +README.md:141-144 and DEVELOPMENT.md:230-233 list `npm run deploy:docker`, +`deploy:heroku`, `deploy:server`, `deploy:netcup`. Only `docker:build` exists +in package.json. +- **Flagged by:** Junior dev, Documentation reviewer +- **Fix:** delete the fictional scripts from docs OR add them to package.json. + +### 18. GITHUB_APP_SETUP.md uses obsolete env var names +Uses `GITHUB_APP_ID`/`GITHUB_PRIVATE_KEY`/`GITHUB_WEBHOOK_SECRET` (legacy); +`.env.example` and DEVELOPMENT.md use the canonical `APP_ID`/`PRIVATE_KEY`/ +`WEBHOOK_SECRET`. Junior devs copy whichever they read last. +- **Flagged by:** Junior dev, Documentation reviewer +- **Fix:** update GITHUB_APP_SETUP.md to canonical names. + +### 19. Sequential GitHub fetches in `reviewPullRequest` (`src/ai-review.js:382-398`) +3 strictly sequential REST calls (PR meta, diff media-type, files): ~600-1500 +ms before any work. Independent — could be `Promise.all`. +- **Flagged by:** Performance +- **Fix:** parallelise; saves 0.5-1 s per review. + +### 20. Sequential rivet head + base validate (`src/ai-review.js:417-451`) +Two tarball fetches + two extracts + two rivet runs, sequentially. ~50% +wall-time savings if parallelised. +- **Flagged by:** Performance +- **Fix:** `Promise.all` the head and base oracle runs. + +### 21. `synchronizeAllRepositories` no concurrency cap (`src/organization.js:34-52`) +Fully sequential `for...of`. ~30-60 s for 13 repos. With 100 repos it can +also exceed the 5000/h GitHub rate limit. +- **Flagged by:** Performance, Probot expert +- **Fix:** `p-limit(3)` bounded concurrency. + +### 22. `provenance` job attests `package.json` only (`.github/workflows/ci.yml:75-92`) +Doesn't attest the Docker image, the self-update Rust binary, or the SBOM. +Actions pinned by tag (`@v4`, `@v0`), not commit SHA. `node:20-alpine` is a +floating tag. +- **Flagged by:** Supply-chain +- **Fix:** build actual artifacts, capture digests, feed digests into + `attest-build-provenance@v2` and `attest-sbom@v2` as `subject-digest`. + SHA-pin every action. Pin base image to digest. + +### 23. `npm audit ... || true` swallows findings (`.github/workflows/ci.yml:21`) +Audit is advisory, not enforcing. Vulnerable deps merge to main without alarm. +- **Flagged by:** Supply-chain +- **Fix:** drop `|| true`. Set `--audit-level=high` if you want to allow + moderate transitive findings. + +### 24. eslint@8 is end-of-life +No security backports as of 2024-10. Quality gate against deprecated toolchain. +- **Flagged by:** Supply-chain +- **Fix:** migrate to eslint v9 (flat config) — sizable but isolated. + +### 25. `provisioning.js` regex accepts invalid GitHub repo names +`REPO_NAME_RE = /^[a-zA-Z0-9._-]+$/` accepts `.`, `..`, `.git`, leading +dots, `-`, `--`. GitHub 422s on these; user sees generic "❌ Failed to +create repository". +- **Flagged by:** QA, Junior dev +- **Fix:** stricter regex; reject leading dots / dashes / `.git` suffix. + +### 26. `_reviewTimestamps` in-memory Map grows unbounded (`src/ai-review.js:22`) +Dead code when persistent store wired; memory leak when not. No `cleanup()`. +- **Flagged by:** QA, DevOps +- **Fix:** delete the in-memory fallback now that SQLite KV is the canonical path. + +### 27. `dependabot.js` order-sensitive equality check (`src/dependabot.js:108`) +`JSON.stringify(currentConfig) === JSON.stringify(targetConfig)` — yaml.load +preserves source order; identical configs in different key orders compare +unequal → unnecessary writes (and PRs when `use_pull_requests: true`). +- **Flagged by:** QA +- **Fix:** deep-equal a normalised form, not stringified. + +### 28. Module-level `loadConfig()` and `startAutoCleanup()` at import time +`src/config.js:82`, `src/idempotency.js:74`. Side effects at module import +make the bot untestable in isolation and prevent a "validated config or +refuse to start" boot policy. +- **Flagged by:** Safety architect, Senior reviewer +- **Fix:** explicit `init()` called from `index.js`; refuse to start on + invalid config. + +### 29. AI prompt-injection mitigation incomplete (`src/ai-review.js:68-71`) +`sanitizeAIOutput` only strips `::workflow-command::`. Doesn't redact +`` HTML, mass `@mentions`, or fake "Approval" verdicts. PR diffs are +attacker-controlled. +- **Flagged by:** Security auditor (medium), LLM expert +- **Fix:** widen sanitizer; consider posting AI output as a quote block to + prevent markdown rendering of model output. + +### 30. Rivet binary version drift +Bot ships with rivet v0.4.3; rivet repo's main may use newer schema (this +already happened during dev). Conversion bugs become silent +"unknown→unknown" findings. +- **Flagged by:** Rivet expert +- **Fix:** on startup, run `rivet --version` and assert against a pinned + `expected_rivet_version`. On mismatch, log loudly and disable the oracle. + +--- + +## 🟢 Low + +### 31. `CONTRIBUTING.md` links to non-existent `CODE_OF_CONDUCT.md` +- **Flagged by:** Junior dev, Documentation +- **Fix:** add the file or remove the link. + +### 32. `CHANGELOG.md` migration guide tells users to install probot v12, octokit v19 +`package.json` ships probot v14, octokit v22. +- **Flagged by:** Documentation +- **Fix:** delete the outdated migration block. + +### 33. `verify-netcup-deployment.sh` is a printf script with no enforcement +Every check is informational; only missing-files exits non-zero. Suggests +`node index.js &` with no PM2. +- **Flagged by:** DevOps +- **Fix:** make checks enforcing or delete; commit `ecosystem.config.js` + for PM2 instead. + +### 34. No `LICENSE` file at repo root; `package.json` declares MIT +18 BSD-3 + 12 Apache-2.0 transitive deps require attribution that the +repo doesn't carry. +- **Flagged by:** Supply-chain +- **Fix:** add LICENSE + NOTICE. + +### 35. Probot v14 events not subscribed but should be +- `pull_request.synchronize` — bot doesn't re-review on force-push (10-line fix) +- `pull_request.reopened` — symmetrical to `.opened` +- `pull_request_review` / `pull_request_review_comment` — bot can't react to + human reviews +- `check_run.rerequested` — would let `/review-pr` be triggered from Checks UI +- `installation.created` / `installation_repositories.added` — could replace + manual `/sync-all-repos` flow +- **Flagged by:** Probot/Octokit expert + +--- + +## What's NOT here + +- The senior code reviewer's report wasn't in when this was written. When it + arrives, fold into the relevant categories above (likely module-cohesion + and complexity hotspots, which the safety architect already covered). +- Wave-2 will add new findings as fixes are attempted (and surface bugs + in fixes themselves). + +--- + +## Action priorities for the human + +If only one PR this week: **#1 (label deletion)**. It's destructive, runs +on every webhook, and the fix is small. + +If two: add **#2 (`WEBHOOK_SECRET` fail-fast)**. One line; closes a real +trust boundary. + +If three: add **#11 (real-shape octokit test fake)**. Single change that +prevents the entire PR-22 regression class from ever shipping again. + +If a sprint: 1, 2, 4, 5, 7, 8, 9 — covers destructive, trust, audit-trail, +liveness in one sweep. diff --git a/docs/agent-fleet/quickstart.md b/docs/agent-fleet/quickstart.md new file mode 100644 index 0000000..de43b40 --- /dev/null +++ b/docs/agent-fleet/quickstart.md @@ -0,0 +1,133 @@ +# Temper — agent-fleet quickstart + +Hand-off doc for wave-2 agents. Read this **before** investigating Temper for +the first time so you don't repeat work that 14 wave-1 agents already did. + +## What Temper is, in one paragraph + +Probot v14 (Node.js, ESM) GitHub App that hardens `pulseengine` org repos to a +declarative `config.yml`. Reacts to `repository.created`, `issue_comment.created`, +`issues.opened`, `pull_request.{opened,closed}`, and `push`. Persists state to +SQLite (`data/tasks.db`, `data/dedup.db`). Runs a 5-minute scheduler tick. +Deployed on a single Netcup VPS (3.8 GB RAM, 2 cores, no swap) under PM2 with +self-update on `push` to `main`. Talks to a local Ollama +(`qwen2.5-coder:3b`) for AI code review on PRs, and to a `rivet` CLI for +mechanical traceability oracle on `pulseengine/rivet`-instrumented repos. + +## What's been verified to work (don't redesign these) + +- **Repository Rulesets API is the right replacement** for legacy branch + protection on empty repos (`src/rulesets.js`). The translator + drift + detection works correctly when contexts are non-empty. +- **Persistent task store** (`src/task-store.js`) — atomic claim via + `UPDATE ... RETURNING`, dedup by key, exponential backoff, 3-attempt cap. +- **Idempotency + AI rate-limit** moved to SQLite KV (`src/persistent-kv.js`, + `src/idempotency.js`, `src/ai-review.js`). Survives PM2 restart. +- **AI review pipeline shape** — strict-JSON system prompt + grammar-constrained + Ollama `response_format` + parser + slop filter + quote-or-die + deterministic + `computeVerdict` is **the right architecture for a 3 B model**. Don't redesign. +- **Rivet oracle integration** — `runRivetValidate`, `runRivetImpact`, + `subtractFindings` (delta vs base), `groupOracleFindings` (collapse identical + messages). Tarball-fetch is side-effect-clean. +- **chatops_repo restriction** — `chatops_repo.enabled: true` plus a private + admin repo correctly hides ChatOps surface from public repos. +- **Issue forms in `docs/temper-ops-template/`** dispatch to ChatOps via + `handleChatopsIssue` based on `chatops:` label. Smoke-tested. + +## What's confirmed broken (the bugs.md file is the prioritized list) + +Top categories from 13 specialists: + +1. **Silent state mutation** — `synchronizeIssueLabels` deletes labels not in + target list. **Destructive** on every webhook tick. +2. **Trust-boundary fail-open** — `WEBHOOK_SECRET` defaults to literal + `"development"` in Probot when env var is unset. Index doesn't fail-fast. +3. **Unauthenticated dashboard POST routes** can trigger org-wide actions and + exfiltrate repo data. +4. **`issues.opened` provisioning has no auth gate** (the comment-driven + commands do; the issue-form path doesn't). +5. **`/health` lies** — always returns 200 even when scheduler is dead. PM2 + never restarts on hang. +6. **Silent degradation reported as success** — rulesets 422 → fallback to + legacy → "✅ Configured". `setRequiredSignatures` 404 → swallowed → audit + trail says "configured" but enforcement isn't there. +7. **Test mocks fake the broken namespace** — `createMockOctokit` retrofits + `octokit.issues.X` over `octokit.request` so production code that uses + `.issues.X` *passes tests* and *crashes in prod*. This is exactly the PR #22 + regression class, still active in 2 test files. +8. **Module-level singletons everywhere** — `_taskStore`, `_scheduler`, + `_octokit`, `_reviews`, `activeLogger`, `config`. No injection seam. +9. **Inline AI review on `pull_request.opened`** is not in the task store; PM2 + restart mid-review = silent loss. +10. **No GitHub Check Run** — AI review can't be a required check, so its + "request_changes" verdict cannot actually gate merges. + +## What's been deliberately left for wave-2 (open questions) + +- **Multi-lens AI review** — discover (3 narrow lenses) → fresh-session validate + → emit. The single-shot strict-JSON path is wave-1; multi-lens is the next + quality jump and was scoped but not built. +- **GitHub Check Runs API integration** — converting AI review verdicts into + `check_run` results that can be required-checked. Architecture decision pending. +- **`pull_request.synchronize` event subscription** — bot doesn't re-review on + force-push today. Probot expert flagged this as a 10-line change. +- **Replace `reviews.json` with a SQLite table** — DevOps + Performance both + flagged the unbounded full-file-rewrite pattern. +- **Split `src/app.js`** — 1159 lines doing event handling, ChatOps slash + commands, ChatOps issue-form dispatch, persistence init, scheduler init, + helper utilities. Senior code reviewer's recommended next refactor. +- **CLI `temper-admin`** — single binary wrapping every diagnostic in the + runbook. Day-2 ops agent's #1 missing tool. +- **Per-repo config overlay** — currently global only. Many fixes touch this + surface; do it before adding policy variations. + +## How to pick up a fix + +1. Skim `docs/agent-fleet/bugs.md` — pick a bug. +2. Verify the file:line citation is still accurate (the codebase moves; some + findings may have been fixed in PR #34+). +3. Branch off `main`. Open a focused PR. Don't bundle. +4. Tests: **don't trust** `__tests__/integration/{ai-review,app}.test.js` — + their `createMockOctokit` retrofit fakes the broken namespace. If your fix + touches an octokit call, add a "real-shape" test (only `.request()` and + `.paginate()`, no `.issues`/`.repos` namespaces). +5. PR description should cite which wave-1 agent (or multiple) flagged the bug, + and quote the file:line reference. + +## Reference: the 14 wave-1 personas + their non-overlapping scopes + +| # | Persona | Primary scope | +|---|---|---| +| 1 | Junior dev | README/DEVELOPMENT/SETUP onboarding experience | +| 2 | Safety-critical EE architect | Failure modes vs ASIL-D/safety-critical fitness | +| 3 | QA bug-hunter | Race conditions, error swallowing, off-by-one | +| 4 | Security auditor | OWASP top 10, signature, injection, traversal | +| 5 | DevOps / SRE | Production readiness, observability, recovery | +| 6 | Documentation reviewer | Docs vs code drift | +| 7 | Senior code reviewer | Module cohesion, complexity hotspots (pending) | +| 8 | Performance engineer | Hot paths, latency, parallelism | +| 9 | Day-2 operations | Runbook for 8 incident types | +| 10 | Probot/Octokit expert | v14 idioms, deprecated patterns, missing events | +| 11 | Supply-chain compliance | SBOM, provenance, deps, license, lockfiles | +| 12 | LLM prompt engineer | Strict-JSON contract, slop filter, multi-lens | +| 13 | Rivet integration | Oracle correctness, schema drift, missed features | +| 14 | Test quality | Coverage shape, mock leakage, flakiness | + +Each agent's full report is in `/private/tmp/claude-501/.../tasks/.output` +on the orchestrator's machine; the synthesis here is the canonical record. + +## Cross-cutting themes (multiple agents converged) + +- **"Looks like it ran" failure pattern** — repeated across the codebase: + ruleset 422 → silent legacy fallback, signed-signatures 404 → silent skip, + scheduler tick error → silent skip, save-review IO error → silent reset to + `[]`. Fix by class, not case-by-case. +- **The bot can't actually gate commits** — it posts a comment, not a Check + Run. Every "AI review enforces standards" framing in docs is hollow until + this is wired. +- **Mocking too much** — `app.test.js` jest.mocks 12 modules. Most tests + exercise routing logic against ghosts. Add at least one wired integration + test per dispatch path. +- **Module-level state is the architectural debt** — every singleton blocks + isolation, testing, and multi-installation. Untangling this is the + highest-leverage refactor.