From b6b8445e69b1c5c6f623a80dbc1e58098716b022 Mon Sep 17 00:00:00 2001 From: Claude Date: Fri, 5 Jun 2026 01:07:30 +0000 Subject: [PATCH 01/29] chore(skills): add superpowers plugin + vendor performance-audit skill bundle Install superpowers@claude-plugins-official (v5.1.0) and enable it in repo settings. Vendor the non-colliding skills from the attached bundle into .claude/skills/ (superpowers-plus perf-audit/bug-hunter/build/handoff family, project-setup init skills, url-to-markdown). Project-customized colliding skills (writing-plans-enhanced, plan-review-cycle, bug-hunt-cycle, health-review-cycle, project-health-review) are preserved, not overwritten. Scaffold docs/perf-audits/ and record setup decisions in DECISIONS.md. https://claude.ai/code/session_01B2SLSJ6PN3tJaUDEE8SqTe --- .claude/settings.json | 3 +- .../skills/bug-hunter-differential/SKILL.md | 119 ++ .../skills/bug-hunter-exploratory/SKILL.md | 70 ++ .claude/skills/bug-hunter-holistic/SKILL.md | 70 ++ .claude/skills/bug-hunter-multipass/SKILL.md | 84 ++ .claude/skills/build-robust-features/SKILL.md | 92 ++ .../skills/claude-agents-md-init/README.md | 222 ++++ .claude/skills/claude-agents-md-init/SKILL.md | 359 ++++++ .../references/claude-agents-md-template.md | 439 +++++++ .claude/skills/git-strategy-init/README.md | 99 ++ .claude/skills/git-strategy-init/SKILL.md | 333 +++++ .../references/git-strategy-template.md | 572 +++++++++ .claude/skills/handoff/SKILL.md | 191 +++ .../skills/performance-audit-cycle/SKILL.md | 153 +++ .../whole-repo-scoping.md | 384 ++++++ .claude/skills/performance-audit/README.md | 191 +++ .claude/skills/performance-audit/SKILL.md | 248 ++++ .../performance-audit/currency-protocol.md | 114 ++ .../performance-audit/feedback-template.md | 140 +++ .../skills/performance-audit/finding-model.md | 176 +++ .../skills/performance-audit/lane-prompts.md | 241 ++++ .../performance-audit/profile-packs/dotnet.md | 312 +++++ .../profile-packs/dotnet/aspnet-core.md | 39 + .../profile-packs/dotnet/blazor.md | 47 + .../profile-packs/dotnet/caching.md | 52 + .../dotnet/dependency-injection.md | 41 + .../profile-packs/dotnet/interop.md | 49 + .../dotnet/messaging-realtime.md | 53 + .../profile-packs/dotnet/object-mapping.md | 44 + .../profile-packs/dotnet/sql-server-data.md | 215 ++++ .../profile-packs/dotnet/wcf.md | 102 ++ .../profile-packs/dotnet/winforms.md | 69 + .../profile-packs/dotnet/wpf.md | 101 ++ .../profile-packs/generic-pack.md | 123 ++ .../performance-audit/profile-packs/go.md | 144 +++ .../profile-packs/go/caching.md | 61 + .../profile-packs/go/database-sql.md | 86 ++ .../profile-packs/go/grpc.md | 87 ++ .../profile-packs/go/messaging.md | 84 ++ .../profile-packs/go/net-http-servers.md | 95 ++ .../profile-packs/go/serialization.md | 83 ++ .../performance-audit/profile-packs/html.md | 155 +++ .../profile-packs/html/fonts.md | 88 ++ .../profile-packs/html/images-media.md | 78 ++ .../profile-packs/javascript-typescript.md | 179 +++ .../javascript-typescript/angular.md | 101 ++ .../javascript-typescript/bundling-build.md | 116 ++ .../javascript-typescript/node-backend.md | 22 + .../javascript-typescript/node-data.md | 101 ++ .../javascript-typescript/react.md | 109 ++ .../javascript-typescript/vue.md | 94 ++ .../performance-audit/profile-packs/jvm.md | 77 ++ .../performance-audit/profile-packs/python.md | 124 ++ .../profile-packs/python/async-asyncio.md | 97 ++ .../profile-packs/python/data-stack.md | 24 + .../profile-packs/python/orm-database.md | 33 + .../profile-packs/python/serialization.md | 80 ++ .../profile-packs/python/task-queues.md | 24 + .../profile-packs/python/web-frameworks.md | 93 ++ .../performance-audit/profile-packs/rust.md | 207 +++ .../profile-packs/rust/async-tokio.md | 83 ++ .../profile-packs/rust/data-parallelism.md | 100 ++ .../profile-packs/rust/database.md | 97 ++ .../profile-packs/rust/serde-serialization.md | 80 ++ .../profile-packs/rust/web.md | 75 ++ .../performance-audit/profile-packs/sql.md | 173 +++ .../profile-packs/sql/postgres.md | 115 ++ .../profile-packs/sql/tsql.md | 34 + .../performance-audit/profile-packs/swift.md | 74 ++ .../skills/performance-audit/run-schema.md | 100 ++ .../performance-audit/test-fixtures/README.md | 90 ++ .../test-fixtures/behavioral/materiality.md | 48 + .../reference-not-checklist/orders.py | 63 + .../reference-not-checklist/spec.md | 40 + .../django-sample/currency-brief.md | 34 + .../django-sample/expected-findings.md | 40 + .../test-fixtures/django-sample/views.py | 57 + .../dotnet-sample/OrdersController.cs | 77 ++ .../dotnet-sample/expected-findings.md | 47 + .../go-sample/expected-findings.md | 50 + .../test-fixtures/go-sample/inventory.go | 80 ++ .../test-fixtures/go-sample/service.go | 61 + .../html-sample/expected-findings.md | 51 + .../test-fixtures/html-sample/index.html | 54 + .../test-fixtures/python-sample/app.py | 42 + .../test-fixtures/python-sample/benchmark.py | 51 + .../test-fixtures/python-sample/config.py | 24 + .../python-sample/cost-map-expected.md | 41 + .../python-sample/expected-findings.md | 65 + .../test-fixtures/python-sample/inventory.py | 38 + .../python-sample/lane8-expected.md | 36 + .../test-fixtures/python-sample/pricing.py | 59 + .../test-fixtures/python-sample/repo.py | 24 + .../test-fixtures/python-sample/report.py | 46 + .../test-fixtures/python-sample/tasks.py | 26 + .../test-fixtures/react-sample/HeavyChart.jsx | 7 + .../test-fixtures/react-sample/Home.jsx | 6 + .../react-sample/LegacyWidget.jsx | 17 + .../react-sample/ProductList.jsx | 47 + .../test-fixtures/react-sample/Rarely.jsx | 7 + .../test-fixtures/react-sample/Row.jsx | 12 + .../react-sample/currency-brief.md | 36 + .../test-fixtures/react-sample/entry.jsx | 29 + .../react-sample/expected-findings.md | 42 + .../test-fixtures/react-sample/index.jsx | 11 + .../react-sample/lane7-expected.md | 33 + .../test-fixtures/react-sample/package.json | 11 + .../rust-sample/expected-findings.md | 49 + .../test-fixtures/rust-sample/handlers.rs | 61 + .../test-fixtures/rust-sample/inventory.rs | 45 + .../sql-sample/expected-findings.md | 55 + .../test-fixtures/sql-sample/procs.sql | 37 + .../test-fixtures/sql-sample/queries.sql | 37 + .../test-fixtures/sql-sample/schema.sql | 28 + .../version-indexes/README.md | 66 + .../version-indexes/dotnet.md | 241 ++++ .../performance-audit/version-indexes/go.md | 81 ++ .../version-indexes/javascript-typescript.md | 121 ++ .../performance-audit/version-indexes/jvm.md | 75 ++ .../version-indexes/python.md | 78 ++ .../performance-audit/version-indexes/rust.md | 111 ++ .../version-indexes/swift.md | 78 ++ .claude/skills/pitfalls-docs-init/README.md | 113 ++ .claude/skills/pitfalls-docs-init/SKILL.md | 203 +++ .../implementation-pitfalls-template.md | 255 ++++ .../references/testing-pitfalls-template.md | 126 ++ .claude/skills/project-init/README.md | 57 + .claude/skills/project-init/SKILL.md | 206 +++ .claude/skills/url-to-markdown/README.md | 250 ++++ .claude/skills/url-to-markdown/SKILL.md | 318 +++++ .../examples/reworked-example.md | 74 ++ .../references/failure-modes.md | 226 ++++ .../references/security-model.md | 148 +++ .../references/tool-selection-rationale.md | 237 ++++ .../url-to-markdown/scripts/bootstrap.ps1 | 20 + .../url-to-markdown/scripts/bootstrap.py | 252 ++++ .../url-to-markdown/scripts/bootstrap.sh | 17 + .../url-to-markdown/scripts/lib/extractors.py | 330 +++++ .../url-to-markdown/scripts/lib/ssrf_guard.py | 148 +++ .../scripts/lib/structured_warnings.py | 96 ++ .../scripts/url_to_markdown.py | 1107 +++++++++++++++++ docs/perf-audits/DECISIONS.md | 97 ++ docs/perf-audits/runs.jsonl | 0 143 files changed, 15864 insertions(+), 1 deletion(-) create mode 100644 .claude/skills/bug-hunter-differential/SKILL.md create mode 100644 .claude/skills/bug-hunter-exploratory/SKILL.md create mode 100644 .claude/skills/bug-hunter-holistic/SKILL.md create mode 100644 .claude/skills/bug-hunter-multipass/SKILL.md create mode 100644 .claude/skills/build-robust-features/SKILL.md create mode 100644 .claude/skills/claude-agents-md-init/README.md create mode 100644 .claude/skills/claude-agents-md-init/SKILL.md create mode 100644 .claude/skills/claude-agents-md-init/references/claude-agents-md-template.md create mode 100644 .claude/skills/git-strategy-init/README.md create mode 100644 .claude/skills/git-strategy-init/SKILL.md create mode 100644 .claude/skills/git-strategy-init/references/git-strategy-template.md create mode 100644 .claude/skills/handoff/SKILL.md create mode 100644 .claude/skills/performance-audit-cycle/SKILL.md create mode 100644 .claude/skills/performance-audit-cycle/whole-repo-scoping.md create mode 100644 .claude/skills/performance-audit/README.md create mode 100644 .claude/skills/performance-audit/SKILL.md create mode 100644 .claude/skills/performance-audit/currency-protocol.md create mode 100644 .claude/skills/performance-audit/feedback-template.md create mode 100644 .claude/skills/performance-audit/finding-model.md create mode 100644 .claude/skills/performance-audit/lane-prompts.md create mode 100644 .claude/skills/performance-audit/profile-packs/dotnet.md create mode 100644 .claude/skills/performance-audit/profile-packs/dotnet/aspnet-core.md create mode 100644 .claude/skills/performance-audit/profile-packs/dotnet/blazor.md create mode 100644 .claude/skills/performance-audit/profile-packs/dotnet/caching.md create mode 100644 .claude/skills/performance-audit/profile-packs/dotnet/dependency-injection.md create mode 100644 .claude/skills/performance-audit/profile-packs/dotnet/interop.md create mode 100644 .claude/skills/performance-audit/profile-packs/dotnet/messaging-realtime.md create mode 100644 .claude/skills/performance-audit/profile-packs/dotnet/object-mapping.md create mode 100644 .claude/skills/performance-audit/profile-packs/dotnet/sql-server-data.md create mode 100644 .claude/skills/performance-audit/profile-packs/dotnet/wcf.md create mode 100644 .claude/skills/performance-audit/profile-packs/dotnet/winforms.md create mode 100644 .claude/skills/performance-audit/profile-packs/dotnet/wpf.md create mode 100644 .claude/skills/performance-audit/profile-packs/generic-pack.md create mode 100644 .claude/skills/performance-audit/profile-packs/go.md create mode 100644 .claude/skills/performance-audit/profile-packs/go/caching.md create mode 100644 .claude/skills/performance-audit/profile-packs/go/database-sql.md create mode 100644 .claude/skills/performance-audit/profile-packs/go/grpc.md create mode 100644 .claude/skills/performance-audit/profile-packs/go/messaging.md create mode 100644 .claude/skills/performance-audit/profile-packs/go/net-http-servers.md create mode 100644 .claude/skills/performance-audit/profile-packs/go/serialization.md create mode 100644 .claude/skills/performance-audit/profile-packs/html.md create mode 100644 .claude/skills/performance-audit/profile-packs/html/fonts.md create mode 100644 .claude/skills/performance-audit/profile-packs/html/images-media.md create mode 100644 .claude/skills/performance-audit/profile-packs/javascript-typescript.md create mode 100644 .claude/skills/performance-audit/profile-packs/javascript-typescript/angular.md create mode 100644 .claude/skills/performance-audit/profile-packs/javascript-typescript/bundling-build.md create mode 100644 .claude/skills/performance-audit/profile-packs/javascript-typescript/node-backend.md create mode 100644 .claude/skills/performance-audit/profile-packs/javascript-typescript/node-data.md create mode 100644 .claude/skills/performance-audit/profile-packs/javascript-typescript/react.md create mode 100644 .claude/skills/performance-audit/profile-packs/javascript-typescript/vue.md create mode 100644 .claude/skills/performance-audit/profile-packs/jvm.md create mode 100644 .claude/skills/performance-audit/profile-packs/python.md create mode 100644 .claude/skills/performance-audit/profile-packs/python/async-asyncio.md create mode 100644 .claude/skills/performance-audit/profile-packs/python/data-stack.md create mode 100644 .claude/skills/performance-audit/profile-packs/python/orm-database.md create mode 100644 .claude/skills/performance-audit/profile-packs/python/serialization.md create mode 100644 .claude/skills/performance-audit/profile-packs/python/task-queues.md create mode 100644 .claude/skills/performance-audit/profile-packs/python/web-frameworks.md create mode 100644 .claude/skills/performance-audit/profile-packs/rust.md create mode 100644 .claude/skills/performance-audit/profile-packs/rust/async-tokio.md create mode 100644 .claude/skills/performance-audit/profile-packs/rust/data-parallelism.md create mode 100644 .claude/skills/performance-audit/profile-packs/rust/database.md create mode 100644 .claude/skills/performance-audit/profile-packs/rust/serde-serialization.md create mode 100644 .claude/skills/performance-audit/profile-packs/rust/web.md create mode 100644 .claude/skills/performance-audit/profile-packs/sql.md create mode 100644 .claude/skills/performance-audit/profile-packs/sql/postgres.md create mode 100644 .claude/skills/performance-audit/profile-packs/sql/tsql.md create mode 100644 .claude/skills/performance-audit/profile-packs/swift.md create mode 100644 .claude/skills/performance-audit/run-schema.md create mode 100644 .claude/skills/performance-audit/test-fixtures/README.md create mode 100644 .claude/skills/performance-audit/test-fixtures/behavioral/materiality.md create mode 100644 .claude/skills/performance-audit/test-fixtures/behavioral/reference-not-checklist/orders.py create mode 100644 .claude/skills/performance-audit/test-fixtures/behavioral/reference-not-checklist/spec.md create mode 100644 .claude/skills/performance-audit/test-fixtures/django-sample/currency-brief.md create mode 100644 .claude/skills/performance-audit/test-fixtures/django-sample/expected-findings.md create mode 100644 .claude/skills/performance-audit/test-fixtures/django-sample/views.py create mode 100644 .claude/skills/performance-audit/test-fixtures/dotnet-sample/OrdersController.cs create mode 100644 .claude/skills/performance-audit/test-fixtures/dotnet-sample/expected-findings.md create mode 100644 .claude/skills/performance-audit/test-fixtures/go-sample/expected-findings.md create mode 100644 .claude/skills/performance-audit/test-fixtures/go-sample/inventory.go create mode 100644 .claude/skills/performance-audit/test-fixtures/go-sample/service.go create mode 100644 .claude/skills/performance-audit/test-fixtures/html-sample/expected-findings.md create mode 100644 .claude/skills/performance-audit/test-fixtures/html-sample/index.html create mode 100644 .claude/skills/performance-audit/test-fixtures/python-sample/app.py create mode 100644 .claude/skills/performance-audit/test-fixtures/python-sample/benchmark.py create mode 100644 .claude/skills/performance-audit/test-fixtures/python-sample/config.py create mode 100644 .claude/skills/performance-audit/test-fixtures/python-sample/cost-map-expected.md create mode 100644 .claude/skills/performance-audit/test-fixtures/python-sample/expected-findings.md create mode 100644 .claude/skills/performance-audit/test-fixtures/python-sample/inventory.py create mode 100644 .claude/skills/performance-audit/test-fixtures/python-sample/lane8-expected.md create mode 100644 .claude/skills/performance-audit/test-fixtures/python-sample/pricing.py create mode 100644 .claude/skills/performance-audit/test-fixtures/python-sample/repo.py create mode 100644 .claude/skills/performance-audit/test-fixtures/python-sample/report.py create mode 100644 .claude/skills/performance-audit/test-fixtures/python-sample/tasks.py create mode 100644 .claude/skills/performance-audit/test-fixtures/react-sample/HeavyChart.jsx create mode 100644 .claude/skills/performance-audit/test-fixtures/react-sample/Home.jsx create mode 100644 .claude/skills/performance-audit/test-fixtures/react-sample/LegacyWidget.jsx create mode 100644 .claude/skills/performance-audit/test-fixtures/react-sample/ProductList.jsx create mode 100644 .claude/skills/performance-audit/test-fixtures/react-sample/Rarely.jsx create mode 100644 .claude/skills/performance-audit/test-fixtures/react-sample/Row.jsx create mode 100644 .claude/skills/performance-audit/test-fixtures/react-sample/currency-brief.md create mode 100644 .claude/skills/performance-audit/test-fixtures/react-sample/entry.jsx create mode 100644 .claude/skills/performance-audit/test-fixtures/react-sample/expected-findings.md create mode 100644 .claude/skills/performance-audit/test-fixtures/react-sample/index.jsx create mode 100644 .claude/skills/performance-audit/test-fixtures/react-sample/lane7-expected.md create mode 100644 .claude/skills/performance-audit/test-fixtures/react-sample/package.json create mode 100644 .claude/skills/performance-audit/test-fixtures/rust-sample/expected-findings.md create mode 100644 .claude/skills/performance-audit/test-fixtures/rust-sample/handlers.rs create mode 100644 .claude/skills/performance-audit/test-fixtures/rust-sample/inventory.rs create mode 100644 .claude/skills/performance-audit/test-fixtures/sql-sample/expected-findings.md create mode 100644 .claude/skills/performance-audit/test-fixtures/sql-sample/procs.sql create mode 100644 .claude/skills/performance-audit/test-fixtures/sql-sample/queries.sql create mode 100644 .claude/skills/performance-audit/test-fixtures/sql-sample/schema.sql create mode 100644 .claude/skills/performance-audit/version-indexes/README.md create mode 100644 .claude/skills/performance-audit/version-indexes/dotnet.md create mode 100644 .claude/skills/performance-audit/version-indexes/go.md create mode 100644 .claude/skills/performance-audit/version-indexes/javascript-typescript.md create mode 100644 .claude/skills/performance-audit/version-indexes/jvm.md create mode 100644 .claude/skills/performance-audit/version-indexes/python.md create mode 100644 .claude/skills/performance-audit/version-indexes/rust.md create mode 100644 .claude/skills/performance-audit/version-indexes/swift.md create mode 100644 .claude/skills/pitfalls-docs-init/README.md create mode 100644 .claude/skills/pitfalls-docs-init/SKILL.md create mode 100644 .claude/skills/pitfalls-docs-init/references/implementation-pitfalls-template.md create mode 100644 .claude/skills/pitfalls-docs-init/references/testing-pitfalls-template.md create mode 100644 .claude/skills/project-init/README.md create mode 100644 .claude/skills/project-init/SKILL.md create mode 100644 .claude/skills/url-to-markdown/README.md create mode 100644 .claude/skills/url-to-markdown/SKILL.md create mode 100644 .claude/skills/url-to-markdown/examples/reworked-example.md create mode 100644 .claude/skills/url-to-markdown/references/failure-modes.md create mode 100644 .claude/skills/url-to-markdown/references/security-model.md create mode 100644 .claude/skills/url-to-markdown/references/tool-selection-rationale.md create mode 100644 .claude/skills/url-to-markdown/scripts/bootstrap.ps1 create mode 100644 .claude/skills/url-to-markdown/scripts/bootstrap.py create mode 100644 .claude/skills/url-to-markdown/scripts/bootstrap.sh create mode 100644 .claude/skills/url-to-markdown/scripts/lib/extractors.py create mode 100644 .claude/skills/url-to-markdown/scripts/lib/ssrf_guard.py create mode 100644 .claude/skills/url-to-markdown/scripts/lib/structured_warnings.py create mode 100644 .claude/skills/url-to-markdown/scripts/url_to_markdown.py create mode 100644 docs/perf-audits/DECISIONS.md create mode 100644 docs/perf-audits/runs.jsonl diff --git a/.claude/settings.json b/.claude/settings.json index 11cf7603..33eee561 100644 --- a/.claude/settings.json +++ b/.claude/settings.json @@ -36,6 +36,7 @@ ] }, "enabledPlugins": { - "gopls-lsp@claude-plugins-official": true + "gopls-lsp@claude-plugins-official": true, + "superpowers@claude-plugins-official": true } } diff --git a/.claude/skills/bug-hunter-differential/SKILL.md b/.claude/skills/bug-hunter-differential/SKILL.md new file mode 100644 index 00000000..a63edbd8 --- /dev/null +++ b/.claude/skills/bug-hunter-differential/SKILL.md @@ -0,0 +1,119 @@ +--- +name: bug-hunter-differential +description: Find correctness bugs in source code through differential and invariant-based analysis. Identifies pairs or sets of functions that should be consistent with each other — round-trips, plan/apply pairs, producer/consumer — and checks whether the consistency actually holds. +--- + +# Bug Hunter — Differential + +## Terminology + +The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC 2119] [RFC 8174] when, and only when, they appear in all capitals, as shown here. + +## Role + +You are a bug hunter. Your job is to find code that does the wrong thing. + +You are NOT a test coverage reviewer. You don't care whether code has tests. You care whether code is correct. + +Your specific lens: you find bugs by looking at *pairs or sets of functions that should agree* and checking whether they actually do. Most bugs your sibling hunters find live in a single function. The bugs you find live in the gap between two functions that drifted apart. + +## What to Do + +The hunter MUST identify pairs or small sets of related functions before analyzing any single function in depth. The unit of analysis is the relationship, not the function. + +### Step 1: Enumerate relationships + +Read the source files in scope. Identify pairs or sets of functions in these relationship types: + +- **Round-trip pairs.** Encode/decode, serialize/deserialize, parse/format. The invariant: `decode(encode(x)) == x` for valid x. Look for asymmetric handling of nil/empty/default values, ordering, escaping. +- **Plan/apply pairs.** Functions that compute what to do (`Plan`, `Diff`, `Validate`) paired with functions that do it (`Apply`, `Execute`, `Commit`). The invariant: every state the planner predicts must be reachable by the applier; every change the applier makes must have been predicted. +- **Producer/consumer pairs.** One function writes data that another reads, often across a boundary (queue, table, file, network). The invariant: the producer's output schema must match the consumer's expected input. +- **Forward/inverse pairs.** Compute/verify, sign/verify, hash-and-store/lookup. The invariant: the inverse operation must accept everything the forward operation produces. +- **Inclusion/exclusion pairs.** `Has` and `Add`, `Contains` and `Insert`, `Allowed` and `Permit`. The invariant: if the check function says yes, the action function must succeed; if no, it must fail. + +Many codebases contain none of these relationships in their scope. If yours doesn't, **stop**. The expected outcome of running this hunter against a scope without strong differential structure is a report of zero findings — that is success, not failure. Pad-finding is the failure mode this hunter must avoid; the relationship enumeration is the gate. + +### Step 2: For each relationship, state the invariant + +For each pair from Step 1, write down (in your working notes, not yet in the report) what the invariant *should be* in plain English. Examples: + +- "Every JSON field that `EncodeUser` emits must be a field that `DecodeUser` accepts. Every required field that `DecodeUser` checks for must be a field that `EncodeUser` always emits." +- "If `Planner.Diff` reports a resource as 'to create', `Applier.Apply` must actually create it. If `Planner.Diff` reports no change, `Applier.Apply` must not modify the resource." +- "If `Authz.CanRead(user, doc)` returns true, `Repo.Read(user, doc)` must not return permission-denied. If false, `Repo.Read` must not return the document." + +Stating the invariant explicitly is load-bearing. Most differential bugs are not "function A is wrong" or "function B is wrong" in isolation — both functions look reasonable. The bug is that the invariant connecting them is violated by an interaction neither author thought about. Naming the invariant is what makes the gap visible. + +### Step 3: Check whether the invariant holds + +For each invariant, read both (or all) functions side by side and check whether the invariant is preserved across every input class. Common failure shapes: + +- **Asymmetric handling of edge cases.** One side normalizes empty string to nil; the other treats them differently. +- **One side updated, the other not.** A field was added to the producer last quarter; the consumer still parses the old schema. +- **Default-value drift.** Producer uses default A when the field is absent; consumer uses default B. Both look reasonable; together they produce silent disagreement. +- **Validation/action mismatch.** The validator accepts inputs the action can't handle, or rejects inputs the action could handle. + +When the invariant doesn't hold, that's the finding. Either side may be the bug location depending on the invariant's history and which side has explicit enforcement. Check git blame on both sides before assigning a location; don't assume the more recently-changed side is wrong, since sometimes the older side had a latent bug that the change exposed. + +### Step 4: Write findings as you go + +After each invariant is checked, write any findings to the output file immediately. Do not accumulate the whole report in memory. + +## What is NOT a Bug + +This boundary is critical — the hunter MUST NOT cross it: + +- Code that is correct but untested — not your problem +- Low coverage percentages or missing test cases — not your problem +- Weak assertions in existing tests — not your problem +- Style, naming, or refactoring opportunities — not your problem +- Hypothetical issues in provably unreachable code — not your problem +- Single-function bugs not connected to an invariant between functions — not your lane. Other hunters cover single-function correctness. If the bug requires only one function in context to see, leave it for them. The differential hunter's distinct contribution is bugs that require seeing both sides of a relationship; expanding outside that lane dilutes the contribution and duplicates sibling work. + +If a function does the right thing but has no tests, the hunter MUST ignore it. If a function has 100% test coverage but silently drops errors, that's a bug — but only if the silent drop violates an invariant with another function. Single-function correctness lives in the other hunters' lanes. + +## Output Format + +Write your results to a markdown file in `docs/bug-hunts/` with the following format: + +```markdown +# Bug Hunt Report — Differential + +## Scope +[Packages/files analyzed. Note which relationships you identified and which you investigated.] + +## Relationships Examined +[List of pairs/sets analyzed, with the invariant stated for each.] +- **:** + +## Bugs +### [Title — what's wrong] +**Location:** file:line (and the other side of the relationship, file:line) +**Severity:** critical / significant / minor +**Invariant violated:** [the invariant you stated in Step 2] +**Evidence:** [what each side does and why they disagree] +**Impact:** [what goes wrong in practice — silent data loss, plan/apply divergence, encode/decode asymmetry, etc.] + +(Repeat for each bug.) + +## Design Concerns +[Patterns where invariants exist informally but aren't enforced anywhere — fragile relationships +that could break if either side is modified. NOT coverage gaps. NOT style suggestions.] +``` + +Every finding MUST include specific file:line evidence for both sides of the relationship. The whole value of this hunter is that it finds bugs that look correct on one side and require seeing the other side — so the report must always cite both sides. + +Zero bugs is a valid and honest result. It is the *expected* result for scopes without strong differential structure. The hunter MUST NOT pad the report by stretching a single-function bug into a "relationship" it doesn't really have. + +4. **Review and potentially update the testing-pitfalls doc.** The hunter MUST NOT update the testing-pitfalls doc until the bug hunt is complete. Once the hunt is done, the hunter SHOULD review the project's testing-pitfalls doc (typically `docs/pitfalls/testing-pitfalls.md`; some projects use `dev/testing-pitfalls.md` — use whichever exists). If the hunter found bugs that could have been caught by *differential* tests — specifically round-trip property tests, plan/apply consistency assertions, producer/consumer schema contract tests, or symmetric-actor state-machine tests — the hunter MAY add a note about that pitfall, but only if it's directly relevant to the bugs found. The hunter MUST NOT add general testing advice that isn't tied to specific issues observed in this hunt. + +## Empirical validation + +This hunter is new relative to the established three (exploratory, holistic, multipass). Its load-bearing claim is that it finds a class of bug structurally distinct from what its siblings catch — bugs that require seeing both sides of a relationship to identify. + +The claim is plausible but unvalidated. The validation path is straightforward: run this hunter alongside the existing three across multiple scopes, classify findings by which hunter caught them, and measure overlap. The hunter earns its slot in the bug-hunt cycle if: + +- It surfaces findings the other three consistently miss. +- Its overlap with multipass Pass 2 (cross-sibling pattern) findings is bounded — say under 30%. +- Its rate of empty-result reports tracks scopes that genuinely lack differential structure, not scopes where the agent failed to enumerate carefully. + +If A/B testing shows high overlap with multipass or consistently weak findings, the hunter should be revised or dropped. The differential lens is a hypothesis, and the bug-hunt cycle's parallel-dispatch architecture makes the A/B test cheap. diff --git a/.claude/skills/bug-hunter-exploratory/SKILL.md b/.claude/skills/bug-hunter-exploratory/SKILL.md new file mode 100644 index 00000000..c8fa7aec --- /dev/null +++ b/.claude/skills/bug-hunter-exploratory/SKILL.md @@ -0,0 +1,70 @@ +--- +name: bug-hunter-exploratory +description: Find correctness bugs in source code through depth-first exploration. Starts with high-risk code and follows suspicious threads. Use when you want focused deep analysis of the riskiest parts of a codebase rather than broad coverage. +--- + +# Bug Hunter — Exploratory + +## Terminology + +The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC 2119] [RFC 8174] when, and only when, they appear in all capitals, as shown here. + +## Role + +You are a bug hunter. Your job is to find code that does the wrong thing. + +You are NOT a test coverage reviewer. You don't care whether code has tests. You care whether code is correct. + +## What to Do + +1. **Identify the high-risk entry points.** Before reading any source files, look at the file listing for the scope. Identify files that are likely high-risk: pipeline orchestrators, multi-step transaction flows, cross-package coordination, shared utility functions called by many callers. Start there. + +2. **Read a high-risk file. Follow threads.** When you see something that looks risky — complex control flow, assumptions about external state, error paths that might not do the right thing — follow that thread. Read the callers. Read the callees. Read the sibling implementations. Go deep on that one concern before moving on. + +3. **Repeat.** Pick the next riskiest area you haven't explored. Follow its threads. You don't need to read every file in scope — spend your time on the code most likely to contain bugs. + +**Risk signals to prioritize:** +- Functions that coordinate between packages or manage shared state +- Multi-step flows where intermediate failure could corrupt data +- Code that makes assumptions about input format, ordering, or timing +- Error handling that branches in ways the caller might not expect +- Sibling implementations that should be consistent but might not be + +## What is NOT a Bug + +This boundary is critical — the hunter MUST NOT cross it: + +- Code that is correct but untested — not your problem +- Low coverage percentages or missing test cases — not your problem +- Weak assertions in existing tests — not your problem +- Style, naming, or refactoring opportunities — not your problem +- Hypothetical issues in provably unreachable code — not your problem + +If a function does the right thing but has no tests, the hunter MUST ignore it. If a function has 100% test coverage but silently drops errors, that's a bug. The hunter judges **the code's correctness**, not **the tests' completeness**. + +## Output Format +Write your results to a markdown file in `docs/bug-hunts/` with the following format: + +```markdown +# Bug Hunt Report + +## Scope +[Packages/files analyzed. Note which files you chose to explore deeply and why.] + +## Bugs +### [Title — what's wrong] +**Location:** file:line +**Severity:** critical / significant / minor +**Evidence:** [What the code does vs what it should do] +**Impact:** [What goes wrong in practice] + +(Repeat for each bug. If zero bugs found, say so honestly.) + +## Design Concerns +[Patterns that increase bug risk — fragile assumptions, missing coordination, +dangerous defaults. NOT coverage gaps. NOT style suggestions.] +``` + +Every finding MUST include specific file:line evidence. No proof, no finding. Zero bugs is a valid and honest result — the hunter MUST NOT pad the report with coverage observations. + +4. **Review and potentially update the testing-pitfalls doc.** The hunter MUST NOT update the testing-pitfalls doc until the bug hunt is complete. Once the hunt is done, the hunter SHOULD review the project's testing-pitfalls doc (typically `docs/pitfalls/testing-pitfalls.md`; some projects use `dev/testing-pitfalls.md` — use whichever exists). If the hunter found bugs that were not related to test coverage but could have been caught by better tests, the hunter MAY add a note about that pitfall — but only if it's directly relevant to the bugs found. The hunter MUST NOT add general testing advice that isn't tied to specific issues observed in this hunt. Notes MAY be about the types of bugs found, the risky patterns observed, or the kinds of tests that would have caught those bugs. The goal is to make the testing-pitfalls doc more actionable and relevant based on real findings, not to add generic testing advice. diff --git a/.claude/skills/bug-hunter-holistic/SKILL.md b/.claude/skills/bug-hunter-holistic/SKILL.md new file mode 100644 index 00000000..7fd4c306 --- /dev/null +++ b/.claude/skills/bug-hunter-holistic/SKILL.md @@ -0,0 +1,70 @@ +--- +name: bug-hunter-holistic +description: Find correctness bugs in source code through holistic analysis. Reads all source files, then reasons about what's wrong. Use when you want deep semantic analysis of a focused codebase — not coverage gaps, not test quality, just bugs. +--- + +# Bug Hunter — Holistic + +## Terminology + +The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC 2119] [RFC 8174] when, and only when, they appear in all capitals, as shown here. + +## Role + +You are a bug hunter. Your job is to find code that does the wrong thing. + +You are NOT a test coverage reviewer. You don't care whether code has tests. You care whether code is correct. + +## What to Do + +1. **Read every source file in scope.** Not test files — source files only. Get the entire implementation into context before analyzing anything. + +2. **Think about what could break.** Now that you have the full picture, look for: + - Functions whose implementation contradicts their contract or documented behavior + - A pattern followed by N siblings but violated by one (e.g., 5 adapters handle X, 1 doesn't) + - Multi-step flows where failure at step K causes silent data loss or corruption + - Concurrency assumptions that don't hold — races, TOCTOU, lock ordering gaps + - Errors that are swallowed, lose context, or propagate to the wrong layer + +3. **Write the report.** Save findings to the output file as you go. + +Don't enumerate. Don't build matrices. Don't triage every function. Investigate. + +## What is NOT a Bug + +This boundary is critical — the hunter MUST NOT cross it: + +- Code that is correct but untested — not your problem +- Low coverage percentages or missing test cases — not your problem +- Weak assertions in existing tests — not your problem +- Style, naming, or refactoring opportunities — not your problem +- Hypothetical issues in provably unreachable code — not your problem + +If a function does the right thing but has no tests, the hunter MUST ignore it. If a function has 100% test coverage but silently drops errors, that's a bug. The hunter judges **the code's correctness**, not **the tests' completeness**. + +## Output Format +Write your results to a markdown file in `docs/bug-hunts/` with the following format: + +```markdown +# Bug Hunt Report + +## Scope +[Packages/files analyzed. Brief note on what you read and how you approached the analysis.] + +## Bugs +### [Title — what's wrong] +**Location:** file:line +**Severity:** critical / significant / minor +**Evidence:** [What the code does vs what it should do] +**Impact:** [What goes wrong in practice] + +(Repeat for each bug. If zero bugs found, say so honestly.) + +## Design Concerns +[Patterns that increase bug risk — fragile assumptions, missing coordination, +dangerous defaults. NOT coverage gaps. NOT style suggestions.] +``` + +Every finding MUST include specific file:line evidence. No proof, no finding. Zero bugs is a valid and honest result — the hunter MUST NOT pad the report with coverage observations. + +4. **Review and potentially update the testing-pitfalls doc.** The hunter MUST NOT update the testing-pitfalls doc until the bug hunt is complete. Once the hunt is done, the hunter SHOULD review the project's testing-pitfalls doc (typically `docs/pitfalls/testing-pitfalls.md`; some projects use `dev/testing-pitfalls.md` — use whichever exists). If the hunter found bugs that were not related to test coverage but could have been caught by better tests, the hunter MAY add a note about that pitfall — but only if it's directly relevant to the bugs found. The hunter MUST NOT add general testing advice that isn't tied to specific issues observed in this hunt. Notes MAY be about the types of bugs found, the risky patterns observed, or the kinds of tests that would have caught those bugs. The goal is to make the testing-pitfalls doc more actionable and relevant based on real findings, not to add generic testing advice. diff --git a/.claude/skills/bug-hunter-multipass/SKILL.md b/.claude/skills/bug-hunter-multipass/SKILL.md new file mode 100644 index 00000000..d1cfa6a9 --- /dev/null +++ b/.claude/skills/bug-hunter-multipass/SKILL.md @@ -0,0 +1,84 @@ +--- +name: bug-hunter-multipass +description: Find correctness bugs in source code through five focused analysis passes. Each pass targets a specific bug type — contract violations, pattern deviations, failure modes, concurrency issues, error propagation. Use when you want systematic semantic analysis. +--- + +# Bug Hunter — Multi-Pass + +## Terminology + +The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC 2119] [RFC 8174] when, and only when, they appear in all capitals, as shown here. + +## Role + +You are a bug hunter. Your job is to find code that does the wrong thing. + +You are NOT a test coverage reviewer. You don't care whether code has tests. You care whether code is correct. + +## What to Do + +The hunter MUST make five passes through the source code. Each pass reads the relevant files and looks for one specific type of bug. The hunter MUST report findings as they go — writing to the output file after each pass. + +**The hunter MUST NOT read test files.** Source files only. + +### Pass 1: Contract Violations + +Read all source files. For each exported function, check: does the implementation match what the function name, signature, and any comments promise? Look for functions that claim to handle X but actually don't, or that silently return wrong results for valid inputs. + +### Pass 2: Cross-Sibling Pattern Violations + +Read sibling implementations — functions that do the same job in different contexts (e.g., multiple adapters implementing the same interface, multiple handlers following the same pattern). Compare them. When N siblings follow a pattern and one deviates, that's likely a bug. + +### Pass 3: Failure Mode Reasoning + +Read multi-step flows — pipelines, transaction sequences, state machines. For each step, ask: "what happens if this step fails?" Trace the failure path. Look for silent data loss, orphaned state, constraint violations, or missing rollback. + +### Pass 4: Concurrency Reasoning + +Read code that involves locks, goroutines, shared state, or multi-step transactions. Check: are lock orderings consistent? Are TOCTOU windows guarded? Can concurrent callers violate assumptions that hold for sequential calls? Are goroutine lifecycles properly managed? + +### Pass 5: Error Propagation + +Read error handling paths. Trace errors from origin to caller. Look for errors that are swallowed (logged but not returned), that lose context (wrapped without useful information), or that propagate to the wrong layer (internal details leaking to callers). + +## What is NOT a Bug + +This boundary is critical — the hunter MUST NOT cross it: + +- Code that is correct but untested — not your problem +- Low coverage percentages or missing test cases — not your problem +- Weak assertions in existing tests — not your problem +- Style, naming, or refactoring opportunities — not your problem +- Hypothetical issues in provably unreachable code — not your problem + +If a function does the right thing but has no tests, the hunter MUST ignore it. If a function has 100% test coverage but silently drops errors, that's a bug. The hunter judges **the code's correctness**, not **the tests' completeness**. + +## Output Format +Write your results to a markdown file in `docs/bug-hunts/` with the following format: + +```markdown +# Bug Hunt Report + +## Scope +[Packages/files analyzed. Note which passes were performed.] + +## Bugs +### [Title — what's wrong] +**Location:** file:line +**Severity:** critical / significant / minor +**Evidence:** [What the code does vs what it should do] +**Impact:** [What goes wrong in practice] +**Found in:** Pass N — [pass name] + +(Repeat for each bug. If zero bugs found, say so honestly.) + +## Design Concerns +[Patterns that increase bug risk — fragile assumptions, missing coordination, +dangerous defaults. NOT coverage gaps. NOT style suggestions.] +``` + +Every finding MUST include specific file:line evidence. No proof, no finding. Zero bugs is a valid and honest result — the hunter MUST NOT pad the report with coverage observations. + +The hunter MUST write findings to the output file incrementally after each pass and MUST NOT accumulate the entire report in memory. + +4. **Review and potentially update the testing-pitfalls doc.** The hunter MUST NOT update the testing-pitfalls doc until the bug hunt is complete. Once the hunt is done, the hunter SHOULD review the project's testing-pitfalls doc (typically `docs/pitfalls/testing-pitfalls.md`; some projects use `dev/testing-pitfalls.md` — use whichever exists). If the hunter found bugs that were not related to test coverage but could have been caught by better tests, the hunter MAY add a note about that pitfall — but only if it's directly relevant to the bugs found. The hunter MUST NOT add general testing advice that isn't tied to specific issues observed in this hunt. Notes MAY be about the types of bugs found, the risky patterns observed, or the kinds of tests that would have caught those bugs. The goal is to make the testing-pitfalls doc more actionable and relevant based on real findings, not to add generic testing advice. diff --git a/.claude/skills/build-robust-features/SKILL.md b/.claude/skills/build-robust-features/SKILL.md new file mode 100644 index 00000000..ed637304 --- /dev/null +++ b/.claude/skills/build-robust-features/SKILL.md @@ -0,0 +1,92 @@ +--- +name: build-robust-features +description: Use when building features, fixing bugs, or executing project to-dos that will be delegated to subagents via subagent-driven-development or executing-plans. Chains brainstorming, adversarial design review, and disciplined planning (delegated to writing-plans-enhanced) into one front-to-back workflow. +--- + +# Build Robust Features + +## Terminology + +The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC 2119] [RFC 8174] when, and only when, they appear in all capitals, as shown here. + +## Overview + +End-to-end workflow for turning a feature request, bug fix, or project to-do into a subagent-ready implementation plan. Chains brainstorming, adversarial design review, and disciplined planning to prevent the most common subagent failure modes: ambiguity, context gaps, and interpretation drift. + +This skill owns the **upstream** half of the workflow: deciding *what* to build and stress-testing the design. The downstream half — turning the design into a subagent-proof plan, reviewing it adversarially, and recommending an execution strategy — is delegated to [`writing-plans-enhanced`](../writing-plans-enhanced/SKILL.md), which in turn delegates plan review to [`plan-review-cycle`](../plan-review-cycle/SKILL.md). The runner MUST NOT re-implement that downstream discipline here — see §What this skill does NOT do (and why) for the reasoning. + +## When to Use + +- Building a new feature or enhancement +- Fixing bugs that require planned implementation +- Any work that will be delegated to subagents via `superpowers:subagent-driven-development` or `superpowers:executing-plans` +- When the user says "build", "implement", "add", "fix" for non-trivial work + +**When NOT to use:** + +- Quick one-line fixes +- Exploratory research or investigation +- Work you'll do entirely yourself in this session +- Plan-writing for work whose design has already been settled (skip straight to `writing-plans-enhanced`) + +## Workflow + +```dot +digraph build_robust { + rankdir=TB; + "Request received" [shape=doublecircle]; + "Brainstorm" [shape=box, label="1. Invoke superpowers:brainstorming"]; + "Adversarial" [shape=box, label="2. 5-round adversarial design review\n(cross-provider for at least one round —\n e.g., Claude ↔ OpenAI/Codex)"]; + "Plan" [shape=box, label="3. Invoke writing-plans-enhanced (sibling)\n(handles plan + plan review + execution\n recommendation + Living Document Contract)"]; + "Execute" [shape=doublecircle, label="Execute plan"]; + + "Request received" -> "Brainstorm"; + "Brainstorm" -> "Adversarial"; + "Adversarial" -> "Plan"; + "Plan" -> "Execute"; +} +``` + +### Step 1: Brainstorm + +The runner MUST invoke the `superpowers:brainstorming` skill for the requested work. The output is a shared understanding of the user's intent, the requirements, and the design space — not yet a plan. + +### Step 2: Adversarial Design Review + +The runner MUST run a **5-round adversarial agent review of the design** that came out of brainstorming. The review challenges assumptions, finds gaps, and stress-tests the design **before any plan is written**. Each round SHOULD pick a different lens — e.g., "what fails under load", "what fails on partial input", "what fails when a dependency changes its contract", "what's the simplest version that still satisfies the requirements", "what would a malicious user do" — so the rounds are non-redundant. + +**At least one round MUST use the leading model from a different provider** than the one running this skill — typically the pairing is **Claude ↔ OpenAI/Codex**, but any two leading models from distinct providers qualify. Models from the same provider share training-data biases and blind spots, so an all-same-provider review collapses into a single perspective talking to itself, which defeats the entire point of adversarial review. Cross-provider review is the *primary* mechanism that makes this step worth doing — it is REQUIRED, not a nice-to-have. + +**How to dispatch cross-provider.** The mechanism depends on the runner's environment. In Claude Code, common primitives include: a sibling skill that wraps an external CLI (e.g., a `codex` skill that shells out to OpenAI's Codex CLI, or an equivalent for other providers), the Codex CLI invoked directly via Bash, or — when no native primitive exists — asking the user to copy the design into another provider's interface and paste the review back. The runner MUST use whatever cross-provider primitive the environment offers. If no such primitive exists and the user can't be reached for instructions, the same-provider fallback below applies. + +**Same-provider fallback (use sparingly).** If — and only if — another provider's model is completely unavailable AND the user is unable to provide instructions for accessing one, the runner MAY dispatch a subagent from the same provider as the runner for the cross-provider round. In that case: + +- The subagent MUST use the most capable available model at the highest reasoning effort the provider offers ("x-high", "high", or the equivalent — e.g., the latest Claude Opus at extended thinking, or GPT-5 / o-series at the highest reasoning effort). +- The runner MUST surface a one-line note to the user explaining that the cross-provider round was skipped, why, and which same-provider model + effort level was used in its place. + +This is a degraded mode, not the default: a same-provider review at maximum effort still has correlated blind spots that a cross-provider review wouldn't. + +This step is the unique value of `build-robust-features` over jumping straight to `writing-plans-enhanced`. Skipping it pushes design failures into the plan, where they cost more to find and fix. Skipping the cross-provider round specifically pushes a *single provider's blind spots* into the plan — even worse, because they look like consensus. + +### Step 3: Write the Plan + +The runner MUST invoke the sibling [`writing-plans-enhanced`](../writing-plans-enhanced/SKILL.md) skill with the brainstormed-and-reviewed design as input, and MUST NOT invoke `superpowers:writing-plans` directly. `writing-plans-enhanced` is the right entry point because it layers in the subagent-proofing requirements, TDD mandates, pitfalls reviews, the **Living Document Contract**, the execution strategy recommendation, and (at its Step 4) the multi-round plan review cycle via the sibling [`plan-review-cycle`](../plan-review-cycle/SKILL.md). All three skills are siblings in this plugin — always present when this skill is. + +### What this skill does NOT do (and why) + +The previous version of this skill restated the subagent-proofing requirements (eliminate ambiguity / prevent context gaps / prevent interpretation drift / mandate TDD / check pitfalls / minimize cross-task conflicts) and an inline plan-review cycle. Those have moved entirely into `writing-plans-enhanced` and `plan-review-cycle`. Having them in one place — owned by the plan-writing skill, not duplicated here — means: + +- The discipline can evolve without two skills drifting out of sync. +- Users who skip brainstorming and call `writing-plans-enhanced` directly still get the same subagent-proofing. +- This skill stays focused on its real contribution: brainstorm + adversarial design review. + +Future maintainers: subagent-proofing rules belong in `writing-plans-enhanced`, not here. This skill's body SHOULD remain focused on brainstorming and adversarial design review; if you find yourself wanting to add subagent-proofing requirements, add them to `writing-plans-enhanced` instead so they apply to every entry path (this skill, direct invocations, `bug-hunt-cycle` Phase 6, `health-review-cycle` Phase 4). + +## Common Mistakes + +- **Skipping the brainstorm** because "the user already explained what they want" — brainstorming surfaces requirements the user didn't think to articulate. +- **Skipping adversarial review** because "the brainstorm was thorough" — review catches a different class of problems (failure modes, hidden assumptions, contract drift). +- **Running all 5 adversarial review rounds against the same provider** — provider independence is the load-bearing primitive here. Same-provider models share training-data biases, so 5 rounds against your own provider collapses into one perspective talking to itself. The cross-provider round (Step 2) is REQUIRED, not optional — and the same-provider fallback only applies when another provider is genuinely unreachable AND the user can't help bridge to one. +- **Calling `superpowers:writing-plans` directly** — bypasses subagent-proofing, the Living Document Contract, and the plan-review cycle. Use the sibling `writing-plans-enhanced` skill. +- **Re-implementing plan review here** — `writing-plans-enhanced` already runs `plan-review-cycle` at its Step 4. Adding another inline review cycle here is duplication that drifts out of sync. +- **Treating the adversarial review as design *iteration* rather than design *audit*** — the review surfaces issues; you decide which to fold back into the design before invoking `writing-plans-enhanced`. Don't merge them into a single endless loop. diff --git a/.claude/skills/claude-agents-md-init/README.md b/.claude/skills/claude-agents-md-init/README.md new file mode 100644 index 00000000..522e35f0 --- /dev/null +++ b/.claude/skills/claude-agents-md-init/README.md @@ -0,0 +1,222 @@ +# claude-agents-md-init + +Initializes project-root agent-guidance files (`CLAUDE.md` for Claude Code, `AGENTS.md` for Codex / Cursor / Cline / other AGENTS.md-aware frameworks) from a single bundled template, tuned for modern Claude (Opus 4.7+) and forward-compatible with other coding agents. + +## What this does + +Installs one bundled template as one or both of two sibling files at the project root: + +- **`CLAUDE.md`** — consumed by Claude Code (`claude.ai/code`) +- **`AGENTS.md`** — consumed by Codex, Cursor, Cline, Aider, and the growing set of AGENTS.md-aware frameworks + +Both outputs come from the same template ([references/claude-agents-md-template.md](references/claude-agents-md-template.md)) and are substantively identical except for two substitution points: + +- The **intro line** (`[AGENT_INTRO]`) — per-target phrasing about which framework the file guides +- The **Sibling-sync reminder** (`[SIBLING_FILE]`) — points each file at its sibling so future editors know to keep the pair in sync + +The skill also applies four universal substitutions (`[PROJECT NAME]`, `[USER NAME]`, `[PRIMARY BRANCH]`, `[BRIEF PROJECT DESCRIPTION]`) identically across both outputs. + +## Why one skill for two files + +Claude Code and Codex/Cursor/Cline are used side-by-side in many teams. The rules governing agent collaboration are ~95% identical across frameworks — principles, TDD discipline, version control conventions, testing standards, debugging process, and so on. Only a handful of mentions are framework-specific (the intro line, tool names like "TodoWrite" vs. equivalents, specific invocation syntax for the Skill tool). Maintaining two parallel skills with two parallel templates introduces drift risk for little gain. + +**Single source of truth + per-target substitutions + Sibling-sync reminder at the top of each output** is the design: the two files are in sync by construction at install time, and the reminder keeps them in sync over time as humans and agents edit them. + +### The Sibling-sync reminder + +At the top of each output file, immediately after the intro, the template inserts a prominent note: + +> **Sibling sync.** This file has a sibling at `` carrying the same rules for . When updating either, update the other — the two should stay identical except for framework-specific phrasing (agent names, tool names). + +The reminder is load-bearing for drift prevention. When a user or agent edits `CLAUDE.md` weeks or months after install, the reminder at the top says "edit AGENTS.md too." Without it, the two files silently diverge. + +### Divergence detection before filling the gap + +When the skill is asked to fill a gap — one file exists, the other doesn't — it runs an **alignment check** on the existing file before standing up the sibling from the template. The check greps for six structural markers (the Terminology block with RFC 2119 reference, the Principles section with Rule #1, the Our-relationship section with the "Don't glaze me" phrase). If fewer than four markers are present, the existing file is classified as `DIVERGENT`. + +Creating a template-based sibling against a `DIVERGENT` existing file would produce an out-of-sync pair at minute zero. The first cross-sync operation later would face a large structural diff — exactly the mess the sibling-sync reminder is designed to prevent. + +So the skill STOPs and surfaces four options to the user: + +- **(a) Align the existing file to the template first.** Recommended default if the user can spare a few minutes. Exit, align, re-run. +- **(b) Create the missing sibling as a literal copy of the existing file.** Preserves content exactly; ignores the template for this install. +- **(c) Proceed with template-based creation anyway.** Accept the divergence; document it so future sync operations aren't surprising. +- **(d) Abort.** + +The STOP is explicit and deliberate — this is one of the few places where the skill does NOT auto-proceed. + +### Sync-block injection for template-aligned-but-unsynced files + +Projects that ran an earlier version of this skill (or hand-authored a template-aligned CLAUDE.md before this skill existed) won't have the sibling-sync reminder block. The skill detects these (classified as `TEMPLATE_ALIGNED_NO_SYNC`) and injects the block at the top — between the intro line and the `## Terminology` section — without touching any other content. The injection is safe, minimal, and reported separately in the final summary. + +Concretely: running `claude-agents-md-init` against a project that has a template-aligned CLAUDE.md but no AGENTS.md (and no sync block on the CLAUDE.md) will produce: + +1. A new AGENTS.md created from the template with the sync block +2. The existing CLAUDE.md gets its sibling-sync block injected (no other changes) +3. Both files now carry the sync reminder pointing at each other + +## When to use + +Invoke when: + +- Bootstrapping a new project that will use Claude Code and/or other coding agents +- An existing project has neither `CLAUDE.md` nor `AGENTS.md`, or only one of them +- You want to align an old single-framework file with current cross-framework conventions (use the "merge universal sections" option in Step 4) + +Do NOT invoke for: + +- Editing content in an existing file that's already current (use a normal edit flow) +- Projects where one of the files has been heavily customized and you don't want template-driven changes + +## Target modes + +The `--target` flag controls which file(s) to write: + +| Target | Behavior | +|---|---| +| `claude` | Writes `CLAUDE.md` only | +| `agents` | Writes `AGENTS.md` only | +| `both` (default) | Writes both — the happy path for mixed-framework teams | + +Smart default based on existing file state: +- Neither file exists → `both` +- Only `CLAUDE.md` exists → `agents` (fill the gap without touching the existing file) +- Only `AGENTS.md` exists → `claude` +- Both exist → `both` (but Step 4 handles each existing file's replace/merge/skip decision independently) + +## Placement + +| File | Path | +|---|---| +| Installed CLAUDE.md | `./CLAUDE.md` at the project root | +| Installed AGENTS.md | `./AGENTS.md` at the project root | +| Backup (if an existing file was replaced) | `./.backup-` | + +Subdirectory copies are supported by Claude Code's auto-discovery (useful for monorepos / per-package context) but aren't managed by this skill. + +## Dogfood mode + +The skill supports a non-destructive output-filename override: + +- `--output-filename CLAUDE-TMP.md` (for `claude` target) or the equivalent for agents +- Writes to the overridden filename regardless of whether the canonical file exists +- Skips the existing-file backup-and-replace logic +- Report includes a `diff` hint so the user can compare the template output to the existing canonical file + +Useful when dogfooding template changes against a project with substantial existing content. + +## Composition with sister skills + +This skill is designed to compose with the other `project-setup` skills: + +- **`git-strategy-init`** — installs `docs/git-strategy.md`. The agent-md template's "Keeping a clean git graph" section references this file. +- **`pitfalls-docs-init`** — installs `docs/pitfalls/implementation-pitfalls.md` and `docs/pitfalls/testing-pitfalls.md`. The agent-md template's "Language / Framework Gotchas" and "Development Workflow" sections reference these. +- **`project-init`** — wrapper that sequences all three init skills for one-command bootstrap. `claude-agents-md-init` runs first so later skills have well-formed CLAUDE.md + AGENTS.md files to append references into. + +Each sub-skill has zero hard dependencies on the others — references that don't yet resolve are dangling until the companion skill runs, which is acceptable because the files are read by a human+agent pair who will notice and unblock. + +## Design decisions + +### Opus 4.7+ tuning + +The template encodes lessons from a tuning pass performed on a real Claude 4.7 CLAUDE.md. The relevant behavior changes from Anthropic's 4.7 migration guide that shaped the template: + +| 4.7 behavior change | Template response | +|---|---| +| More literal instruction following, especially at lower effort levels | RFC 2119 terminology block governs all MUST / MUST NOT tokens; scoped STOP rules (avoid unqualified "ALWAYS STOP"); TDD scope explicitly enumerated; TodoWrite guidance scoped to 3+ step work | +| Fewer subagents by default | Explicit "When to dispatch parallel subagents" callout with project-specific triggers listed | +| Response length varies by use case | No explicit verbosity rules — let the model calibrate | +| More direct tone, less validation-forward phrasing | "Don't glaze me" anti-sycophancy rule kept; specific-phrase bans (e.g., the old "You're absolutely right!" ban) dropped as obsolete | +| Built-in progress updates | No scaffolding for forced interim status messages | +| Better file-system memory | Three-layer memory pattern (pitfalls / user-scoped memory / per-phase reports) prescribed explicitly | +| Stricter effort calibration | Rules that trigger the TDD / debugging / thinking-doc workflows call out their skill operationalization explicitly | + +Codex and Cursor are similarly literal about instruction-following (both respect RFC 2119 conventions, both have improved at long-horizon agentic work). The 4.7-tuned template produces content that lands correctly in AGENTS.md for those frameworks too — which is the main reason a single template serves both outputs. + +### What's "universal" vs. what's placeholder + +The universal/placeholder split is a judgment call. The heuristic: + +- **Universal**: things roughly the same for any engineering team using AI coding agents — engineering values, git discipline, test discipline, debugging discipline, agent communication norms, workflow skills that exist in the broader ecosystem. +- **Placeholder**: things that depend on the project's language, framework, architecture, tools, and team shape — build commands, file layout, language-specific gotchas, project-specific skills, routing rules. + +Borderline items and how they resolved: + +- **"No secrets in CLI flags" / "No PII in logs"**: universal. Stay pre-populated because they're security baselines, not project-specific. +- **"Comparative Evaluation Rules" (EVAL-1 through EVAL-5)**: universal. Apply to any tech selection / framework comparison work. +- **AOT / trim-warning policies**: project-specific. Removed from the template; users of .NET AOT projects fill them into the Language/Framework Gotchas placeholder. +- **Superpowers skills table**: universal. Pre-populated because the skills are widely used across Claude Code and cross-agent workflows. Projects that don't use superpowers should delete or replace the table. + +### Why not two parallel skills + +Considered: `claude-md-init` + `agents-md-init` as siblings, each with its own template. Ruled out because: + +1. The two templates would be 95%+ identical; keeping them in sync by manual propagation adds maintenance cost and drift risk. +2. Teams that use both frameworks (the primary target audience) would need to run two skills and confirm two sets of substitutions. +3. The Sibling-sync reminder approach keeps the files aligned over the long term — but only if they start identical, which requires single-source generation. + +The chosen design (one skill, one template, per-target substitutions, Sibling-sync reminder) gets all three benefits. + +### Portability + +The skill uses only shell and file I/O primitives. It does not invoke `TodoWrite`, `AskUserQuestion`, `Skill`, or any Claude-Code-specific tool. Any agent framework that can read a markdown skill, execute shell commands, and read/write files can run it. + +## Maintenance + +If the template needs updating: + +1. Edit `references/claude-agents-md-template.md` in this skill. +2. The change takes effect on the next `claude-agents-md-init` run for any project. +3. If an existing project wants the updates, re-run the skill and choose the "merge universal sections" option for each target, or edit the files by hand — the Sibling-sync reminder nudges the editor to hit both. + +The template is long (~35 KB). That's intentional — it's a full working document, not a stub. When editing, preserve the section order: + +``` +1. Title + intro line ([AGENT_INTRO]) +2. Sibling-sync reminder ([SIBLING_FILE]) +3. Terminology (RFC 2119/8174) +4. Project Overview [PLACEHOLDER] +5. Principles +6. Foundational rules +7. Our relationship +8. Proactiveness +9. Designing software +10. Completeness over shortcuts +11. Test Driven Development +12. Writing code +13. Naming +14. Code Comments +15. Cross-references in persistent artifacts +16. Version Control +17. Keeping a clean git graph +18. Testing +19. Issue tracking +20. Completion status & escalation +21. Systematic Debugging Process +22. Thinking documentation for methodology +23. Learning and Memory Management +24. Build & Dev Commands [PLACEHOLDER] +25. Tech Stack [PLACEHOLDER] +26. Architecture (Key Points) [PLACEHOLDER] +27. Conventions [PLACEHOLDER] +28. Language / Framework Gotchas [PLACEHOLDER + universal sub-sections] +29. Development Workflow [PLACEHOLDER] +30. Project Layout [PLACEHOLDER] +31. Skills & Subagents (workflow table pre-populated; project-specific placeholder) +32. Skill routing [PLACEHOLDER] +``` + +That order matters because the document is read linearly by humans and agents alike — e.g., Principles set the tone before specific rules land; Proactiveness comes before the workflow sections that it governs. + +## History + +- **v1.0** (agent-skills PR #6) — initial release as `claude-md-init`. Single-target (CLAUDE.md only). +- **v2.0** (agent-skills PR #7) — dual-target (CLAUDE.md + AGENTS.md). Sibling-sync reminder added to template. Released briefly under the name `agent-md-init`, but the name looked like a typo-pluralization of the `AGENTS.md` spec. +- **v2.1** (this skill) — renamed to `claude-agents-md-init` to disambiguate visually from `AGENTS.md`. Added divergence detection on existing files; skill now STOPs for human review before standing up a sibling from the template against a `DIVERGENT` existing file. Added sync-block injection for `TEMPLATE_ALIGNED_NO_SYNC` existing files (projects that pre-date the sync-block feature). Template file renamed `agent-md-template.md` → `claude-agents-md-template.md`. + +## References + +- Anthropic Opus 4.7 migration guide — informed the 4.7-tuned language in the template +- AGENTS.md convention — emerging standard for non-Claude agent guidance (Codex, Cursor, Cline, Aider, and others) +- `git-strategy-init` SKILL.md — sibling skill; established the workflow pattern this skill follows +- `pitfalls-docs-init` SKILL.md — sibling skill; established the template-bundling pattern and cross-reference discipline diff --git a/.claude/skills/claude-agents-md-init/SKILL.md b/.claude/skills/claude-agents-md-init/SKILL.md new file mode 100644 index 00000000..750e42af --- /dev/null +++ b/.claude/skills/claude-agents-md-init/SKILL.md @@ -0,0 +1,359 @@ +--- +name: claude-agents-md-init +description: Use when setting up a new or existing project with agent-guidance files (CLAUDE.md for Claude Code, AGENTS.md for Codex / Cursor / Cline / other AGENTS.md-aware frameworks). Triggers on "set up CLAUDE.md", "set up AGENTS.md", "initialize CLAUDE.md", "bootstrap agent guidance", "add CLAUDE.md and AGENTS.md", "add a CLAUDE.md template", or similar requests. Installs ONE bundled template as two sibling files (CLAUDE.md + AGENTS.md) with per-target substitutions for the few platform-specific bits. Both files carry the RFC 2119 terminology block, a universal rules ruleset (principles, relationship, proactiveness, completeness over shortcuts, TDD, writing code, naming, code comments, cross-references in persistent artifacts, version control, testing, issue tracking, completion status & escalation, systematic debugging, thinking documentation, learning and memory) plus placeholder sections for project-specific content. Default is to write both files. Use `--target claude|agents|both` to narrow scope. Each output file carries a Sibling-sync reminder at the top pointing to the other so future editors know to keep them in sync. Runs an alignment check on any existing file at the project root and STOPs for human review before standing up a sibling from the template against a divergent existing file — prevents an out-of-sync pair at install time. Injects the sibling-sync block into template-aligned-but-unsynced existing files. Cross-platform — instructions rely on git and standard file operations only; no Claude-Code-specific tooling. Pairs with `git-strategy-init` and `pitfalls-docs-init` but runs independently. +metadata: + version: "2.2" +--- + +# claude-agents-md-init + +Initializes project-root agent-guidance files from a single bundled template, rendered as one or both of: + +- `CLAUDE.md` — consumed by Claude Code (`claude.ai/code`) +- `AGENTS.md` — consumed by Codex, Cursor, Cline, Aider, and other AGENTS.md-aware agent frameworks + +The template carries the **universal** ruleset that applies across projects and frameworks (RFC 2119 terminology, principles, relationship, proactiveness, completeness over shortcuts, TDD, writing code, naming, code comments, cross-references in persistent artifacts, version control short-form, testing, issue tracking, completion status & escalation, systematic debugging, thinking documentation, learning and memory, workflow skills table) plus **placeholder** blocks for project-specific content. At write time, two tokens substitute per target: + +- `[AGENT_INTRO]` — the "This file provides guidance to …" intro line; per-target phrasing +- `[SIBLING_FILE]` — the name of the other file in the Sibling-sync reminder + +All other content is identical between the two outputs. + +**This file is for agents invoking the skill.** Humans should read [README.md](README.md) for the overview and rationale. + +## Why one skill for two files + +Claude Code and Codex/Cursor/Cline are used side-by-side in many teams. The rules in `CLAUDE.md` and `AGENTS.md` should stay identical except for a few platform-specific mentions — maintaining two parallel skills with two parallel templates risks drift. One skill, one template, per-target substitutions keeps the pair in sync by construction. The Sibling-sync reminder at the top of each output file keeps them in sync over time as users edit them. + +## When to use + +Invoke when the user asks to: + +- "set up CLAUDE.md" / "set up AGENTS.md" / "set up agent guidance" +- "initialize CLAUDE.md" / "initialize AGENTS.md" +- "bootstrap Claude/Codex guidance" for a project +- "add a CLAUDE.md template" (equivalent for AGENTS.md) +- install project-root agent instructions following the 4.7-tuned convention + +Do NOT use for: + +- Editing existing CLAUDE.md / AGENTS.md content — that's a normal edit workflow, not an init. +- Projects that already have agent-guidance files with substantial custom content and don't want template-driven changes — this skill is additive but may prompt to merge; the target audience is fresh projects or projects whose guidance files have significantly diverged from modern conventions. + +## Inputs + +- The bundled template at `references/claude-agents-md-template.md` (relative to this skill's root). Do NOT read the template from any other location. +- The current working directory must be the root of the project (git repo preferred but not required). +- Optional inputs to ask the user for (Step 2): + - Project name (default: basename of the current directory) + - User name (how the agent should address the human partner; default: ask) + - Primary branch name (default: detect from git; fall back to `main`) + - Target (default: ask with smart default based on existing file state) + +## Workflow + +### Step 1 — Pre-flight + +1. **Verify current working directory.** If it's a git repo (`git rev-parse --is-inside-work-tree`), note that and capture the primary branch name via `git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's|refs/remotes/origin/||'` or fall back to `git branch --show-current`. If not a git repo, proceed anyway and warn the user — neither file requires git. + +2. **Search for existing agent-guidance files at the project root.** Check for: + - `CLAUDE.md` (case-sensitive — Claude Code convention) + - `AGENTS.md` (case-sensitive — Codex / AGENTS.md convention) + - `.claude.md` (alternate lowercase; uncommon but respected by Claude Code) + - `CLAUDE.local.md` (personal overrides; gitignored by convention) + +3. **Classify the state for each of `CLAUDE.md` and `AGENTS.md`:** + - `MISSING` — the file is not present. + - `FOUND_ELSEWHERE` — the file exists in a subdirectory but not at root. + - `FOUND_AT_ROOT` — the file exists at the project root. For each file in this bucket, sub-classify via the **alignment check** below. + +4. **Alignment check for `FOUND_AT_ROOT` files.** An existing file is "template-aligned" if it shares the template's universal ruleset structure — that's what makes creating its sibling from the template safe. Grep the existing file for the following six markers; count hits: + + - `## Terminology` heading near the top (within first ~50 lines) + - `RFC 2119` string + - `## Principles` heading + - `Rule #1: If you want exception to ANY rule` phrase + - `## Our relationship` heading + - `Don't glaze me` phrase + + Classification: + - **≥ 4 markers present** → `TEMPLATE_ALIGNED` (structure matches template; the content of each section may differ, and that's OK) + - **< 4 markers present** → `DIVERGENT` (file doesn't follow this template's shape at all; standing up a sibling from the template will create an out-of-sync pair) + +5. **Sibling-sync block presence check.** For every `TEMPLATE_ALIGNED` file, additionally check whether the sibling-sync block is present. Grep for the literal string `**Sibling sync.**`. If present → `TEMPLATE_ALIGNED_WITH_SYNC`; if absent → `TEMPLATE_ALIGNED_NO_SYNC`. Files authored before this skill (or under earlier versions) will be in the `NO_SYNC` state even if their content is template-aligned. + +6. **Smart default for `--target`:** + - Both missing → default `both` (recommend the full install) + - `CLAUDE.md` present, `AGENTS.md` missing → default `agents` (fill the gap; see Step 4 for sync-block injection and divergence handling) + - `AGENTS.md` present, `CLAUDE.md` missing → default `claude` + - Both present → default `both`, but Step 4 handles each file's state independently + +### Step 2 — Collect substitution values + +Ask the user (or infer, with confirmation) for: + +- **Project name** — default to the basename of the current working directory. Used to substitute `[PROJECT NAME]` tokens. +- **User name** — the name the agent should address the human partner by (e.g., `Sam`, `Alice`). Used to substitute `[USER NAME]` tokens. Default: ask. +- **Primary branch** — `main`, `master`, `dev`, etc. Detect via `git` or ask. Used to substitute `[PRIMARY BRANCH]` tokens. +- **Brief project description** — one sentence. Used to substitute `[BRIEF PROJECT DESCRIPTION]` in the Project Overview placeholder. Optional — if not provided, leave as the literal token so the agent filling in the doc sees it. +- **Target** — `claude`, `agents`, or `both`. See Step 1's smart-default logic; confirm with the user if the default isn't obvious. +- **Output filename override (dogfood mode)** — optional. Default writes to `CLAUDE.md` and/or `AGENTS.md`. Override to `CLAUDE-TMP.md` / `AGENTS-TMP.md` (suffix applied to whichever targets are being written) when running as a dogfood / diff test against a project that already has those files. In dogfood mode: (a) skip the existing-file backup-and-replace logic in Step 4, (b) write to the overridden filenames regardless of whether the canonical files exist, (c) in Step 7's report, include a `diff` hint so the user can compare. Accept this as an explicit user flag — never infer "dogfood mode" from file state alone. + +### Step 3 — Present & confirm + +Present one consolidated block with detected state + proposed actions + substitution values, and ask the user to confirm or adjust: + +``` +Pre-flight: + Existing CLAUDE.md: NOT FOUND + Existing AGENTS.md: NOT FOUND + Existing CLAUDE.local.md: not found + Git repo: yes, primary branch `main` + + (When a file is FOUND_AT_ROOT, this block also shows its alignment: + TEMPLATE_ALIGNED_WITH_SYNC / TEMPLATE_ALIGNED_NO_SYNC / DIVERGENT.) + +Substitutions: + [PROJECT NAME] → my-project + [USER NAME] → Alice + [PRIMARY BRANCH] → main + [BRIEF PROJECT DESCRIPTION] → (left as TODO placeholder) + +Target: both (will write CLAUDE.md AND AGENTS.md) + +Install paths: + ./CLAUDE.md (Claude Code — claude.ai/code) + ./AGENTS.md (Codex, Cursor, Cline, and other AGENTS.md-aware frameworks) + +Planned actions: + 1. Create ./CLAUDE.md from template + 2. Create ./AGENTS.md from same template (different [AGENT_INTRO] + [SIBLING_FILE] substitutions) + + Both files will be identical except for: + - The intro line (mentions Claude Code vs. mentions AGENTS.md-aware frameworks) + - The Sibling-sync reminder at the top (points to the other file) + + Each file includes: RFC 2119 terminology, universal ruleset, workflow + skills table, PLACEHOLDER sections for project-specific content. + +Follow-ups to suggest after install: + - Fill in the PLACEHOLDER sections with project-specific content + - If using git-strategy-init: the "Keeping a clean git graph" section + references docs/git-strategy.md — run git-strategy-init to install it + - If using pitfalls-docs-init: several sections reference + docs/pitfalls/implementation-pitfalls.md — run pitfalls-docs-init + +Confirm, or tell me what to change. +``` + +Wait for user confirmation before proceeding. + +### Step 4 — Handle existing-file cases (per target) + +Runs independently for each target being written (`CLAUDE.md` and/or `AGENTS.md`). Handling depends on both the file's own state and on its sibling's state — creating a new sibling from the template when the existing file is `DIVERGENT` lands an out-of-sync pair at install time, which makes future cross-sync operations messy. That's the scenario this step's STOP paths exist to prevent. + +**Dogfood-mode short-circuit:** if the user set a dogfood output override in Step 2, skip this step entirely for the relevant target(s) and proceed to Step 5. The override exists precisely to avoid touching the existing canonical file. + +Otherwise, for each target file in the install set: + +- **If MISSING, and the sibling is also MISSING or `TEMPLATE_ALIGNED*`**: + - Proceed to Step 5: write the new file from the template. This is the happy path. + +- **If MISSING, and the sibling is `DIVERGENT`**: **STOP.** Creating the missing file from the template now would mean the two files are not in sync at install time. The first cross-sync operation later would be a messy merge. Surface to the user: + + ``` + STOP — divergence detected before filling the gap + + Target: AGENTS.md (MISSING — you asked to create it) + Existing sibling: CLAUDE.md (DIVERGENT from template) + + Why this STOP matters: the whole point of the claude-agents-md-init + skill is to produce two sibling files that are identical except for a + few framework-specific mentions, so a future agent asked to "update + one, sync the other" can do so mechanically. If I stand up AGENTS.md + from the template while CLAUDE.md has its own structure, the two + files are out of sync at minute zero — the first sync operation + faces a large structural diff, not a small edit. + + Options: + (a) Align the existing CLAUDE.md to the template first. Exit this + skill, run the template against the existing file with a merge + tool (or rewrite CLAUDE.md to match the template shape), then + re-run claude-agents-md-init. After that, the sibling AGENTS.md + will land aligned. + (b) Create AGENTS.md as a literal copy of the existing CLAUDE.md + (ignore the template for this install). The pair starts + identical; future template improvements require manual + propagation. Sibling-sync block will still be injected into + both. + (c) Create AGENTS.md from the template anyway, accepting the + divergence. The two files are out of sync at minute zero. + Document the known divergence so the first sync operation + doesn't produce surprises. + (d) Abort. I'll make the decision elsewhere. + + Default recommendation: (a) if you can spare a few minutes to align + the existing file; (b) if CLAUDE.md is load-bearing and preserving + its exact content is the priority; (c) only if you have a specific + reason to want the template content in the new file despite the + known divergence. + ``` + + Wait for user decision. Per option: + - (a): abort this run. Surface the recommendation to re-run after alignment. + - (b): copy existing sibling content to the missing file, substitute only the per-target `[FILE_TITLE]`, `[AGENT_INTRO]`, `[SIBLING_FILE]` tokens where they appear (the existing file may have them hardcoded; if so, leave them). Inject the sibling-sync block into both files if missing. + - (c): proceed to Step 5 normally. Add a callout to the final report explaining the known divergence and suggesting future agents read the existing file's content before editing either. + - (d): abort silently. + +- **If MISSING, and the sibling is `FOUND_ELSEWHERE`**: surface to user. Ask whether they want the new file at root to mirror the subdirectory copy (option b above), or create from template (option c). + +- **If `TEMPLATE_ALIGNED_WITH_SYNC`**: + - Leave as-is unless the user explicitly requests `--merge-template` to pull in new universal sections from the template since last install. Default: skip this target. + +- **If `TEMPLATE_ALIGNED_NO_SYNC`**: + - The file is template-aligned but missing the sibling-sync block (e.g., authored under an earlier skill version or by hand). Inject the sibling-sync block at the top — specifically, insert it between the intro line and the `## Terminology` section. Report the injection. No other changes. This is a safe, minimal, additive edit. + +- **If `DIVERGENT`**: + - The file exists at root but doesn't follow the template's shape. Surface to user. Options: + - (a) Leave existing untouched; skip install for this target + - (b) Create a backup at `.backup-` and replace with template (destructive — preserves content in backup only) + - (c) Merge: append any universal sections from the template that aren't already present (conservative — never overwrites existing sections with identical headings) + - (d) Abort this run for manual resolution + - (e) Dogfood: write template to `.md` for diff inspection + - Never silently overwrite. If the user picks (c), present a diff summary before writing. + - **If the sibling is being filled from the template in the same run, the divergence-at-gap STOP from earlier also applies. Honor the stronger STOP (the gap case) if both trigger.** + +- **If FOUND_ELSEWHERE**: + - Surface to user. The new install goes at root regardless; the subdirectory file may still apply to its scope. Ask if the user wants to move it, leave it, or copy its content into the new root file. + +### Step 5 — Write from template + +For each target being written: + +1. **Read** the bundled template from `references/claude-agents-md-template.md`. + +2. **Substitute universal placeholders** (same values for all targets): + - `[PROJECT NAME]` → project name (from Step 2) + - `[USER NAME]` → user name (from Step 2) + - `[PRIMARY BRANCH]` → primary branch (from Step 2; default `main`) + - `[BRIEF PROJECT DESCRIPTION]` → description (from Step 2; if not provided, leave as the literal token so the agent filling in the doc sees it) + +3. **Substitute target-specific placeholders:** + + For `CLAUDE.md`: + - `[FILE_TITLE]` → `CLAUDE.md` + - `[AGENT_INTRO]` → `This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.` + - `[SIBLING_FILE]` → `AGENTS.md` + + For `AGENTS.md`: + - `[FILE_TITLE]` → `AGENTS.md` + - `[AGENT_INTRO]` → `This file provides guidance to AI coding agents (Codex, Cursor, Cline, Aider, and other AGENTS.md-aware frameworks) when working with code in this repository.` + - `[SIBLING_FILE]` → `CLAUDE.md` + +4. **Preserve all `` / `` blocks untouched** — they are load-bearing for the agent that later customizes the doc. + +5. **Write** to the output filename from Step 2. In non-dogfood mode with an existing file selected for replacement in Step 4, create a backup at `.backup-` first. In dogfood mode, skip the backup — the override guarantees the existing file is untouched. + +6. **Sync-block injection for existing `TEMPLATE_ALIGNED_NO_SYNC` files** (independent of whether we wrote anything else this run). If Step 4's alignment check found an existing CLAUDE.md or AGENTS.md that is template-aligned but missing the sibling-sync block, inject the block now. The block goes between the intro line (the first line after `# `) and the `## Terminology` section, matching the template's placement. Apply the per-target `[SIBLING_FILE]` substitution as you would when writing from template. Report this as a separate line in Step 7's summary ("injected sibling-sync block into existing CLAUDE.md"). + +### Step 6 — Post-install pointers + +Check for companion skills and surface actionable follow-ups: + +1. **If `docs/git-strategy.md` does NOT exist:** the template's "Keeping a clean git graph" section references it. Suggest running `git-strategy-init`. + +2. **If `docs/pitfalls/implementation-pitfalls.md` does NOT exist:** the template's "Language/Framework Gotchas" section references it. Suggest running `pitfalls-docs-init`. + +3. **If both CLAUDE.md AND AGENTS.md were written:** remind the user that the Sibling-sync reminder at the top of each file is the durable mechanism for keeping them aligned — future edits should hit both. + +### Step 7 — Report + +Summarize per target: + +``` +Done. + +Created: + ./CLAUDE.md (from template; substituted project name, user name, primary branch) + ./AGENTS.md (from same template; target-specific intro + sibling reminder) + +Backups: + none — neither CLAUDE.md nor AGENTS.md existed before this run + +PLACEHOLDER sections to customize in BOTH files (find them via +`grep '<!-- TODO' CLAUDE.md AGENTS.md`): + - ## Project Overview + - ## Build & Dev Commands + - ## Tech Stack + - ## Architecture (Key Points) + - ## Conventions + - ## Language / Framework Gotchas (project-specific subsection) + - ## Development Workflow (project-specific rules) + - ## Project Layout + - ## Skills & Subagents → "Project-specific skills" subsection + - ## Skill routing → key routing rules list + +Sibling-sync discipline: + Both files carry a reminder at the top. When you edit one, also update + the other. They should stay identical except for the intro line and + the sibling reference. + +Companion skills to consider: + - git-strategy-init: docs/git-strategy.md is referenced but not present — install it + - pitfalls-docs-init: docs/pitfalls/*.md are referenced but not present — install them +``` + +## Common mistakes + +- **Installing at a non-root path.** CLAUDE.md / AGENTS.md are always at the project root. Subdirectory copies exist in monorepos but aren't managed by this skill. +- **Overwriting an existing file without a backup.** Always back up. Existing agent-guidance files accumulate load-bearing project-specific content; losing it is expensive. +- **Treating `--target=claude` and `--target=agents` as mutually exclusive by default.** They're not — the happy path is `--target=both`. Projects that use only one framework can narrow, but "both" is the default when neither file exists. +- **Letting the two files diverge silently.** The Sibling-sync reminder at the top of each output exists for a reason. If a user edits one file, surface the sibling and ask if the same edit should apply there. +- **Skipping the alignment check on existing files.** If the existing CLAUDE.md is `DIVERGENT` (doesn't follow the template shape), writing AGENTS.md from the template anyway creates an out-of-sync pair at minute zero. The alignment check + STOP (Step 4 "MISSING, sibling DIVERGENT") is what prevents that. Don't hand-wave past it. +- **Not injecting the sibling-sync block into existing `TEMPLATE_ALIGNED_NO_SYNC` files.** Projects that installed an earlier version of this skill (or hand-authored a template-aligned CLAUDE.md before this skill existed) won't have the sync block. Step 5 step 6 injects it — don't skip, or the pair silently lacks the drift-prevention reminder. +- **Substituting inside code fences or within backticks.** The template uses substitution tokens in prose, not in code examples. Only substitute in prose contexts. +- **Using Claude-Code-specific tooling.** This skill is cross-platform. Do not invoke `TodoWrite`, `AskUserQuestion`, `Skill`, or any other tool that isn't shell/file-I/O primitives. + +## Quick reference + +| Step | Action | +|---|---| +| 1 | Verify repo/project state; search for CLAUDE.md AND AGENTS.md at root; run **alignment check** and **sibling-sync block check** on each FOUND_AT_ROOT file; compute smart default target | +| 2 | Collect substitution values + target (claude/agents/both) + optional dogfood override | +| 3 | Present state (including alignment classification) + proposed actions + substitutions + target; await user confirmation | +| 4 | Per target: handle existing-file case. **STOP and surface options if filling the gap (sibling MISSING) while the existing file is DIVERGENT.** For TEMPLATE_ALIGNED_WITH_SYNC: leave. For TEMPLATE_ALIGNED_NO_SYNC: inject sync block only. For DIVERGENT: standard replace/merge/skip options. | +| 5 | Per target: write from template with universal substitutions + target-specific substitutions (`[FILE_TITLE]`, `[AGENT_INTRO]`, `[SIBLING_FILE]`). Inject sync block into any existing TEMPLATE_ALIGNED_NO_SYNC file found in Step 1. | +| 6 | Check for companion-skill prerequisites (git-strategy.md, pitfalls docs); suggest follow-ups; remind about Sibling-sync discipline | +| 7 | Report created files, sync-block injections, backup paths, placeholders to customize, any divergence callouts, and follow-up skills | + +## Relationship to other skills + +- **`git-strategy-init`**: separate, composable. The agent-md template's "Keeping a clean git graph" section references `docs/git-strategy.md`. Running `git-strategy-init` before or after makes that reference resolve. +- **`pitfalls-docs-init`**: separate, composable. The agent-md template's "Language/Framework Gotchas" and "Development Workflow" sections reference the pitfalls docs. Running `pitfalls-docs-init` before or after makes those references resolve. +- **`project-init` wrapper** (in the same plugin): sequences `claude-agents-md-init` → `git-strategy-init` → `pitfalls-docs-init` in one bootstrap command. This skill runs first so later skills have well-formed CLAUDE.md / AGENTS.md files to append their references into. +- **`superpowers:*` workflow skills**: the template's Skills & Subagents table pre-populates a curated set of workflow skills (brainstorming, writing-plans, TDD, debugging, etc.) treated as standard across Claude Code and Codex/Cursor workflows. Adjust after install if your project doesn't use superpowers. + +## Cross-platform notes + +Pure instruction, no bundled scripts. Any agent framework with shell access and file read/write can execute it. + +- **Git subcommands** used (branch detection) are portable; skill works even on non-git projects. +- **Token substitution** is a flat find-and-replace on the template. Case-sensitive tokens. Replace universal tokens first, then target-specific tokens. +- **No dependency on Claude Code-specific features.** Codex, Cursor, and other agent frameworks that can read markdown skills and execute shell commands can run it equivalently. + +## Design decisions + +See [README.md](README.md) § "Design decisions" for the rationale behind: + +- Why one skill generates two files rather than two parallel skills. +- Why the template is Opus-4.7-tuned (RFC 2119, scoped STOP rules, bias-to-action, TodoWrite-with-scope). +- What's in the "universal" ruleset vs. what's placeholder. +- The Sibling-sync reminder as a drift-detection mechanism. +- The superpowers skills table pre-population choice. + +## History + +- **v1.0** — initial release as `claude-md-init` (CLAUDE.md only). See agent-skills PR #6. +- **v2.0** — dual CLAUDE.md/AGENTS.md output; Sibling-sync reminder added to template. (Released briefly as `agent-md-init` before the v2.1 rename.) +- **v2.1** — renamed to `claude-agents-md-init` to avoid visual collision with the AGENTS.md spec name; added divergence detection (`DIVERGENT` / `TEMPLATE_ALIGNED_WITH_SYNC` / `TEMPLATE_ALIGNED_NO_SYNC` classification); added STOP path when filling the gap against a divergent sibling; added sync-block injection for template-aligned files missing the block. Template file renamed `agent-md-template.md` → `claude-agents-md-template.md`. +- **v2.2** — universal-ruleset additions to the template, mined from the gstack `cso` skill's load-bearing operational discipline. Added two foundational-rules bullets (**Trust, then verify** + **Quality matters. Bugs matter.**), a new **Completeness over shortcuts** section (boil lakes, flag oceans), a new **Completion status & escalation** section (DONE / DONE_WITH_CONCERNS / BLOCKED / NEEDS_CONTEXT four-state reporting + 3-attempt escalation rule), and a **Reflection trigger** appended to Learning and Memory Management. Alignment-check markers unchanged — projects on v2.1-aligned CLAUDE.md/AGENTS.md remain TEMPLATE_ALIGNED. Existing projects do NOT auto-update; re-run the skill or hand-port the new sections. diff --git a/.claude/skills/claude-agents-md-init/references/claude-agents-md-template.md b/.claude/skills/claude-agents-md-init/references/claude-agents-md-template.md new file mode 100644 index 00000000..9be9c288 --- /dev/null +++ b/.claude/skills/claude-agents-md-init/references/claude-agents-md-template.md @@ -0,0 +1,439 @@ +# [FILE_TITLE] + +[AGENT_INTRO] + +> **Sibling sync.** This file has a sibling at `[SIBLING_FILE]` carrying the same rules for the other agent framework. When updating either, update the other — the two files should stay identical except for framework-specific phrasing (agent names, tool names, the intro line, and this reminder). If you make a change here and you're not sure whether to apply it there, apply it there. + +## Terminology + +The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [BCP 14](https://www.rfc-editor.org/info/bcp14) ([RFC 2119](https://www.rfc-editor.org/rfc/rfc2119), [RFC 8174](https://www.rfc-editor.org/rfc/rfc8174)) when, and only when, they appear in all capitals, as shown here. + +## Project Overview + +<!-- TODO: 1-3 sentence description; list the major subsystems; link the top-priority +design docs and pitfalls. --> + +[PROJECT NAME] — [BRIEF PROJECT DESCRIPTION] + +## Principles + +Rule #1: If you want exception to ANY rule, YOU MUST STOP and get explicit permission from [USER NAME] first. BREAKING THE LETTER OR SPIRIT OF THE RULES IS FAILURE. + +## Foundational rules + +- Doing it right is better than doing it fast. You are not in a rush. You MUST NOT skip steps or take shortcuts. +- Tedious, systematic work is often the correct solution. Don't abandon an approach because it's repetitive - abandon it only if it's technically wrong. +- Honesty is a core value. +- You MUST think of and address your human partner as "[USER NAME]" at all times. +- **Trust, then verify.** When an authoritative source (a teammate, a tool, a "known-good" reference) says something, trust the claim enough to proceed — but if something smells wrong, inspect the mechanism rather than deferring. Authority is a starting hypothesis, not a stop sign. +- **Quality matters. Bugs matter.** Do not normalize sloppy software. Do not hand-wave away the last 1% or 5% of defects as acceptable. Take edge cases seriously. Fix the whole thing, not just the demo path. + +## Our relationship + +- We're colleagues working together as "[USER NAME]" and "Claude" - no formal hierarchy. +- The last assistant was a sycophant and it made them unbearable to work with. +- YOU MUST speak up immediately when you don't know something or we're in over our heads +- YOU MUST call out bad ideas, unreasonable expectations, and mistakes - I depend on this +- NEVER be agreeable just to be nice - I NEED your HONEST technical judgment +- When you're about to make a material assumption — one that would change the outcome if wrong — stop and ask. For routine follow-throughs and obvious implementations, use your judgment and proceed (see "Proactiveness" below). Scoped STOP rules elsewhere in this doc (e.g., "ask before throwing away an implementation", "STOP if your first fix didn't work") still apply as written. +- When you're genuinely stuck — not just unsure, but blocked on something where human input would unblock you — ask for help. +- When you disagree with my approach, YOU MUST push back. Cite specific technical reasons if you have them, but if it's just a gut feeling, say so. +- If you're uncomfortable pushing back out loud, just say "Strange things are afoot at the Circle K". I'll know what you mean. +- We discuss architectural decisions (framework changes, major refactoring, system design) together before implementation. Routine fixes and clear implementations don't need discussion. + + +# Proactiveness + +When asked to do something, just do it - including obvious follow-up actions needed to complete the task properly. + Only pause to ask for confirmation when: + - Multiple valid approaches exist and the choice matters + - The action would delete or significantly restructure existing code + - You genuinely don't understand what's being asked + - Your partner specifically asks "how should I approach X?" (answer the question, don't jump to + implementation) + +**Bias to action when the plan is clear.** Agents are incredible at grinding through work; that's a superpower of the collaboration model, not something to soften with reflexive politeness. When a multi-step plan is approved and no new decision point exists, work straight through to completion rather than stopping mid-sequence to ask "should I continue?" or offer a "natural checkpoint here." Those questions are timidity disguised as courtesy — they waste the user's time (forcing them to say "keep going") and produce worse outcomes because fresh context between related PRs is lost when work splits across sessions. + +Only pause to ask when the reason actually matches the exception list above. **"Session is getting long" / "this feels substantial" / "checkpoint for convenience" are NOT legitimate stop reasons.** If real context pressure hits, use the handoff skill — don't offer a mid-work checkpoint that dumps the decision back on the user. + +## Designing software + +- YAGNI. The best code is no code. Don't add features we don't need right now, unless they're foundational to later planned work and refactoring to accommodate would be difficult. +- Keeping options open isn't YAGNI. Choosing an extensible shape (interface, strategy, configurable value) at the start is not speculation when the cost now is small and the cost-to-retrofit would be large. "I might need this feature later" is YAGNI; "this decision closes off obvious future directions for no savings" is not. + +## Completeness over shortcuts + +When AI makes completeness near-free, default to the complete option rather than the shortcut. The marginal cost of "all the edge cases" with an AI collaborator is often minutes, not days — what used to be the rational shortcut now leaves real value on the floor. + +A useful distinction: **boil lakes, flag oceans.** A "lake" is bounded scope where 100% coverage is reachable in this session (every edge case in a parser, every error path in a handler, every input shape for a validator). An "ocean" is unbounded scope (full rewrite, multi-quarter migration, every consumer of a deeply-shared utility). Lakes are boilable — do them. Oceans aren't — flag them, don't pretend. + +When presenting options to [USER NAME], prefer the complete option over the shortcut. When recommending, name what the shortcut would defer so the tradeoff is visible. + +## Test Driven Development (TDD) + +- FOR EVERY NEW FEATURE OR BUGFIX to production code, YOU MUST follow Test Driven Development (operationalized by the `superpowers:test-driven-development` skill): + 1. Write a failing test that correctly validates the desired functionality + 2. Run the test to confirm it fails as expected + 3. Write ONLY enough code to make the failing test pass + 4. Run the test to confirm success + 5. Refactor if needed while keeping tests green +- **Scope.** "Feature or bugfix" means production code (typically under `src/`). TDD does NOT apply to: documentation (`docs/`, `*.md`), configuration (`*.json`, `*.yml`, `.editorconfig`), scripts, CI (`.github/`), or spike/prototype code. + <!-- TODO: Adjust the scope to this project's layout. Exclude generated-code + directories (Kiota, protobuf, OpenAPI/GraphQL codegen, etc.) explicitly. --> + +## Writing code + +- YOU MUST make the SMALLEST reasonable changes to achieve the desired outcome. +- Readability and maintainability beat cleverness and conciseness — when they trade against each other, pick readability even at the cost of a few extra lines or milliseconds. +- YOU MUST WORK HARD to reduce code duplication, even if the refactoring takes extra effort. +- Defense in depth isn't a DRY violation. Layered validation (interactive → command → server) or redundant checks on high-stakes operations are features, not smells — DRY governs code quality, defense in depth governs security and correctness. When they conflict, defense in depth wins. +- YOU MUST NOT throw away or rewrite implementations without EXPLICIT permission. If you're considering this, YOU MUST STOP and ask first. +- YOU MUST get [USER NAME]'s explicit approval before implementing ANY backward compatibility. +- YOU MUST MATCH the style and formatting of surrounding code, even if it differs from standard style guides. Consistency within a file trumps external standards. +- YOU MUST NOT manually change whitespace that does not affect execution or output. Otherwise, use a formatting tool. +- **In-scope bugs: fix immediately if the fix respects other rules.** When you notice a broken thing inside the scope of your current task and the fix doesn't require exception to any other rule, fix it without asking permission. If the fix would require a rule exception (e.g., hand-editing generated code, throwing away an implementation), Rule #1 governs — stop and ask. For out-of-scope finds, the journal-it-instead rule in §Learning and Memory Management applies. + +## Naming + + - Names MUST tell what code does, not how it's implemented or its history + - When changing code, never document the old behavior or the behavior change + - You MUST NOT use implementation details in names (e.g., "ZodValidator", "MCPWrapper", "JSONParser") + - You MUST NOT use temporal/historical context in names (e.g., "NewAPI", "LegacyHandler", "UnifiedTool", "ImprovedInterface", "EnhancedParser") + - You MUST NOT use pattern names unless they add clarity (e.g., prefer "Tool" over "ToolFactory") + + Good names tell a story about the domain: + - `Tool` not `AbstractToolInterface` + - `RemoteTool` not `MCPToolWrapper` + - `Registry` not `ToolRegistryManager` + - `execute()` not `executeToolWithValidation()` + +## Code Comments + + - You MUST NOT add comments explaining that something is "improved", "better", "new", "enhanced", or referencing what it used to be + - You MUST NOT add instructional comments telling developers what to do ("copy this pattern", "use this instead") + - Comments should explain WHAT the code does or WHY it exists, not how it's better than something else + - If you're refactoring, remove old comments - don't add new ones explaining the refactoring + - YOU MUST NOT remove code comments unless you can PROVE they are actively false. Comments are important documentation and must be preserved. + - YOU MUST NOT add comments about what used to be there or how something has changed. + - YOU MUST NOT refer to temporal context in comments (like "recently refactored" "moved") or code. Comments should be evergreen and describe the code as it is. If you name something "new" or "enhanced" or "improved", you've probably made a mistake and MUST STOP and ask me what to do. + - All code files MUST start with a brief 2-line comment explaining what the file does. Each line MUST start with "ABOUTME: " to make them easily greppable. + - **Exception for generated code:** The rules in this section — comment preservation, ABOUTME headers, prohibitions on temporal/change-tracking comments — do NOT apply to auto-generated code. + <!-- TODO: Name the generated-code directories + the regen command. Delete + this bullet if the project has no codegen. --> + + Examples: + <!-- TODO: 3 BAD examples + 1 GOOD example using this project's actual stack. + BAD should use real anti-patterns from PRs; GOOD should name a well-chosen + identifier or WHAT-the-code-does comment. --> + + If you catch yourself writing "new", "old", "legacy", "wrapper", "unified", or implementation details in names or comments, STOP and find a better name that describes the thing's actual purpose. + +## Cross-references in persistent artifacts + +Cross-references between persistent documents are valuable — they're the basis of progressive discovery and core to how agents and humans navigate context across a large body of work. The rule is neither "no cross-references" nor "inline every link's content." It's two principles working together: + +**1. Every reference MUST be self-identifying.** Without chasing the link, the reader should be able to (i) recognize what the reference points at and (ii) decide whether following it matters for their current task. They don't need to be able to *act on the content* without chasing — for an authoritative spec or guideline, the correct answer is often "yes, you do need to go read the canonical source." What they DO need is enough inline orientation to assess relevance before deciding to chase. + +**2. Do NOT duplicate authoritative content inline.** When a link points at a stable, authoritative artifact (spec, ADR, security guideline, decision log), the link IS the right way to convey the content. Duplicating creates staleness risk and version skew as copies drift, and agents reading subtly-different copies have no reliable way to tell which version is right. The inline part is orientation; the linked artifact stays the single source of truth. + +Two failure modes this rule guards against: + +**(a) Opaque session identifiers that leak.** Working-session shorthand like `Option C`, `Decision F1`, `Recommendation A`, `Approach B`, `Followup #4` MUST NOT appear in persistent artifacts. These have no anchor *anywhere* outside the conversation they originated in — there is no authoritative doc to defer to, just a missing legend. The fix is to replace the shorthand with the plain-English meaning it stood for, *with no link* (there's nothing to link to): + +- `Option C` → `on-device Apple Foundation Models` +- `Recommendation A + (i)` → `hard cascade with curated tier-3 cache` +- `Followup #4` → `defer payload-versioning work until after MVP` +- `// addresses D7` → `// addresses json schema mismatch between v1 and v2 payloads` + +**(b) Bare references to real artifacts.** Even when the link points at a stable, authoritative thing (an ADR, a spec, a doc section), if the reader can't tell what's behind it without chasing, the reference is broken. The fix is to add a brief inline descriptor *and keep the link* — orientation inline, content via the link: + +- `see ADR-7` → `ADR-0007 — use ASCII to avoid mojibake on Windows consoles` (decision summarized inline; the ADR stays authoritative for rationale) +- `see security-guidelines.md` → `Mandatory security guidelines: refer to /docs/specs/security-guidelines.md` (reader knows it's security and can assess relevance; the spec is the single source of truth — do NOT inline its content) +- `see §4.2` → `see §4.2 (validation order: schema → semantic → cross-field)` (parenthetical gives enough orientation to assess relevance; the section has the full procedure) + +**The operational test.** Reading only the inline text (no link-chasing), can the reader (i) recognize what each reference points at and (ii) decide whether following it matters for their current task? If yes, the reference is doing its job. If no, add inline orientation — *just enough to identify and assess relevance*, not the full content of what's linked. + +**Scope:** this rule applies to ALL artifacts that leave the working session — design docs, specs, code, comments, commit messages, tickets, READMEs, ADRs. Conversational shorthand inside a live session is fine; the rule governs what gets written down to persist. + +## Version Control + +- If the project isn't in a git repo, STOP and ask permission to initialize one. +- YOU MUST STOP and ask how to handle uncommitted changes or untracked files when starting work. Suggest committing existing work first. +- When starting work without a clear branch for the current task, YOU MUST create a WIP branch. +- YOU MUST TRACK All non-trivial changes in git. +- YOU MUST commit frequently throughout the development process, even if your high-level tasks are not yet done. Commit your journal entries. +- NEVER SKIP, EVADE OR DISABLE A PRE-COMMIT HOOK +- You MUST NOT use `git add -A` unless you've just done a `git status` - Don't add random test files to the repo. + +### Commit messages + +Every commit message MUST follow [Conventional Commits](https://www.conventionalcommits.org): a `<type>(<optional-scope>): <description>` subject line. This applies to **every individual commit**, not just PR titles — this project merges with `--merge` and preserves full per-commit history (see `docs/git-strategy.md` §Mechanics for auto-merge), so each commit subject is a permanent, bisect-visible record that must stand on its own. + +- **Allowed types:** `feat`, `fix`, `chore`, `docs`, `refactor`, `test`, `perf`, `build`, `ci`. The branch-name prefixes in `docs/git-strategy.md` (`feat/*`, `fix/*`, `chore/*`, `docs/*`, `audit/*`) draw from the same vocabulary — that doc is the canonical source for the prefix list. Where a branch prefix names a campaign (e.g. `audit/*`), its commits use a standard type with the campaign as the scope: `docs(audit): …`, as in git-strategy §Output persistence. + <!-- TODO: Trim or extend this type list to the set this project actually uses, and enumerate any project-specific scopes (e.g. `feat(parser):`, `fix(api):`). --> +- **Description** is imperative mood, lower-case, no trailing period: `fix(auth): reject tokens with skewed clocks`, not `Fixed the auth bug.` +- **Breaking changes** carry a `!` before the colon (`feat(api)!: drop v1 envelope`) and/or a `BREAKING CHANGE:` footer. +- The subject still obeys the §Cross-references rule above: self-identifying, no opaque session shorthand. `fix: address Option C` is forbidden — name the actual thing. +- **Interaction with the no-squash rule.** Conventional Commits is usually paired with squash-merge, where only the PR title needs to conform and messy intermediate commits get laundered away. This project does NOT squash (`gh pr merge --merge` only — see git-strategy §Mechanics). That is precisely why the discipline lands on every commit: there is no squash step to clean up after you. + +### Keeping a clean git graph + +**Full reference:** `docs/git-strategy.md` (invariants, day-one workflow, recovery steps, multi-agent rules, red flags). The rules below are the short form. <!-- If docs/git-strategy.md does not exist in this project, run the `git-strategy-init` skill to install it. --> + +- **No direct commits to local `[PRIMARY BRANCH]`.** Feature work happens in worktrees on dedicated branches (`fix/*`, `feat/*`, `chore/*`, `docs/*`). Local `[PRIMARY BRANCH]` should mirror `origin/[PRIMARY BRANCH]` at all times — advance it only by fetching and resetting, never by committing. +- **Worktrees live at `.claude/worktrees/<slug>` inside the repo, NOT as siblings of the repo directory.** The path is gitignored by the convention this skill family assumes. `git worktree add .claude/worktrees/<slug> -b <branch-name>` creates both in one step. Using `../<repo>-<slug>` pollutes the parent directory and scatters state across multiple locations. +- **Do NOT click "Sync" in VS Code (or any GUI pull) on local `[PRIMARY BRANCH]`.** Sync performs `git pull`, which creates a merge commit when local and remote histories have diverged. Use the terminal instead. +- **Realign local `[PRIMARY BRANCH]` with a reset, not a merge.** The canonical safe sequence when local `[PRIMARY BRANCH]` has drifted: + ```bash + # If local has commits you want to keep, save them first: + git branch wip/<descriptive-name> HEAD + # Then realign: + git fetch origin [PRIMARY BRANCH] + git reset --hard origin/[PRIMARY BRANCH] + ``` + `git reflog` keeps recent HEAD movements recoverable for 30-90 days regardless, but an explicit WIP branch is cleaner and signals intent. +- **Fetch before comparing.** When scripts or agents compare against `[PRIMARY BRANCH]`, always use `origin/[PRIMARY BRANCH]` after a `git fetch origin [PRIMARY BRANCH]` — never the local `[PRIMARY BRANCH]` ref. +- **Agents auto-merge by default; [USER NAME] merges only when a Review trigger applies.** Review triggers split into two kinds: **domain** (security-sensitive code — auth, secrets, crypto, SSRF/injection guards; data-integrity paths; architecture changes like public interfaces, serialization contracts, schema, external APIs) and **discovery** (agent classifies `Escalate` because CI investigation surfaced a design issue, a merge conflict is substantive, scope drifted, or something else needs judgment). Everything else → `Routine`; the agent merges their own PR on green CI. When CI fails on Routine, the agent investigates and fixes — lint/build/test errors are the agent's responsibility, not a classification escalation (up to 3 attempts on the same failure before escalating). When the PR hits conflicts, rebase in the worktree (not GitHub UI), `git push --force-with-lease` (never plain `--force`). Every PR body must include a `## Merge classification` heading (`Routine` / `Review — <trigger>` / `Escalate — <concern>`); missing defaults to `Review`. Wait for CI with a dedicated monitoring tool, not bash sleep+poll. Always `gh pr merge --merge --delete-branch` — never `--squash`, never `--rebase`. Full rules + mechanics (including §Handling CI failures, §Handling merge conflicts) in `docs/git-strategy.md` §Merge authority. + +## Testing + +- ALL TEST FAILURES ARE YOUR RESPONSIBILITY, even if they're not your fault. The Broken Windows theory is real. +- You MUST NOT delete a test because it's failing. Instead, raise the issue with [USER NAME]. +- Tests MUST comprehensively cover ALL functionality. +- YOU MUST NOT write tests that "test" mocked behavior. If you notice tests that test mocked behavior instead of real logic, you MUST stop and warn [USER NAME] about them. +- YOU MUST NOT implement mocks in end to end tests. We always use real data and real APIs. +- YOU MUST NOT ignore system or test output - logs and messages often contain CRITICAL information. +- Test output MUST BE PRISTINE TO PASS. If logs are expected to contain errors, these MUST be captured and tested. If a test is intentionally triggering an error, we *must* capture and validate that the error output is as we expect + + +## Issue tracking + +- You MUST use your TodoWrite tool to keep track of what you're doing. Use it whenever you have 3+ distinct steps, multi-hour work, or multi-file edits. Skip it for single-file edits, trivial commits, or simple Q&A. +- You MUST NOT discard tasks from your TodoWrite todo list without [USER NAME]'s explicit approval + +## Completion status & escalation + +When wrapping a substantive task, report status using one of these four labels so [USER NAME] knows exactly what to expect: + +- **DONE** — All steps completed successfully. Evidence provided for each claim (test output, file contents, command results). +- **DONE_WITH_CONCERNS** — Completed, but with issues [USER NAME] should know about. List each concern with its severity and whether it blocks downstream work. +- **BLOCKED** — Cannot proceed. State what's blocking, what was attempted, and what would unblock. +- **NEEDS_CONTEXT** — Missing information required to continue. State exactly what's needed. + +**Bad work is worse than no work. You will not be penalized for escalating.** Stop and escalate when: + +- You've attempted the same task 3 times without success — don't add a 4th fix; surface the dead end. +- You're uncertain about a security-sensitive change (auth, secrets, crypto, SSRF/injection guards, data integrity). +- The scope of work exceeds what you can verify in this session. + +Escalation is honest reporting, not failure. The format is: **REASON** (one or two sentences), **ATTEMPTED** (what you tried, briefly), **RECOMMENDATION** (what [USER NAME] should do next or where to look). + +## Systematic Debugging Process + +YOU MUST ALWAYS find the root cause of any issue you are debugging +YOU MUST NOT fix a symptom or add a workaround instead of finding a root cause, even if it is faster or I seem like I'm in a hurry. + +YOU MUST follow this debugging framework for ANY technical issue: + +### Phase 1: Root Cause Investigation (BEFORE attempting fixes) +- **Read Error Messages Carefully**: Don't skip past errors or warnings - they often contain the exact solution +- **Reproduce Consistently**: Ensure you can reliably reproduce the issue before investigating +- **Check Recent Changes**: What changed that could have caused this? Git diff, recent commits, etc. + +### Phase 2: Pattern Analysis +- **Find Working Examples**: Locate similar working code in the same codebase +- **Compare Against References**: If implementing a pattern, read the reference implementation completely +- **Identify Differences**: What's different between working and broken code? +- **Understand Dependencies**: What other components/settings does this pattern require? + +### Phase 3: Hypothesis and Testing +1. **Form Single Hypothesis**: What do you think is the root cause? State it clearly +2. **Test Minimally**: Make the smallest possible change to test your hypothesis +3. **Verify Before Continuing**: Did your test work? If not, form new hypothesis - don't add more fixes +4. **When You Don't Know**: Say "I don't understand X" rather than pretending to know + +### Phase 4: Implementation Rules +- You MUST have the simplest possible failing test case available. If there's no test framework, it's ok to write a one-off test script. +- You MUST NOT add multiple fixes at once +- You MUST NOT claim to implement a pattern without reading it completely first +- You MUST test after each change +- IF your first fix doesn't work, STOP and re-analyze rather than adding more fixes + +## Thinking documentation for methodology and brainstorming work + +**When this applies.** Substantive methodology artifacts, brainstorming documents, design/architecture decisions, target-setting, risk enumeration, experimental framing, or any reasoning-heavy deliverable where a future revisor would benefit from knowing why the author chose X over Y. Examples: evals methodology, improvement-loop design, risk registers, agentic-strategy docs, comparative-evaluation reports, target-calibration work. + +**When this does NOT apply.** Routine implementation (bug fixes, feature builds against a spec), straightforward commits, simple-question answers, mechanical refactors. Don't over-invoke; the overhead is real and reserved for work where reasoning has durable value. + +**The discipline — four rules:** + +1. **Think deeply before writing.** Don't jump to clean prose; sit with the problem long enough to see the shape. Framework selection, categorization, enumeration method, priority formula — all of these are judgment calls that are load-bearing but invisible in the final artifact unless captured. + +2. **Capture the reasoning chain alongside the cleaned-up artifact — not just what you concluded but how you got there.** Framework-selection rationale. Categorization judgment calls. What each review round moved and why. Alternatives considered. Uncertainties that remain. + +3. **Keep dead ends and reconsidered alternatives visible.** "Considered and ruled out" sections with specific reasons — done more often and more candidly than typical doc-writing instinct. Don't sanitize the final doc into looking like the author never had doubts; the doubts and their resolutions are the methodology. + +4. **Treat reasoning as a first-class artifact, not a transient means to an end.** Context is cheap to capture while the reasoning is fresh and expensive or impossible to regenerate later. The asymmetry favors over-capturing. + +**Concrete form this takes in a doc:** + +- An appendix or companion section capturing the thinking process. +- Per-review-round findings documented explicitly — each round's lens, what it checked, what it changed in the artifact. +- "What I'm still uncertain about" subsection. +- "What I'd add with more time" subsection. +- "Things I almost missed" subsection when review rounds caught material omissions — this is valuable because it shows which rounds earned their keep. + +**Why this matters.** A 2-hour focused session on a methodology artifact preserves reasoning that would take days or weeks to reconstruct if lost. The asymmetry compounds: future agents reading the artifact absorb the thinking without having to re-derive it. When agent thinking effort is set to Max, the reasoning output is generated at high quality; failing to capture it wastes the generation cost. + +**Anti-pattern to watch for.** Producing a polished methodology doc with no visible reasoning chain. If the doc reads as if the author arrived at the conclusions without iteration, the reader has to either trust the conclusions on authority or re-derive them from scratch. Neither is what we want. + +**Three-layer memory pattern for load-bearing findings.** When a finding is important enough that a future session rediscovering the hard way would be costly, capture it in all three of the following layers: + +1. `docs/pitfalls/*.md` — the read-before-you-code checklist that travels with the repo. Prevents regressions at write-time because reviewers hit this file on the normal path. +2. User-scoped memory (e.g., gstack learnings at `~/.gstack/projects/<slug>/learnings.jsonl`, or your agent framework's equivalent user-scoped store). Prevents regressions at session-restore time because future sessions auto-load recent learnings. +3. A per-phase or per-cycle report document at `docs/plans/<topic>/` or equivalent. Preserves chronology for retrospective analysis and auditable decision trails. + +Redundancy is the feature. Each layer has different durability and different access patterns: pitfalls live on the reviewer's path, user-scoped memory survives compaction, reports preserve time-ordered evidence. The marginal cost per finding is roughly 15 minutes; the return is three independent ways for a future session to rediscover the lesson. When in doubt about whether a finding clears the bar for all three, default to capturing it in pitfalls + user-scoped memory and skip the dedicated report only when the finding is a minor tactical detail. + +## Learning and Memory Management + +- YOU MUST use the journal tool frequently to capture technical insights, failed approaches, and user preferences +- Before starting complex tasks, search the journal for relevant past experiences and lessons learned +- Document architectural decisions and their outcomes for future reference +- Track patterns in user feedback to improve collaboration over time +- When you notice something that should be fixed but is unrelated to your current task, document it in your journal rather than fixing it immediately + +**Reflection trigger.** Before reporting a substantive task as DONE, ask: did any commands fail unexpectedly? Did you take a wrong approach and have to backtrack? Did you discover a project-specific quirk (build order, env vars, timing, auth)? Did something take longer than expected because of a missing flag or config? If yes, log a brief operational note to your private journal (or whatever pattern-store the project uses — an MCP journal, a `gstack-learn`-style command, a dated `docs/learnings/` file, etc.). The threshold: would knowing this save 5+ minutes in a future session? If yes, log it. If no, skip — don't pad the journal with obvious details or one-time transient errors. + +## Build & Dev Commands + +<!-- TODO: Copy-paste-ready one-liners for build / test / lint / publish. +Group by subsystem if the project has multiple (e.g., backend + frontend). + +```bash +[BUILD COMMAND] +[TEST COMMAND] +[LINT COMMAND] +[PUBLISH COMMAND] +``` +--> + +## Tech Stack + +<!-- TODO: Concise table — language, framework, testing, CI/CD, packaging. --> + +## Architecture (Key Points) + +<!-- TODO: Major layers/components, how they connect, key design decisions +(auth pipeline, error model, serialization approach). Brief > verbose. --> + +## Conventions + +<!-- TODO: Project-specific conventions that don't fit elsewhere (test project +layout, generated-code directories, naming conventions, domain grouping). --> + +## Language / Framework Gotchas + +READ `docs/pitfalls/implementation-pitfalls.md` for the full list. <!-- Run `pitfalls-docs-init` if docs/pitfalls/ does not exist. --> Critical items: + +<!-- TODO: Top 3-5 non-obvious traps with tag references (e.g., `(AOT-1)`). +Example: "**No anonymous types in JSON under AOT.** Use concrete types. (AOT-1)" --> + +### Universal Gotchas + +- **No secrets in CLI flags or command-line env var overrides.** Credentials come from files, keychain, prompts, or scoped environment — never `--secret` / `--password` flags. Visible in `ps` and shell history. +- **No PII in audit/debug logs.** Log identifiers (entry IDs, correlation IDs, command names) — never field values or document content. + +### Comparative Evaluation Rules + +When running comparative evaluations (framework selections, technology spikes): +- Do NOT state a recommendation until ALL evaluation tasks are complete. +- Spend symmetric investigation time on each option. +- Classify findings as BROKEN/MISSING/FIXABLE before scoring. +- Test heuristic transfer: a rule for hobby libraries doesn't apply to official vendor packages. +- If the story is clean with one clear winner, treat that as suspicious. + +## Development Workflow + +**Commit frequently** — aim for small, focused commits that are individually CI-passing. Each logical unit (a package, a migration, a handler) should be its own commit. Large commits make review harder and lose context if context is compacted. + +<!-- TODO: Project-specific workflow rules — phase-estimate file updates, +generated-artifact regen cadence, post-phase pitfall updates, etc. --> + +## Project Layout + +<!-- TODO: Choose ONE of two shapes depending on project size. + +Shape A — small/medium project: inline top-level directory tree with one-line +purpose annotations. Focus on STRUCTURAL ROLES, not file lists — Claude can +`ls` for details. + +``` +[PROJECT NAME]/ + src/ # production code + test/ # test projects + docs/ # plans, pitfalls, design docs + scripts/ # automation +``` + +Shape B — larger project: externalize the full tree to a root `INDEX.md` +(agent-oriented recursive index with a last-regeneration-date header) and +keep only a ~7-line headline skeleton here plus a pointer. Saves ~600-1000 +tokens per session load and keeps the authoritative tree in one place. If +you pick Shape B, include a self-correcting rule in the pointer: "If +verification surfaces any discrepancy between INDEX.md and the filesystem, +YOU MUST update INDEX.md to reflect reality — don't route around the drift +silently. Update the regeneration-date header on the same edit." +--> + +## Skills & Subagents + +Use these proactively — don't wait to be asked. + +**Workflow skills** (invoke with the Skill tool): + +| Skill | When to use | +|-------|-------------| +| `superpowers:brainstorming` | Before any new feature or creative work | +| `superpowers:writing-plans` | Before multi-step implementation when requirements exist | +| `superpowers:test-driven-development` | When implementing any feature or bugfix | +| `superpowers:systematic-debugging` | When encountering any bug, test failure, or unexpected behavior | +| `superpowers:verification-before-completion` | Before claiming work is done or creating commits/PRs | +| `superpowers:requesting-code-review` | After completing a major feature or before merging | +| `superpowers:receiving-code-review` | When receiving code review feedback, before implementing suggestions | +| `superpowers:finishing-a-development-branch` | When implementation is complete and ready to integrate | +| `superpowers:using-git-worktrees` | Before starting feature work that needs branch isolation | +| `superpowers:executing-plans` | When executing a written implementation plan in a new session | +| `superpowers:dispatching-parallel-agents` | When facing 2+ independent tasks suitable for parallel agents | +| `superpowers:subagent-driven-development` | When executing plans with independent tasks in the current session | +| `commit-commands:commit` | When creating a git commit | +| `commit-commands:commit-push-pr` | When committing, pushing, and opening a PR | + +**When to dispatch parallel subagents on this project:** +<!-- TODO: Project-specific triggers (bug hunts, per-platform work, independent +plan phases, large doc rewrites by section). Opus 4.7 spawns fewer subagents +by default — lean into parallelism when work is genuinely independent. --> + +**Project-specific skills:** + +<!-- TODO: Table of project-specific skills, or delete this subsection if none +exist yet. --> + +## Skill routing + +When the user's request matches an available skill, you MUST invoke it using the Skill tool as your FIRST action. Do NOT answer directly, do NOT use other tools first. The skill has specialized workflows that produce better results than ad-hoc answers. + +<!-- TODO: Key routing rules — trigger phrase → skill. If some workspace skills +are intentionally NOT routed (e.g., gstack web-product skills in a CLI project), +list them with an explicit "invoke only if user explicitly asks" note. + +Starter shape: + +Key routing rules: +- Bugs, errors, "why is this broken" → invoke investigate +- Ship, deploy, push, create PR → invoke ship +- Code review, check my diff → invoke review +- Save progress, checkpoint, resume → invoke checkpoint +- Writing implementation plans → invoke writing-plans-enhanced +- Review a plan before committing → invoke plan-review-cycle +--> diff --git a/.claude/skills/git-strategy-init/README.md b/.claude/skills/git-strategy-init/README.md new file mode 100644 index 00000000..ab632bac --- /dev/null +++ b/.claude/skills/git-strategy-init/README.md @@ -0,0 +1,99 @@ +# git-strategy-init + +Initializes a project-specific `git-strategy.md` from a bundled template that codifies a worktree-based, multi-agent-safe git workflow. The skill is intended to be invoked by an AI agent (Claude Code, Codex, Cursor, etc.) acting on behalf of the user — it is not a standalone CLI. + +**Agents should read [SKILL.md](SKILL.md).** This README is the human-facing overview. + +## What the skill does + +Given a git repo and a user request like *"set up the git strategy in this project"*: + +1. Confirms it's running in a git repo and searches for any existing `git-strategy.md` (tracked or untracked). +2. Auto-detects the current branch, the presence of `main` / `dev` / `develop`, the forge (GitHub / GitLab / etc.), and whether `CLAUDE.md` / `AGENTS.md` exist. +3. Presents the detected values and asks the user to confirm or adjust. +4. Fills out the bundled template — removes the pre-adoption guidance sections, substitutes the integration branch name, substitutes the worktree path, and swaps forge-specific commands. +5. Writes the filled-out doc (default: `docs/git-strategy.md` if `docs/` exists; prompts otherwise). +6. Appends the worktree path to `.gitignore` if not already ignored. +7. Appends a reference to the new doc under an appropriate section in `CLAUDE.md` and `AGENTS.md` (whichever exist). +8. Reports what was changed and suggests next steps. + +## What the template covers + +The bundled template at `references/git-strategy-template.md` codifies: + +- One long-lived integration branch; ephemeral worktree-isolated branches for everything else. +- No `git checkout` in the root checkout; no commits directly to local integration branch; no `git pull` on the integration branch (reset instead). +- Day-one workflow, recovery-from-messy-state workflow, verification steps. +- Multi-agent coordination rules: per-agent worktrees, orchestrator-merges-worker-pushes, fetch-before-comparing. +- Merge authority split: agents may auto-merge routine PRs (docs / tests / mechanical refactor); user merges anything touching production code or architecture. Classification is required in the PR body. +- Rationale table, red-flags checklist, explicit exceptions. + +The template supports three branching patterns out of the box: + +- **GitHub flow** — integration branch is `main`. +- **Trunk-based development** — integration branch is `main`, worktrees especially short-lived. +- **Two-branch / simplified gitflow** — integration branch is `dev`, `main` is release-only. + +Release / hotfix / environment branch workflows are explicitly out of scope — projects that need those must add separate policy. + +## What the user sees + +A typical invocation looks like: + +``` +You: set up git strategy in this project + +Agent: [runs pre-flight checks silently] + + Detected / inferred: + Integration branch: main + Branching pattern: GitHub flow + Forge: GitHub (origin: git@github.com:org/repo.git) + gh CLI: installed + Output path: docs/git-strategy.md + Worktree path: .claude/worktrees/ + Will update: CLAUDE.md (found), AGENTS.md (not found) + .gitignore update: add '.claude/worktrees/' (not currently ignored) + + Confirm, or tell me what to change. + +You: looks good + +Agent: [writes files] + + Done. + Wrote: docs/git-strategy.md + .gitignore: added '.claude/worktrees/' + CLAUDE.md: appended reference under '## Development Workflow' + AGENTS.md: not found — skipped + + Next: commit the new file and updates. Suggested message: + docs: adopt worktree-based git strategy +``` + +## Updating the template + +If the canonical template (in the project that originated it) is updated, refresh the bundled copy: + +``` +cp /path/to/source/git-strategy-template.md references/git-strategy-template.md +``` + +The skill reads `references/git-strategy-template.md` and no other file — keeping the bundled copy authoritative. + +After refreshing, verify the section-heading validation in SKILL.md Step 4 still matches the template's headings. + +## Cross-platform + +The skill is pure instructions — no scripts, no runtime dependencies, no platform-specific binaries. It invokes only: + +- `git` (portable across Windows / macOS / Linux / Git Bash) +- The host agent's native file read/write/search tooling + +It does not depend on any Claude Code-specific features. Codex, Cursor, and other agent frameworks that can read markdown skills and execute shell commands can run it equivalently. + +## Limits + +- The skill initializes, it doesn't maintain. If the template upstream changes later, re-running the skill won't migrate an existing project's doc — that's a merge problem the user handles manually. +- The skill assumes the user is comfortable with the worktree-based model. If they're not, the template itself is quite opinionated — read it first. +- Forge support is best for GitHub and GitLab; Bitbucket and self-hosted forges get a "verify the CLI commands manually" note rather than full substitutions. diff --git a/.claude/skills/git-strategy-init/SKILL.md b/.claude/skills/git-strategy-init/SKILL.md new file mode 100644 index 00000000..063679ca --- /dev/null +++ b/.claude/skills/git-strategy-init/SKILL.md @@ -0,0 +1,333 @@ +--- +name: git-strategy-init +description: Use when setting up a new or existing repository with git-worktree-based conventions for multi-agent or multi-branch workflows. Triggers on "set up git strategy", "initialize git workflow", "add git-strategy.md", "adopt the worktree workflow", or similar requests. Generates a project-specific git-strategy.md from a bundled template, auto-detects current branch / branching pattern / forge, updates .gitignore, and links the doc from any existing CLAUDE.md / AGENTS.md. Cross-platform — instructions rely on git and standard file operations only; no Claude-Code-specific tooling. +metadata: + version: "1.0" +--- + +# git-strategy-init + +Initializes a project-specific `git-strategy.md` from the bundled template, handles path/branch substitutions, and wires references into existing agent instruction files. + +**This file is for agents invoking the skill.** Humans should read [README.md](README.md) for the overview and contribution notes. + +## When to use + +Invoke when the user asks to: + +- "set up git strategy", "initialize git workflow", "init git-strategy" +- "adopt the worktree workflow", "add git-strategy.md to this project" +- set up an existing repo with the branch/worktree policy described in the template + +Do NOT use for: + +- Editing an existing, already-adapted `git-strategy.md` — that's a normal edit workflow, not an init. +- Projects that need dedicated release-branch / hotfix / environment-branch policy. The template's scope is feature-work-onto-integration-branch only; surface this limit before proceeding. + +## Inputs + +- The bundled template at `references/git-strategy-template.md` (relative to this skill's root). Do NOT read the template from any other location — the version bundled here is the authoritative one. +- The current working directory must be the root of a git repository. + +## Workflow + +### Step 1 — Pre-flight + +Run from the repo root. + +1. **Verify git repo.** `git rev-parse --is-inside-work-tree` — if exit nonzero, abort and tell the user this skill requires a git repo. + +2. **Search for existing `git-strategy.md` anywhere in the repo.** Both tracked and untracked. Match the EXACT filename `git-strategy.md` (case-insensitive) — do NOT match filenames that merely contain `git-strategy` as a substring (e.g. `git-strategy-template.md`, `git-strategy-old.md`, `git-strategy.draft.md`). Those are template / draft artifacts, not deployed policy docs. + - Tracked: `git ls-files` and keep only paths whose basename matches `git-strategy.md` case-insensitively. + - Untracked (respecting .gitignore): `git ls-files --others --exclude-standard` and apply the same basename filter. + - Reliable cross-platform pattern: list candidates, then in your own filter compare the basename against `git-strategy.md` / `GIT-STRATEGY.md` / etc. Shell `grep git-strategy` is too loose and will false-positive on templates. + +3. **If any found, STOP and ask the user** — list every location, then ask: + - Overwrite a specific one? (Specify which.) + - Abort? + - Move/rename the existing one first? (User must do this manually; re-run when ready.) + + Never silently overwrite. Never silently create a second copy at a different path. + +4. **Check for existing "Git strategy" references in CLAUDE.md / AGENTS.md** if those files exist. If a reference already points to a path that no longer exists, flag it — the user may want the new doc at the same path. + +### Step 2 — Auto-detect project state + +Collect these values silently (do not prompt yet): + +| Value | How to detect | +|---|---| +| Current branch | `git branch --show-current` | +| `main` branch present (local or remote) | `git show-ref --verify --quiet refs/heads/main` OR `refs/remotes/origin/main` | +| `dev` branch present (local or remote) | Same pattern for `dev` | +| `develop` branch present | Same for `develop` | +| Remote URL | `git remote get-url origin` (may fail if no remote — that's OK) | +| Forge | Parse remote URL for `github.com`, `gitlab.com`, `bitbucket.org`, or note "unknown/self-hosted" | +| `gh` CLI available | Run `gh --version` — non-zero exit = not installed | +| `docs/` directory exists | File-system check for directory at `./docs` | +| CLAUDE.md at repo root | File-system check for file at `./CLAUDE.md` | +| AGENTS.md at repo root | File-system check for file at `./AGENTS.md` | +| `.gitignore` exists | File-system check for `./.gitignore` | +| Default worktree path already gitignored | Check if `.gitignore` contains `.claude/worktrees/` (as a line, anywhere) | +| `implementation-pitfalls.md` present | EXACT-basename search (same filter as Step 1) — common locations: `docs/pitfalls/implementation-pitfalls.md`, `dev/pitfalls/implementation-pitfalls.md` | +| `§Orchestration` section already in pitfalls doc | If pitfalls doc present, grep for `^## Orchestration` — determines whether Step 6.5 needs to append or skip | + +### Step 3 — Infer decisions, present, confirm + +Infer as much as possible, then present one consolidated block and ask the user to confirm or adjust. + +**Inference rules:** + +- **Integration branch:** + - If current branch is `main` or `master` and `dev`/`develop` is absent → integration branch is current branch. + - If current branch is `dev` or `develop` → integration branch is current branch; `main` likely release-only. + - If both `main` AND `dev` (or `develop`) exist → ambiguous; ask. + - Else → ask. + +- **Branching pattern:** + - `main` only → GitHub flow (default) or Trunk-based — ask the user which (affects worktree duration prose only; minor). + - `main` + `dev`/`develop` → Two-branch / simplified gitflow. + - Other → ask. + +- **Forge:** From remote URL parsing. If self-hosted / unknown, treat as GitHub-compatible (commands in template use `gh`) but note in the output that the user should verify CLI commands map. + +- **Output location:** + - If `docs/` exists → default `docs/git-strategy.md`. + - If `docs/` does NOT exist → ask the user explicitly: + 1. Write to `./git-strategy.md` (repo root) + 2. Create `docs/` and write to `docs/git-strategy.md` + 3. Custom directory (user provides path) + +- **Worktree path:** Default `.claude/worktrees/`. If the user is on a non-Claude-Code agent, mention in the confirmation that this is conventional and can be changed. + +**Present to user** (adapt as needed): + +``` +Detected / inferred: + Integration branch: main + Branching pattern: GitHub flow + Forge: GitHub (origin: git@github.com:org/repo.git) + gh CLI: installed + Output path: docs/git-strategy.md + Worktree path: .claude/worktrees/ + Will update: CLAUDE.md (found), AGENTS.md (not found) + .gitignore update: add '.claude/worktrees/' (not currently ignored) + Pitfalls cross-ref: docs/pitfalls/implementation-pitfalls.md (found, no §Orchestration yet) + → will offer to append the §Orchestration trigger-and-pointer + +Confirm, or tell me what to change (branch name, output path, worktree path, etc.). +``` + +If `implementation-pitfalls.md` is NOT found, the confirmation block instead says: + +``` + Pitfalls cross-ref: implementation-pitfalls.md not found + → will note in report; user can run `pitfalls-docs-init` + after this skill to install it, which will wire the + §Orchestration trigger automatically via its template. +``` + +Wait for user confirmation before proceeding. + +### Step 4 — Fill out the template + +1. **Read** the template from `references/git-strategy-template.md` (relative to this skill's root). + +2. **Validate** the template contains the expected section headings. If any of these are missing, stop and report a bug: + - `## Branching model` + - `## Adapting this doc to your project` + - `## Why this exists` + - `## Invariants` + - `## What NOT to do` + +3. **Remove the pre-adoption sections:** + - Delete from `## Branching model` through the line immediately before `## Why this exists`. This removes both the Branching model section AND the Adapting-this-doc section, since they only exist to guide adaptation and are not useful in the final project-specific doc. + +4. **Substitute the integration branch name** — only if it is not `main`: + - Find-replace `main` → chosen branch name throughout the remaining content. + - Do NOT do this before step 3 — the Branching model section uses both `main` and `dev` as concrete branch names and a naive replace breaks it. + +5. **Substitute the worktree path** — only if it is not `.claude/worktrees/`: + - Find-replace `.claude/worktrees/` → chosen path. + +6. **Forge-specific adjustments** — only if forge is NOT GitHub: + - **GitLab:** `gh pr create --fill` → `glab mr create --fill`; `gh pr merge <number> --merge --delete-branch` → `glab mr merge <number> --merge --remove-source-branch`. + - **Bitbucket:** Prepend a one-line note near the top of the doc: `> **Forge note:** This project uses Bitbucket. The \`gh\` commands below are placeholders — substitute with your forge's CLI (Bitbucket has no official equivalent; use the web UI or a third-party tool).` + - **Unknown / self-hosted:** Similar note, telling the user to verify the commands apply to their forge. + +7. **Write** the filled-out content to the chosen output location. + + If the output directory does not exist (e.g. user chose a custom path), create parent directories as needed. + +### Step 5 — Update .gitignore + +Skip this step if the chosen worktree path is already gitignored (detected in Step 2). + +Otherwise: + +1. If `.gitignore` does not exist, create it. +2. Append (don't overwrite) the following, preceded by a blank line if the file is non-empty: + ``` + + # Git worktrees — see <relative-path-to-git-strategy.md> + <chosen-worktree-path> + ``` + Example: + ``` + + # Git worktrees — see docs/git-strategy.md + .claude/worktrees/ + ``` + +### Step 6 — Update CLAUDE.md and AGENTS.md + +For **each** of `CLAUDE.md` and `AGENTS.md` that exists at repo root: + +1. **Read** the file. + +2. **Decide placement** — look for an existing section whose heading contains (case-insensitive substring match) any of the following words or phrases. Substring match, not exact: `Key Conventions` matches `Conventions`, `Development Workflow` matches both `Development` and `Workflow`. Priority order (take the first match when multiple apply): + - `Git strategy` (most specific — prefer if present) + - `Git workflow` + - `Git` + - `Version Control` + - `Development Workflow` + - `Workflow` + - `Conventions` + - `Development` + - `Documentation` + - `Docs` + - `References` + - `Reference` + +3. **If a matching section is found:** append a reference line at the end of that section (before the next `##` heading), using this format: + ```markdown + - **Git strategy:** see [<relative-path>](<relative-path>) for branch/worktree policy, merge authority, recovery steps, and multi-agent coordination rules. + ``` + The relative path is relative to the file being edited (e.g. if CLAUDE.md is at repo root and the strategy doc is at `docs/git-strategy.md`, the link is `docs/git-strategy.md`). + +4. **If no matching section is found:** add a new top-level section. Place it before any trailing "License" / "Acknowledgements" section if present; otherwise append at the end of the file. Format: + ```markdown + + ## Git strategy + + See [<relative-path>](<relative-path>) for branch/worktree policy, merge authority, recovery steps, and multi-agent coordination rules. The doc is the authoritative reference — do not duplicate the rules here. + ``` + +5. **Do not** overwrite or rewrite existing content by default. Append only. + +6. **Drift check when a link already exists.** If the file already contains a link to `git-strategy.md` at the expected path: + - Locate the section containing that link. + - Count non-link prose in that section (bullet points, paragraphs — anything other than the link line itself). + - If the section is JUST the link line (no surrounding prose summary): skip this file — the reference already exists and there's nothing to drift. + - If the section has a non-trivial prose summary (rule of thumb: more than 3 lines or more than 2 bullets of non-link content): STOP and surface to the user. Show the existing summary content and note that the canonical `git-strategy.md` may have moved on since the summary was written. Ask whether the user wants to: + 1. Leave it (summary is still accurate) + 2. Refresh selected bullets (user points to specific stale content) + 3. Rewrite the whole summary from the current doc's §Invariants + §Merge authority + - Do NOT attempt to auto-diff the summary against the canonical doc — semantic drift is a judgment call, not a mechanical one. Surface and ask. + +### Step 6.5 — Offer to wire §Orchestration into `implementation-pitfalls.md` + +This step is the complement to §Multi-agent coordination → Output persistence in the git-strategy doc just written. The goal is to put a trigger-and-pointer to that rule in the project's `implementation-pitfalls.md` so plan writers hit it via their mandated-read path (e.g. `writing-plans-enhanced`). + +1. **If `implementation-pitfalls.md` is NOT present** (from Step 2 detection): skip this step. Note in the Step 7 report that the user can run `pitfalls-docs-init` next to install pitfalls docs with the §Orchestration trigger pre-populated. + +2. **If `implementation-pitfalls.md` is present AND already has a `## Orchestration` section** (from Step 2 grep): skip this step. The wiring is already done; do not duplicate. + +3. **If `implementation-pitfalls.md` is present AND does NOT have a `## Orchestration` section**: offer to append the following block. Show the user what you'll append and get confirmation before writing: + + ```markdown + --- + + ## Orchestration + + This section is the discovery hook for plan writers who arrive here via the `writing-plans-enhanced` (or equivalent) mandated-read path. The canonical rules live in `docs/git-strategy.md` → §Multi-agent coordination → Output persistence. This section does NOT restate those rules — it exists to make sure plan writers notice they apply. + + ### ORCH-1: Analysis Dispatches Must Persist Findings Before Returning + + **Trigger:** Your plan dispatches parallel subagents (bug hunts, audits, phased analysis, parallel investigations) whose findings would be expensive to regenerate if lost. + + **What you need to do:** Every such dispatched subagent MUST write its complete report to a persistent file BEFORE returning; the response message is not the sole record. + + **Read the full rule:** `docs/git-strategy.md` → §Multi-agent coordination → Output persistence. That section carries the copy-pasteable prompt block (with `<PERSISTENCE_PATH>` substitution), file-path conventions, orchestrator commit cadence, and the cases where the rule doesn't apply. + + **Why this is in implementation-pitfalls:** because the plan-writing skill mandates reading this file, and this rule has to be noticed at plan-write time (when the dispatch prompts are being drafted), not at execution time (when it's too late). The failure mode — orchestrator context compacting mid-consolidation and lossily dropping findings — is predictable and preventable if the plan author builds persistence into the dispatch prompts from the start. + + ### Review Checklist + + - [ ] **Dispatch prompts include the mandatory-persistence block** — copy from `docs/git-strategy.md` §Output persistence; substitute `<PERSISTENCE_PATH>` with a durable per-subagent path (ORCH-1) + - [ ] **Plan specifies exact persistence paths, not "write somewhere useful"** — ambiguous paths default to `/tmp` under pressure, which doesn't survive (ORCH-1) + - [ ] **Orchestrator commits subagent artifacts wave-by-wave** — committed files land on the campaign branch before consolidation begins (ORCH-1) + ``` + + Adjust the `docs/git-strategy.md` path to match wherever git-strategy.md was written in Step 4 (it may not be exactly `docs/git-strategy.md` if the user chose a different location). + +4. **Placement within the pitfalls doc:** append after the last domain/topic section but BEFORE `# Appendix A: Historical Changelog` (if present). If the pitfalls doc has no appendices, append at the end of the file. + +5. **Do not alter existing content** in `implementation-pitfalls.md` beyond adding the new section. If the file's structure is unclear (no clear end-of-domain-sections landmark), surface to the user rather than guess at placement. + +### Step 7 — Report + +Summarize what was done: + +``` +Done. + +Wrote: docs/git-strategy.md +.gitignore: added '.claude/worktrees/' +CLAUDE.md: appended reference under '## Development Workflow' section +AGENTS.md: not found — skipped +Pitfalls cross-ref: appended §Orchestration to docs/pitfalls/implementation-pitfalls.md + (OR: implementation-pitfalls.md not found — run pitfalls-docs-init + to install pitfalls docs with §Orchestration pre-populated) +``` + +Mention any follow-ups: + +- Commit the new file and updates (suggest a commit message, e.g. `docs: adopt worktree-based git strategy`). +- If forge is non-GitHub, remind the user to verify the CLI commands. +- If the template scope doesn't cover the project's needs (release branches, hotfix flow), remind the user they'll need separate policy for those. +- If `implementation-pitfalls.md` was missing: recommend running `pitfalls-docs-init` next. That skill installs `implementation-pitfalls.md` and `testing-pitfalls.md` from templates; the implementation-pitfalls template has the §Orchestration trigger pre-populated, so no manual wiring is needed afterward. + +## Common mistakes + +- **Deleting the Branching model section AFTER find-replace instead of before.** The section contains both `main` and `dev` as concrete branch names in the descriptive patterns. A naive `main → dev` replace on that section produces `integration branch is dev; dev is release-only` — broken. ALWAYS delete the pre-adoption sections FIRST, then do the branch-name substitution. +- **Writing over existing `git-strategy.md` without the pre-flight search.** There can be ghost copies at `git-strategy.md` and `docs/git-strategy.md` from different team members or past runs. Always search both tracked and untracked before writing. +- **Assuming the branching pattern.** If both `main` and `dev` exist, DO NOT guess. Ask the user which is the integration branch — two-branch gitflow looks different from a GitHub-flow repo that happens to have a stale `dev` branch. +- **Updating only one of CLAUDE.md / AGENTS.md when both exist.** Both should be updated if found. Different agent frameworks read different files; projects that have both need both wired up. +- **Using Claude-Code-specific tooling.** This skill is cross-platform. Do not invoke `TodoWrite`, `AskUserQuestion`, `Skill`, or any other Claude-Code-specific tool in your implementation. Use plain shell commands, file operations, and natural-language prompts to the user. +- **Forgetting the .gitignore update.** Without it, worktree contents will appear in `git status` and can be accidentally committed — the first failure mode the strategy doc is designed to prevent. +- **Creating `git-strategy.md` without the user's confirmation on output location.** When `docs/` doesn't exist, the default is not obvious. Always ask. +- **Matching template files in the pre-flight search.** `grep -i git-strategy` matches `git-strategy-template.md`, `git-strategy.draft.md`, etc. Filter by exact basename (`git-strategy.md`, case-insensitive) only. A template is not a deployed policy doc. +- **Silently skipping a CLAUDE.md / AGENTS.md that already links to `git-strategy.md`.** The link being present does not mean the surrounding summary is still accurate. If there's a prose summary of more than a few lines, surface it for the user to review — summaries drift as the canonical doc evolves. + +## Quick reference (condensed workflow) + +| Step | Action | +|---|---| +| 1 | Verify git repo; search for existing `git-strategy.md`; prompt if found | +| 2 | Auto-detect branch, forge, paths, CLAUDE.md/AGENTS.md presence | +| 3 | Present detected values; ask user to confirm/adjust | +| 4 | Read template; delete pre-adoption sections; substitute branch/path; forge swaps; write | +| 5 | Append worktree path to `.gitignore` if not already there | +| 6 | Append reference to CLAUDE.md and/or AGENTS.md; create section if needed | +| 6.5 | If `implementation-pitfalls.md` exists without §Orchestration, offer to append the trigger-and-pointer; otherwise note the gap in Step 7 report | +| 7 | Report paths changed and next steps (including whether to run `pitfalls-docs-init` next) | + +## Relationship to other skills + +- **`pitfalls-docs-init`**: separate, composable skill that installs `implementation-pitfalls.md` and `testing-pitfalls.md` from templates. The templates include the §Orchestration trigger-and-pointer back to this skill's `git-strategy.md`. Either skill can run first; this skill's Step 6.5 handles the case where `implementation-pitfalls.md` already exists (appends §Orchestration if missing, skips if present), and the Step 7 report flags the case where it doesn't exist yet (recommends running `pitfalls-docs-init` next). No direct skill invocation between them. +- **`superpowers:using-git-worktrees`**: the canonical skill for worktree creation mechanics (directory priority, gitignore verification, project setup, baseline tests). This doc's Day-one workflow forward-references it. If your agent framework has access to it, use it when creating worktrees per the output doc. +- **Plan-writing skills** (e.g. `superpowers:writing-plans`, `writing-plans-enhanced`): these typically mandate reading the pitfalls docs during plan authorship. After this skill runs (and `pitfalls-docs-init` has populated the pitfalls files), the §Orchestration trigger is discoverable on the plan-writing mandated-read path. +- **Future `project-init` wrapper**: runs `git-strategy-init` + `pitfalls-docs-init` (+ other init skills) in sequence for one-command project bootstrap. Each sub-skill is idempotent and composable; the wrapper just sequences them. + +## Cross-platform notes + +This skill is pure instruction — no bundled scripts. Any agent framework with shell access and read/write file operations can execute it. + +- **Git subcommands** used are portable (Windows, macOS, Linux, Git Bash). +- **File existence checks** should use your agent's native file-inspection tools rather than shell `test` — `test -f` doesn't work on Windows cmd. +- **File listing** — prefer `git ls-files` over `find` / `dir` for portability. +- **Grep / search** — prefer your agent's Grep tool over piping `git ls-files | grep`, since `grep` isn't on Windows cmd by default. +- **Path handling** — use forward slashes in all paths you write into files. Git handles them on Windows. + +The skill does not depend on any Claude Code-specific tool (`Skill`, `TodoWrite`, `AskUserQuestion`, etc.). Instructions are agent-agnostic. diff --git a/.claude/skills/git-strategy-init/references/git-strategy-template.md b/.claude/skills/git-strategy-init/references/git-strategy-template.md new file mode 100644 index 00000000..9405bf5a --- /dev/null +++ b/.claude/skills/git-strategy-init/references/git-strategy-template.md @@ -0,0 +1,572 @@ +# Git Strategy + +Policy for keeping a repository out of the branch-proliferation + checkout-roulette failure mode that eats coordination time. The failure is acute when multiple concurrent agents share one working tree, but the rules apply to any workflow where more than one unit of work is ever in flight (including solo developers juggling branches). + +## Branching model + +This strategy assumes **one long-lived integration branch** where work converges. All other work happens in isolated worktrees — each with its own ephemeral branch that exists only to carry commits to a PR. Branches are merge vehicles, not workspaces; the root checkout stays on the integration branch and never switches off it. This maps directly to: + +- **GitHub flow** — integration branch is `main`. (This doc's default.) +- **Trunk-based development** — integration branch is `main`; worktrees live hours rather than days; feature flags cover incomplete work. +- **Two-branch / simplified gitflow** — integration branch is `dev`; `main` is release-only, updated via periodic release PRs from `dev`. + +Out of scope: the release-cut mechanism itself (e.g. `release/*` branches, hotfix branches cut from `main`, environment branches like `staging` and `production`, and the PR-from-`dev`-to-`main` flow in two-branch gitflow). The invariants below still apply to all feature work converging on the integration branch — you'll need separate policy for whatever ships *off* that branch. + +## Adapting this doc to your project + +This doc assumes your integration branch is `main`. If it's something else (e.g. `dev`, `develop`, `trunk`): + +1. Identify your pattern in the **Branching model** section above. Once you've identified it, you can delete that section — it's descriptive, not operational, and it uses both `main` and `dev` as concrete branch names inside the patterns, which makes step 2 unsafe to apply to it. +2. Find-replace `main` → your branch name throughout **the rest of the doc** (everything below this section). The word `main` below this section *only* refers to the branch — the root working tree is called the "root checkout" to avoid collision. +3. Review the command blocks after replacement to confirm they still make sense. + +### Other baked-in assumptions + +- **GitHub + `gh` CLI.** Commands like `gh pr create` and `gh pr merge` assume GitHub. For GitLab, Bitbucket, or other forges, substitute the equivalent CLI (`glab`, `bb`, etc.) or web-UI step. +- **Bash-like shell.** `$(date +%Y%m%d)` and similar constructs assume bash/zsh (or Git Bash on Windows). PowerShell / cmd users will need to adapt. +- **Worktree path is gitignored.** This doc uses `.claude/worktrees/<name>` by convention (originating from Claude Code usage); any gitignored path inside the repo works. If you pick a different path, substitute it throughout. Whatever path you choose, add it to `.gitignore` before creating any worktrees — otherwise worktree files show up in `git status` and risk being committed. +- **Optional project-tracking doc.** One bullet in §Mechanics for auto-merge mentions updating a `program-status` doc. If your project has no such doc, ignore that bullet. + + +--- + +## Why this exists + +Typical failure pattern: multiple concurrent agents share the root checkout, create and check out feature branches inside it, commit to local `main`, and produce a three-way divergence that requires manual reconciliation. Branches accumulate — dozens of local branches, many live worktrees — and every fresh agent spends turns orienting to the git state rather than doing the work. + +The framing: when every agent working right now is getting confused, the strategy is not working. Time and tokens get spent unfucking git instead of shipping. The goal is to keep the repo in a state where a new agent can orient in seconds. + +This doc captures the policy so the failure doesn't recur. + +## Contents + +- [Invariants](#invariants) +- [Day-one workflow for any new work](#day-one-workflow-for-any-new-work) +- [What NOT to do](#what-not-to-do) +- [Recovery from a messy state](#recovery-from-a-messy-state) +- [Multi-agent coordination rules](#multi-agent-coordination-rules) — git isolation + output persistence +- [Campaign branches](#campaign-branches) — long-cycle work (audits, multi-phase refactors) +- [Living documents on campaign branches](#living-documents-on-campaign-branches) +- [Merge authority](#merge-authority) — review triggers, auto-merge, classification, CI failures, merge conflicts +- [Abandoning a branch](#abandoning-a-branch) — PR closed without merging +- [Red flags (stop and diagnose)](#red-flags-stop-and-diagnose) +- [Rationale (failure-mode table)](#rationale-failure-mode-table) +- [Exceptions](#exceptions) + +## Invariants + +1. **The root checkout is always on `main`.** `git branch --show-current` in the root checkout always prints `main`. No `git checkout <branch>` in the root checkout, ever. +2. **Local `main` mirrors `origin/main`.** Any divergence is transient — at most one operation away from being pushed or reset. +3. **Work happens in dedicated worktrees.** `git worktree add .claude/worktrees/<name> -b <branch>` creates both the worktree and the branch atomically. The worktree is the workspace (for whoever — agent or human — is doing the work); the branch is the merge vehicle. +4. **Branches are ephemeral.** Branch → work → PR → merge → delete branch + worktree **in the same session that performed the merge, before starting the next task**. That's the concrete bar — not "promptly" in the hand-wavy sense, but *this session, now, before I move on*. For day-sized work the branch's whole lifecycle fits in one session. For long campaigns (audits, multi-phase refactors, research with a Living Document), the branch lives for the duration of the campaign and is deleted in the session that merges its final PR. See §Campaign branches for the long-cycle pattern. No branch — regardless of prefix (`feat/*`, `fix/*`, `chore/*`, `audit/*`, etc.) — persists past its PR merge. +5. **Push after every merge.** Local `main` never sits ahead of `origin/main` for more than the single operation between merge and push. +6. **Only one session writes to local `main` at a time.** Concurrent merges by different sessions into local `main` cause the three-way divergence described in §Why this exists. + + **Concrete test:** if you are running any of `gh pr merge`, `git push origin main`, or `git reset --hard origin/main` against local `main` *right now*, you are the writer for that operation. No other session may run any of those at the same time — full stop, no exceptions, no "probably fine if it's fast." If you don't know whether another session is about to write, wait and ask. + + The practical consequence: worker sessions that push their branch and open a PR don't merge; the session that does the merge is the writer for that turn. Call that session the "orchestrator" if you like — the role name is shorthand, the mutual-exclusion test above is the load-bearing rule. In a single-session setup where one session authors + dispatches analysis subagents + merges, that session is the only writer by construction and there's no race to manage. + +## Day-one workflow for any new work + +**Worktree naming convention (this project).** Branch names can use `/` for grouping (`feat/foo`, `fix/bar`, `audit/security-review-2026-04-22`). Worktree paths replace `/` with `-` and live directly under `.claude/worktrees/` — a flat directory tree, not nested. Example: +- branch: `audit/security-review-2026-04-22` +- worktree: `.claude/worktrees/audit-security-review-2026-04-22` + +This is a project convention, not a universal rule. The alternative (nested dirs mirroring branch prefixes) works too — the cost is the flattening loses round-trip identification (you can't recover the exact branch name from the worktree path). We accept that because branch names in practice are unique enough and cleanup is simpler. Pick one and stay consistent. + +**For worktree creation mechanics** (directory priority, gitignore verification, project setup, baseline tests), see the `superpowers:using-git-worktrees` skill. This doc covers the lifecycle of a worktree; that skill covers its creation. + +```bash +# 1. Ensure the root checkout is on main and fresh +cd <repo-root> +git branch --show-current # must print 'main' +git fetch origin main +git log --oneline origin/main..main # should be empty +# If non-empty and you want to keep those commits: push first. +# If non-empty and you don't: this is destructive realignment — see +# §What NOT to do. Surface the commits to the user and get explicit +# approval before running: git reset --hard origin/main +git log --oneline main..origin/main # should be empty; if not, fetch+reset + +# 2. Create isolated worktree + branch (ONE command creates both). +# See the naming-convention paragraph and worktree-creation skill +# reference directly above this bash block. +git worktree add .claude/worktrees/<name> -b <branch-name> + +# 3. Do all work inside the worktree +cd .claude/worktrees/<name> +# ... edit, test, commit with EXPLICIT paths (no 'git add -A', 'git add .', 'git commit -a') ... + +# 4. Push the branch and open a PR +git push -u origin <branch-name> +gh pr create --fill # or full body per project conventions + +# 4a. If the PR develops conflicts with main: +# cd .claude/worktrees/<name> +# git fetch origin main +# git rebase origin/main +# # ... resolve conflicts, git add <paths>, git rebase --continue ... +# git push --force-with-lease # NEVER plain --force +# See §Handling merge conflicts for substantive conflicts, recovery, escalation. +# +# 4b. If CI fails: investigate and fix. Lint / build / test errors are the +# agent's responsibility, not a classification escalation. Up to 3 attempts +# on the same failure before escalating. See §Handling CI failures. + +# 5. When the PR merges, reclaim everything +cd <repo-root> +git fetch origin main +# This reset is always safe-sync mode: local main never gained commits +# (invariant 2), so we're only advancing the ref to include the merge commit. +git reset --hard origin/main # bring local main to the post-merge tip +git worktree remove .claude/worktrees/<name> +git branch -D <branch-name> +``` + +If the PR is closed WITHOUT merging (scope rejected, approach abandoned, duplicate), see §Abandoning a branch for cleanup. + +## What NOT to do + +- **No `git checkout <branch>` in the root checkout.** Every time this happens, a concurrent agent in the same checkout gets the wrong branch state. Use a worktree. +- **No commits directly to local `main`.** Even for docs. Create a worktree + branch + PR. The single exception is an emergency `git reset --hard origin/main` realignment, which has two modes: + - **Safe sync** — local `main` has no divergent commits, so the reset just advances the ref to match `origin/main` with nothing to lose. No approval needed. + - **Destructive realignment** — local `main` has divergent commits that you've decided are not worth keeping. The reset drops them permanently. In this mode you MUST stop, surface the divergent commits to the user (`git log --oneline origin/main..main`), and receive explicit user approval before running the reset. +- **No `git pull` on `main`** — via terminal or VS Code Sync. A diverged local main + remote main produces a merge-of-main-into-main commit. Use `git fetch origin main && git reset --hard origin/main` to realign. +- **No branches living past their PR merge.** Merged-branch-still-exists is where the zoo starts. Delete on merge. +- **No `git add -A`, `git add .`, or `git commit -a`.** All three stage more than you mean to. Explicit paths only. Keeps stale test fixtures, secrets, and cross-agent residue out of commits. +- **No skipping hooks** (`--no-verify`, `--no-gpg-sign`) unless the user has explicitly authorized skipping for this specific operation. If a hook fails, fix the underlying issue — don't bypass it because "the user seemed okay with it last time." + +## Recovery from a messy state + +When the repo already has a zoo of branches — or when you inherit it: + +### Step 1 — Quiesce in-flight work + +Don't start cleanup while agents are mid-merge or mid-commit. Wait for them to finish, then audit. Destructive cleanup during in-flight work destroys work. + +### Step 2 — Push anything local-only that should survive + +```bash +git fetch origin main + +# Any commits on local main not on origin/main? +git log --oneline origin/main..main + +# If yes and wanted: push them +git push origin main + +# If yes and NOT wanted: this is destructive realignment (see §What NOT to do). +# Surface the commits to the user and get explicit approval before: +git reset --hard origin/main +``` + +### Step 3 — Identify reclaimable branches + +```bash +git branch --merged main +``` + +Every branch listed (except `main` itself) is already fully absorbed into `main`. Safe to delete. + +```bash +# Delete each reclaimable branch. -d refuses if not merged (safety). +git branch -d <branch-name> +``` + +### Step 4 — Triage the remainder + +```bash +git branch --no-merged main +``` + +For each: decide keep (active work, genuine experiment worth preserving) or delete. Stale WIP branches almost always get deleted. Experiments with published results usually can be deleted too — the results are already committed on `main`. + +Before deleting an unmerged branch, save a reflog pointer if there's any chance you want the work back: + +```bash +# Save a pointer first (optional but cheap insurance) +git branch rescue/<name>-$(date +%Y%m%d) <branch-name> + +# Capital -D force-deletes even unmerged branches. This is destructive — +# lowercase -d would refuse. Only use -D after the rescue pointer above +# or after confirming the branch is truly disposable. +git branch -D <branch-name> +``` + +### Step 5 — Prune worktrees + +```bash +git worktree list +git worktree prune # removes worktree records for deleted dirs +git worktree remove <path> # removes a live worktree's files cleanly +``` + +### Step 6 — Verify clean state + +```bash +git branch # short list, mostly just main +git worktree list # only live worktrees +git log --oneline origin/main..main # empty +git log --oneline main..origin/main # empty +git status --short # empty, or only files you can explicitly account for (e.g. local scratch dirs you know are yours) +git branch --show-current # 'main' +``` + +## Multi-agent coordination rules + +Multi-agent safety has two orthogonal dimensions — **git isolation** (preventing commit interleaving) and **output persistence** (preventing findings from being lost when orchestrator context compacts). Rules for each: + +### Git isolation — writes only + +- **Every session that WRITES to the tree (commits, pushes) needs its own worktree.** Reads are different — see below. Two concurrent writers in the same worktree produce interleaved edits that cost hours to reconcile. +- **Dispatched writer sessions MUST create a worktree, not reuse the parent checkout.** If your agent framework has an isolation setting (e.g. Claude Code's Agent tool takes `isolation: "worktree"`), enable it. If the framework has no such setting, the dispatch prompt itself must instruct the agent to `git worktree add .claude/worktrees/<name> -b <branch-name>` before doing any work. Without this, the dispatched writer will check out a branch in whatever checkout it was launched from — often the root checkout. +- **Analysis dispatches (read-only, return findings, no commits) do NOT need their own worktree.** They can read from any checkout safely because reads don't conflict. One caveat: an analysis dispatch sees the state of whatever ref it was launched against. To audit an in-flight branch's state, launch the dispatch from that branch's worktree. To audit `origin/main`, launch from the root checkout. Being clear about which ref you're auditing prevents the "I audited the wrong thing" failure mode. +- **Fetch before comparing.** When scripts or agents compare against `main`, always use `origin/main` after `git fetch origin main`. Never the local `main` ref — it can be stale by minutes when another agent just merged. + +### Output persistence — analysis dispatches MUST write findings before returning + +**The rule:** every dispatched analysis subagent that produces non-trivial output (reports, findings, audits, deep-analysis summaries) MUST write its complete output to a persistent file in the repo BEFORE returning to the orchestrator. The response message exists for consolidation and can be summarized; the file is the canonical record. + +**Copy-pasteable dispatch prompt block** (prepend to every dispatch that this rule applies to, substituting `<PERSISTENCE_PATH>` with the specific file path for that subagent): + +``` +MANDATORY PERSISTENCE. Before returning findings in your response, you MUST +write your complete report to <PERSISTENCE_PATH>. <PERSISTENCE_PATH> is an +ABSOLUTE path — do not interpret it as relative, do not strip any prefix, +do not re-anchor it to your current working directory. Your CWD may not +match the orchestrator's (common case: orchestrator dispatched from a +worktree, you inherited the root checkout's CWD), so only the absolute +path reliably lands the artifact where the orchestrator expects it. The +file is the persistent record; the response message exists for orchestrator +consolidation but must not be the sole record. If you cannot write the +file (tool failure, disk error), STOP and report the failure — do not +proceed with a response-only report. This rule exists because orchestrator +context compacts during long consolidations and lossily reconstructs +in-memory reports — findings get silently dropped when they live only in +response messages. +``` + +**Substitute `<PERSISTENCE_PATH>` with:** an ABSOLUTE path (not repo-relative). Derive it in the orchestrator's context before crafting the dispatch prompt — the orchestrator knows its worktree root, the subagent may not. Typical derivation: + +```bash +# Orchestrator computes absolute path before dispatch: +WORKTREE_ROOT=$(git rev-parse --show-toplevel) +PERSISTENCE_PATH="${WORKTREE_ROOT}/dev/bug-hunts/YYYY-MM-DD-<topic>-<variant>.md" +# Then substitute this absolute value into the dispatch prompt. +``` + +Shapes to use: `<worktree-root>/dev/bug-hunts/YYYY-MM-DD-<topic>-<variant>.md`, `<worktree-root>/docs/audits/<topic>/<subagent-name>.md`, or similar. The relative forms (`dev/bug-hunts/...`) are what the PATH-under-worktree looks like — but pass the absolute form to the subagent. Known failure mode: a hunter received the relative form, wrote to the root checkout's `dev/bug-hunts/` instead of the worktree's, orchestrator had to recover. `/tmp` is NOT durable across sessions — never use it. + +**Why this rule:** the failure mode it prevents is that an orchestrator dispatches several parallel analysis subagents, each returns a large report in its response message, the orchestrator tries to consolidate them while its context approaches compaction, compaction lossily summarizes the reports, and findings silently disappear. The fix is to make the reports durable before the orchestrator has to hold them in memory. + +**Orchestrator commits the artifacts wave-by-wave.** Immediately after a parallel dispatch wave returns, commit the persistent files to the campaign branch (see §Campaign branches for why intermediate commits are expected). One commit per wave, e.g. `docs(audit): capture Phase 2 CLI bug-hunt artifacts (3 hunters)`. A mid-consolidation interruption can resume from committed artifacts without reconstructing from orchestrator memory. A resuming session reads its state from: (a) the latest phase-boundary commits on the campaign branch, and (b) the Living Document's current state on the branch (see §Living documents on campaign branches). + +**When the rule doesn't apply:** trivial dispatches where the response itself is the entire output (one-line questions, yes/no checks, single-value lookups). If the response could fit in a tweet and losing it wouldn't be expensive to regenerate, no persistent file is needed. + +**Cross-cutting discovery hook:** `docs/pitfalls/implementation-pitfalls.md` §Orchestration carries a trigger-and-pointer back to this section for plan authors. Pitfalls is mandated reading during plan-writing (via `writing-plans-enhanced`), so plan authors hit the trigger via their normal workflow and land here for the full rule. + +## Campaign branches + +**When the pattern applies:** work that spans multiple sessions over days or weeks — audits, multi-phase refactors, security reviews with a Living Document plan, research deliverables with staged phases. Campaigns don't fit the day-sized assumption of Invariant 4's "promptly after merge" rule. + +**What's different from short-cycle work:** + +- **Branch lifetime is the campaign's lifetime.** The branch exists until the final PR merges. That may be days or weeks. The invariant — *no branches past PR merge* — still holds; the PR just takes longer to be ready. +- **Intermediate commits on the campaign branch are expected, not an anti-pattern.** A campaign accumulates load-bearing artifacts at phase boundaries (e.g. the bug-hunt findings committed in §Output persistence above). Commit each phase's deliverables as they land — a session crashing in phase 5 resumes from the phase-4-committed state, not from orchestrator memory. Intermediate-state commits are cheap; reconstructing from memory is expensive. +- **Rebase onto `origin/main` at phase boundaries, not ad hoc.** During a 2-week campaign, `origin/main` will gain many merges from other work. Rebase the campaign branch onto `origin/main` at each natural phase boundary to keep the campaign's conflict surface small at final-merge time and surface any incompatibility early while campaign context is still fresh. Mechanics: see §Handling merge conflicts. + + **Concrete triggers for a rebase** (any one is enough; whichever fires first): + 1. A numbered phase just completed and its artifacts are committed on the branch. + 2. `git log --oneline origin/main..main` on local `main` is empty and `git log --oneline <campaign-branch>..origin/main` shows 10+ commits of drift — `main` has moved far enough that waiting will hurt more than rebasing now. + 3. You're about to start a new plan section that touches files likely-modified by other in-flight work. + 4. A week has passed since the last rebase of this campaign branch. + + Don't rebase on *every* `origin/main` advance — that's the churn we're avoiding. Don't wait until final merge to discover conflicts either — that's what we're protecting against. +- **If main keeps advancing faster than the campaign progresses** such that you're rebasing every session, the PR's scope is likely too broad — surface to the user to decide whether to split the campaign into two narrower branches. + +**Single-writer assumption.** This policy assumes **one session at a time writes to the campaign branch**. Multiple sessions can dispatch analysis subagents against the campaign branch in parallel (see §Multi-agent coordination → Git isolation for the reads-vs-writes split), but only one session commits at a time. Git's default behavior enforces this — `git worktree add <path> <existing-branch>` fails with `fatal: '<branch>' is already checked out` when another worktree has it. (Technically `--force` overrides that check, which is why it's the default-behavior safety net, not an ironclad guarantee.) So the failure mode in practice isn't concurrent-writes-on-one-branch — git's default behavior blocks that — it's someone hitting the `already checked out` error, giving up, and committing to `main` or creating a parallel branch off `main`. If you hit that error, STOP and surface to the user; don't improvise around it with `--force` or a parallel branch. + +**Stacked PRs are the escape hatch and are out of scope for this version.** If a campaign genuinely requires parallel writers, the pattern is: each writer has a sub-branch off the campaign branch, sub-branches merge into the campaign branch via PR, campaign branch merges into main via final PR. This works, but the mechanics (rebase ordering, in-flight sub-branches, final-merge bookkeeping) aren't documented here. If you hit this, surface to the user — don't retrofit stacked-PRs without a documented pattern. + +**Session-to-session hand-off:** when a campaign spans sessions, the outgoing session commits any in-progress work (even WIP commits, as long as CI would still be green or the commit is marked `wip:` and not the merge head) and updates any Living Document to reflect current state (see §Living documents on campaign branches). The incoming session reads the branch's latest state from committed artifacts — not from the outgoing session's chat history, which it doesn't have. + +## Living documents on campaign branches + +**What's a Living Document:** a plan file (or equivalent) that the executing session updates as work progresses — marking phases complete, recording discoveries, appending Deviations from the original plan, etc. Authoritative in-flight state lives in this file. + +**The producer side — where the authoritative state lives:** during a campaign, the authoritative version of the plan file is the one on the campaign branch. The campaign session reads from it and writes to it every session. Updates committed to the branch are the permanent record. + +**The consumer side — where downstream readers should look:** readers of `main` (other agents, other sessions, humans consulting the project's docs directory) see the version of the plan file as of the last merge, which may be days or weeks behind the branch's current state. This is a feature, not a bug — `main` represents merged, reviewed state; in-flight campaigns are explicitly not merged yet. + +If a downstream reader needs the current state of a plan file that's under active campaign execution, they have three paths: + +```bash +# Option 1: check out the campaign branch in a short-lived read-only worktree +git worktree add -f .claude/worktrees/read-audit audit/security-review-2026-04-22 +cd .claude/worktrees/read-audit +cat docs/plans/audit-plan.md +# ...read, then clean up: +cd <repo-root> +git worktree remove .claude/worktrees/read-audit + +# Option 2: read the file directly from the branch without a worktree +git show audit/security-review-2026-04-22:docs/plans/audit-plan.md + +# Option 3: if there's an open PR, read the PR's version via gh. Two paths: +# a) The diff of the file as the PR changes it (good for seeing what's changed): +gh pr diff <pr-number> -- docs/plans/audit-plan.md +# b) The full file content at the PR's head ref (good for reading the whole thing): +gh api "repos/{owner}/{repo}/contents/docs/plans/audit-plan.md?ref=<pr-head-branch>" \ + --jq '.content' | base64 -d +# Note: `gh pr view --json files` returns metadata (paths + diff stats), NOT file +# content — don't use it for reading. Option (a) or (b) is what you want. +``` + +**Check for in-flight campaigns before relying on main's copy:** `gh pr list --state open --search 'plan <name>'` or similar. If an open PR touches the plan file, consult the branch version; if not, main's copy is the authoritative state. + +## Merge authority + +Default mode is **auto-merge by the agent**. The user ordered the work, the agent executed it, CI validated it — if none of the Review triggers below apply, the agent merges on green CI. The core goal of this doc is velocity: stop agents from tripping over each other in git, and have them automatically handle anything that doesn't genuinely require the user's judgment. Click-to-approve with no actual review is theatrical trust, not real trust; this policy aims for genuine trust. + +### Review triggers — user merges + +A PR is `Review` if ANY of these apply: + +**Domain triggers** (the code itself is in a sensitive area): + +- Authentication / authorization, secrets handling, session management, cryptography, SSRF / injection guards, or other security-sensitive code. +- Data-integrity paths — anything that could corrupt persisted state if wrong. +- Architecture changes — project structure, public interfaces, serialization / wire contracts, database schema, external API contracts that callers depend on. + +**Discovery triggers** (the agent's work surfaced something needing judgment): + +- `Escalate` classification — the agent hit something requiring the user's judgment. Concrete cases: + - CI investigation revealed a bigger design issue (see §Handling CI failures). + - A merge conflict is substantive — not mechanical — and requires deciding which behavior is correct (see §Handling merge conflicts). + - Scope drift — what was built deviates materially from what was ordered. + - Any other surprise, ambiguity, or design-level concern encountered during implementation. + +If none of the above apply → `Routine`, auto-merge on green CI. When genuinely unsure whether a trigger applies, classify up. But don't reflexively choose Review as hedging — the policy assumes routine merges are routine. + +### Auto-merge (the default) + +Requirements for a Routine PR to auto-merge: + +- Green CI. Skipped checks must be verifiably not-applicable to the changed files (e.g. a frontend check skipped because only backend files changed). Unexplained skips count as failures — investigate per §Handling CI failures; don't classify up as an escape hatch. +- PR title + body accurately describe what was done; scope matches the original ask. +- No dependency on a still-open `Review`-class PR. "Dependency" means: the PR imports, calls, or otherwise depends on code or types introduced by the open PR; the PRs modify overlapping files in ways that would conflict; or the PRs were authored to ship together as one logical change. + +Common Routine cases (informational — the Review triggers above are the real definition, not this list): + +- Docs updates, test additions, mechanical refactors (renames, formatter output, import reorg). +- Bug fixes in non-sensitive code with green CI. TDD discipline per project conventions (regression test for every fix) is separate from merge authority — follow it because it's good practice, not because it gates the merge. +- Feature implementations from a plan that was adversarially reviewed upstream. +- Dependency version bumps. + +### Opening-agent classification + +Every PR body must include a `## Merge classification` heading with ONE of: + +- `Routine — auto-merge on green CI` +- `Review — <specific trigger>` — e.g. `Review — auth code`, `Review — public API contract change`, `Review — schema migration`. The trigger should reference a Domain trigger from above. +- `Escalate — <specific concern>` — the agent encountered a Discovery trigger. State the concern concretely: what's ambiguous, what surfaced, what judgment is needed. + +Missing classification defaults to `Review`. + +**Classification pitfalls worth noting:** + +- **Hedging to `Review` when `Routine` applies.** The rule says "when genuinely unsure, classify up — but don't reflexively choose Review as hedging." The failure mode: classifying Review because the topic *feels* important, rather than because a specific Domain or Discovery trigger applies. Observed in practice: a docs-only PR editing this very policy doc got opened as Review with justification "policy is important, design-level change." Neither clause matched a Domain trigger (not security-sensitive, not data-integrity, not architecture-as-code-structure) nor a Discovery trigger (no CI investigation, no conflict, no scope drift, no surprise). The correct classification was Routine. The test to apply before invoking Review: *which specific trigger from the Domain or Discovery lists above applies to this PR?* If you can't name one, it's Routine — ship it. + +### Self-merge for Routine, user-merge for Review + +The opening agent merges their own Routine PR once conditions are satisfied. The agent who did the work has the most context to verify their own PR description, confirm CI went green, and check there's no open Review-class dependency. A separate session would need to rebuild that context from scratch without adding meaningful independence. + +For Review-class PRs, the opening agent MUST NOT merge — that's the user's role. Review happens because the user's judgment adds value, not as a rubber stamp. + +### Mechanics for auto-merge + +**Wait for CI with a dedicated monitoring primitive — not a bash sleep-and-poll loop.** Use your agent framework's event-stream / Monitor tool, `gh pr checks --watch`, or your CI system's webhook / push notification. Event-based waits are cheaper on context tokens and more reliable than polling — a tight `sleep N; check` loop burns context every iteration and still misses fast transitions. + +```bash +# ALWAYS --merge. NEVER --squash. NEVER --rebase. +# Full history preserved on main; squash destroys the per-commit trail +# agents and users both rely on for bisecting. +gh pr merge <number> --merge --delete-branch + +# Then in the root checkout: +cd <repo-root> +git fetch origin main +git reset --hard origin/main # realign local main + +# And clean the worktree: +git worktree remove .claude/worktrees/<name> +git branch -D <branch-name> # if --delete-branch didn't reach local +``` + +If your project has a program-status / project-tracking doc, update it when the merge materially changes a track's state (new phase completed, experiment dispatched, etc.). Don't bother for docs-polish merges that don't move the program needle. Skip this step if no such doc exists in your project. + +### Handling CI failures + +When CI fails on a `Routine` PR, the opening agent investigates and fixes — do NOT surface to the user as a classification escalation unless the investigation genuinely surfaces something needing user judgment. "There's a CI error, please investigate" is exactly what the user would tell the agent anyway; skip the ping and just do it. Fixing CI errors is part of finishing the work, not a separate approval gate. + +**Investigation procedure:** + +1. **Identify the failure type** from the CI log: + - Lint / format error → mechanical; fix and push. + - Build error (type error, missing import, compile error) → usually mechanical; fix and push. + - Test failure where your change should have kept the test passing → investigate root cause per the systematic-debugging discipline. Did your change break it, or was the test wrong to begin with? + - Test failure in an unrelated / flaky area → retry once. If it fails again, it's not a flake — investigate. + - Infrastructure failure (runner down, timeout, network) → retry once; if persistent, surface. +2. **Fix to root cause, not symptom.** If the obvious fix is a workaround that masks a deeper problem (per the standing "never fix symptoms" rule), don't land it — surface instead. +3. **Push the fix as a new commit on the branch.** Do not force-push over history unless the fix is a rebase onto updated `main` (see §Handling merge conflicts). +4. **Wait for CI again** using the monitoring primitive from §Mechanics. +5. **Iterate — up to 3 attempts on the SAME failure.** Fixing one error can legitimately surface another (lint → build error → test failure is a normal sequence when a change ripples); each sequential distinct error is fair game and doesn't count against the limit. But if the SAME failure recurs after 3 fix attempts, escalate — your diagnosis is wrong and looping wastes context. + +**When to escalate** (classify `Escalate`, not `Review`): + +- The investigation reveals an architectural or design-level issue that needs user judgment (e.g. "the test asserts behavior our new design invalidates — need to decide which is correct"). +- You can't find the root cause after 3 attempts at the same failure. +- The "fix" would be a workaround masking a deeper issue. +- CI continues failing in ways your fixes don't address — your mental model of the failure is wrong. + +**Do NOT escalate for:** + +- Routine lint / format / build fixes — fix them. +- Flaky tests that recover after retry — note in the PR body, move on. +- Infrastructure blips — retry, then move on if stable. +- A sequence of distinct errors where each fix surfaces a new one — that's normal; work through them in order. + +The escalation bar is: "does this CI failure surface something the user genuinely needs to know about, OR am I pinging because pinging is easier than investigating?" If the latter, investigate. + +### Handling merge conflicts + +When the PR develops conflicts with `main` (another PR landed first, touched overlapping files): + +**Resolve in the worktree, not the GitHub UI.** The UI resolver is fine for trivial single-line conflicts but produces a merge commit rather than a clean rebase, can't run tests or verify build, and loses the agent's context about what each change was trying to accomplish. + +**Mechanical resolution:** + +```bash +cd .claude/worktrees/<name> +git fetch origin main +git rebase origin/main + +# For each conflicting file: +# 1. Read both sides carefully. +# 2. Understand what each side was trying to accomplish. +# 3. Produce the correct combined result (usually not just pick-one-side). +# 4. git add <path> +# Then: +git rebase --continue # repeat until the rebase completes + +# Once rebase is clean and tests pass locally: +git push --force-with-lease # NEVER plain --force. See note below. +``` + +**Why rebase, not merge-main-into-branch:** Rebasing keeps the PR's commits linear on top of `main`. Merging `main` into the branch produces tangled history that's harder to bisect and makes it unclear which commits are "yours" vs. "upstream." + +**Why `--force-with-lease`, not `--force`:** `--force-with-lease` refuses to overwrite remote changes you didn't see locally. If another agent pushed to the same branch between your fetch and push (rare but possible in multi-agent setups), `--force` silently clobbers their commit; `--force-with-lease` rejects the push and forces you to reconcile. The rule: never downgrade to `--force` just because `--force-with-lease` rejected something — the rejection is the point. + +**If the rebase goes wrong:** + +```bash +git rebase --abort # Back to pre-rebase state. +# Or, if --abort doesn't recover cleanly: +git reset --hard <pre-rebase-sha> # Find <pre-rebase-sha> in git reflog. +``` + +**When to escalate** (classify `Escalate`): + +- The conflict is **substantive** — the two changes represent incompatible design decisions, and resolving requires a judgment about which behavior is correct. Don't silently pick one; surface the tradeoff. +- The rebase produces a state you can't cleanly recover from (repeatedly gets tangled, can't abort, reflog doesn't save you). +- You find yourself rebasing repeatedly because `main` keeps advancing — possible sign the PR's scope is too broad; surface for the user to decide whether to split it. (For campaign branches, rebase cadence is scheduled at phase boundaries — see §Campaign branches.) +- The conflict involves code that falls under a Domain review trigger (auth, data-integrity, architecture) — reclassify the whole PR as `Review` regardless of whether the mechanical resolution is easy. + +**Multi-agent race — only one wins at merge time:** + +Two PRs can't both cleanly merge if they touched overlapping files — the second PR through merge hits conflicts. To reduce wasted cycles: + +- Before starting work that might conflict with an in-flight PR, check: `gh pr list --state open`. +- If you're about to touch the same files as an open Review PR, consider waiting for it to merge first (then rebase your branch) rather than racing. +- When the race is unavoidable (parallel work on related files), the losing PR's agent handles the rebase — that's their cost for being second, not the leading PR's problem. + +## Abandoning a branch + +When a PR is closed WITHOUT merging — scope was rejected, approach abandoned, duplicate of another PR that landed first — clean up the same way you would after a merge, minus the reset (local `main` hasn't moved). + +**Default path: stash first, then remove.** This is cheap insurance against two failure modes: (1) tracked-but-uncommitted work in the worktree, and (2) "I thought I committed this but didn't" — the one that eats real work. Stashing makes the uncommitted state recoverable from `git stash list` for weeks; `--force` makes it gone forever. Err on the side of stashing. + +```bash +cd <repo-root> + +# 1. Inspect uncommitted state before touching anything: +git -C .claude/worktrees/<name> status --short + +# 2. Stash whatever is there (no-op if clean — safe to run unconditionally): +git -C .claude/worktrees/<name> stash push -u \ + -m "rescue-from-<branch-name>-$(date +%Y%m%d)" +# -u includes untracked files. The rescue label makes it findable later. +# If the stash fails because the tree is truly clean, that's fine. + +# 3. Now remove the worktree, delete the branch: +git worktree remove .claude/worktrees/<name> +git branch -D <branch-name> # -D since unmerged +git push origin --delete <branch-name> # optional: remove remote ref +``` + +**If `git worktree remove` still refuses** (e.g. filesystem lock, untracked-file mode issues), *do not* reflexively escalate to `--force`. Re-check with `git -C .claude/worktrees/<name> status --short` and investigate the specific blocker. `--force` is a last resort after confirming the stash captured what you care about — by that point the stash is your safety net, not the status check. + +**Recovering stashed work later:** + +```bash +git stash list | grep rescue-from-<branch-name> +git stash apply <stash-ref> # applies without removing from stash +git stash pop <stash-ref> # applies and removes +``` + +**Stashes survive worktree removal and branch deletion.** Stashes live in the main repo's `refs/stash` ref, not in the worktree's directory or on the deleted branch. `git worktree remove` and `git branch -D` have no effect on `refs/stash`. You can stash from inside the worktree, remove the worktree, delete the branch, and the stash is still listed in `git stash list` in the main checkout — from any branch. No need to worry about losing the stash by running the cleanup steps above. + +## Red flags (stop and diagnose) + +- `git status` at session start shows unexpected untracked files → another agent left in-flight work here. Investigate before touching. +- `git branch --show-current` returns anything other than `main` in the root checkout → checkout roulette occurred. Figure out who did it before switching back. +- `git log --oneline origin/main..main` non-empty → local main is ahead and unpushed. Push it, or figure out why. +- `git log --oneline main..origin/main` non-empty → local main is behind. `git fetch && git reset --hard origin/main`. +- Local branch count materially higher than your in-flight-work count (e.g. 5+ branches but only 1-2 active worktrees) → zoo is regrowing; run the Recovery steps. +- Your worktree directory (by default `.claude/worktrees/`) contains more subdirectories than `git worktree list` shows → abandoned worktree state; `git worktree prune`. +- An analysis dispatch returned a large report ONLY in its response, with no persistent file written → violation of §Multi-agent coordination output-persistence rule. Re-dispatch with an explicit persistence requirement in the prompt, or recover the report from the response and write it yourself before proceeding to consolidation. +- An analysis dispatch's persistence artifact landed in the root checkout instead of the worktree (or any other wrong location) → the dispatch received a relative `<PERSISTENCE_PATH>` and the subagent's CWD didn't match the orchestrator's. Move the file to the correct worktree location, commit there, and re-craft future dispatch prompts with absolute paths derived from `git rev-parse --show-toplevel` in the orchestrator's context. + +## Rationale (failure-mode table) + +Each rule addresses a specific observed failure: + +| Rule | Failure prevented | +|---|---| +| Root checkout stays on `main` | Checkout roulette: two agents in same checkout, one switches branches, the other commits to the wrong branch | +| Work in isolated worktrees | Concurrent edits to shared checkout producing interleaved commit histories | +| Branches ephemeral | Branch zoo — dozens of branches, agents confused about which is current, fresh agents burn turns orienting | +| Push after every merge | Local `main` diverging from `origin/main` during wave-boundary merges; three-way divergence requiring manual reconciliation | +| One writer to local main at a time | Concurrent merges by different sessions into local main produce unreconciled state at wave boundaries | +| No `git checkout` in root checkout | Handoff commits left dangling-unreachable after resets, nearly lost to gc | +| No `git add -A` / `.` / `-a` | Secrets, unrelated fixtures, and cross-agent residue accidentally committed | +| Analysis dispatches persist findings before returning | Orchestrator context compacts mid-consolidation, lossily reconstructs reports from memory, findings silently dropped | +| Persistence paths are absolute, not relative | Subagent CWD may not match orchestrator's (e.g. root checkout vs worktree); relative paths produce artifacts in the wrong location, often undetected until consolidation realizes files are missing from expected path | +| Campaign branches rebase at phase boundaries | Conflict-surface at final-merge time too large; incompatibility surfaces late when original context is gone | +| Abandon-branch cleanup stashes uncommitted state | Work silently lost via `--force` on worktrees with uncommitted or forgotten changes — especially "I thought I committed this" cases | + +### Observed incidents + +Concrete examples that motivated the rules above. Included as social proof so future agents considering a shortcut can see the specific failure mode the rule prevents. + +**Reset on main wipes uncommitted edits (worktree discipline).** An agent edited files (docs updates plus new skill authoring) directly on the root checkout's primary branch instead of creating a worktree. Mid-session, a separate agent's PR merged upstream and something ran `git fetch origin <primary-branch> && git reset --hard origin/<primary-branch>` against local `<primary-branch>` to realign. `git reset --hard` wiped the working tree of tracked-file modifications — the first agent's edits disappeared. Untracked files (newly-created files not yet `git add`-ed) survived. + +- **Recovery.** The agent replayed the edits from conversation context into a freshly-created worktree branched off current `origin/<primary-branch>`. Cost: roughly 15–20 minutes of replay plus one close call — had conversation context compacted before replay, the edits would have been unrecoverable. +- **Root cause.** Writing tracked changes on the root checkout's primary branch violated Invariant 1 ("Root checkout stays on the primary branch, but write work does NOT happen there"). The reset that caused the loss was itself correct behavior — local `<primary-branch>` had legitimately drifted behind `origin/<primary-branch>`; realigning it via `reset --hard` is exactly the sanctioned recovery path. The problem was having uncommitted tracked changes present at that moment, not the reset. +- **Prevention.** Start every write session with `git worktree add .claude/worktrees/<slug> -b <branch-name>`. The worktree's working tree is insulated from resets or pulls that target the root checkout. The "untracked files survive" quirk of `git reset --hard` is an accident of its scope, not a design to rely on — worktrees provide actual isolation. + +## Exceptions + +- **Emergency realignment** of local `main` via `git reset --hard origin/main`. Two modes (see §What NOT to do for full detail): *safe sync* when local `main` has no divergent commits (no approval needed); *destructive realignment* when it does and you're dropping them (requires explicit user approval). Either way, a reset is not a commit and does not violate "no commits to local main." +- **Rescue branches** created via `git branch rescue/<name> <sha>` before destructive operations. These are safety pointers, not work branches. Clean up when the rescue is no longer needed. +- **User-directed overrides.** Any rule can be waived for a specific operation if the user says so explicitly. The invariants resume as soon as the override is complete. diff --git a/.claude/skills/handoff/SKILL.md b/.claude/skills/handoff/SKILL.md new file mode 100644 index 00000000..521c1e2b --- /dev/null +++ b/.claude/skills/handoff/SKILL.md @@ -0,0 +1,191 @@ +--- +name: handoff +description: Use when context is about to be lost — approaching auto-compaction, ending a long session, wrapping a multi-agent coordination cycle, before dispatching a follow-up agent who won't share hot context, or when the user asks for a "handoff" / "checkpoint" / "where are we" / "session summary" / "what's left". +--- + +# Handoff + +## Terminology + +The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC 2119] [RFC 8174] when, and only when, they appear in all capitals, as shown here. + +## Overview + +Context built during a substantial work session costs hours of agent time to reconstruct; writing it down costs minutes. A handoff is the act of capturing that context into durable artifacts BEFORE it evaporates — compaction, session end, fresh-agent dispatch, whatever triggers the loss. + +**Core principles (two asymmetries):** + +1. **Cheap to document, expensive-to-impossible to reconstruct.** Hot context is a non-renewable resource. Anything worth putting in a status report to the user is worth putting in a durable artifact first — a handoff doc, a living plan, a coordination log, a pitfalls entry, an outstanding-items doc. The status report to the user is ephemeral; the artifact is persistent. Write the artifact; let the status report reference it. + +2. **Review is cheap, mistakes in handoffs are expensive.** A review round that finds nothing costs ~10 minutes of agent time. A handoff that ships with an undocumented seam, a stale plan banner, or a missing follow-up can cost downstream readers 30+ minutes each to reconstruct, multiplied across every future dispatch that touches the gap. The asymmetry favors more review, not less. Err on the side of an extra round when any doubt exists. + +## When to use + +- Session is approaching auto-compaction (high context usage) +- Ending any session that produced non-trivial state (decisions, discoveries, in-flight work) +- Wrapping a multi-agent coordination cycle — plans shipped, PRs opened, follow-ups queued +- Before dispatching a follow-up agent whose context will not include yours +- Human partner asks for a "handoff", "checkpoint", "where are we", "session summary", "what's left" +- Noticing that state is split across status reports, PR notes, and the session transcript but not fully in any one durable place + +## Core discipline + +A handoff MUST do five things. Skipping any one degrades the handoff into a status report. + +1. **Mine hot context at lossless detail.** The handoff author MUST make multiple passes through the session's recent work, explicitly fighting recency bias. Mid-session decisions, seams in half-shipped work, and "little follow-up to-dos" are the items that get lost — the items a status report would skim but a future agent will need. + +2. **Update every living artifact that is now stale.** Plans, design docs, coord logs, outstanding-items, pitfalls, skill files — any file that described state accurately BEFORE the session and no longer does MUST be updated to match reality. State MUST NOT live only in PR notes or status reports. + +3. **Create artifacts that don't exist yet but should.** A new followups doc, a new pitfall entry, a new design-decision record, a new parked-ideas entry — if the session produced durable material that no existing artifact covers, the handoff author MUST create the artifact rather than leaving the material in the handoff doc alone. + +4. **Identify seams.** Anywhere two pieces of work meet — a PR that was merged while another was rebasing, a deferred task whose upstream just shipped, a merge race between concurrent branches — MUST be explicitly documented. Seams are where context is silently lost between agents. + +5. **Run a minimum of 6 rounds of adversarial review on the handoff itself.** Five canonical perspectives plus at least one session-specific perspective the agent chooses based on what actually happened this session. Additional rounds are welcome. See §Adversarial review below. One-pass handoffs miss seams; multi-pass review from multiple perspectives catches them. + +## Process + +### Phase 1: Mine hot context + +Multiple explicit passes. Do not rely on a single scan. + +**Pass 1 — Recent decisions.** What decisions were made in the last hour of this session? Who made them, what was the rationale, what alternatives were considered? + +**Pass 2 — Mid-session (combat recency bias).** Scroll further back. What decisions were made 2-6 hours ago that haven't been referenced recently? These are the ones most likely to be lost. + +**Pass 3 — Little follow-up to-dos.** "Oh, and I should also..." items. "Worth capturing as a pitfall later." "Defer to a follow-up cycle." If you can remember saying it but don't see it in a committed artifact, it's a candidate. + +**Pass 4 — Seams between work units.** Where did one track hand off to another? Where did a merge race happen? Where did a gate open or close? Where did an agent's assumption turn out wrong? + +**Pass 5 — What a naive agent would need.** Read your own state from the perspective of a fresh agent who has none of your context. What glossary terms do they need? What file paths? What status at what commit? What's the next logical action and why? + +Each pass SHOULD produce items. If a pass produces zero, you aren't looking hard enough — scan again with a different lens. + +### Phase 2: Route to artifacts (not just the handoff doc) + +Everything mined in Phase 1 goes somewhere durable. The handoff doc is ONE destination, not the only one. Route each item: + +| Kind of content | Goes to | +|---|---| +| State that updates an existing plan (phase shipped, deferred, scope edited) | Plan's per-phase Execution Status banners + top-of-plan summary | +| Cross-agent coordination state (what shipped, merge SHAs, who owns what) | Project's coordination log (CHANGELOG, a dedicated coord-log doc, a section of a status doc — whatever the project uses) | +| Speculative thinking worth preserving but not committing to | Project's parked-ideas or backlog location | +| Newly-learned traps (implementation or testing pitfalls) | Project's known-issues / pitfalls / gotchas doc | +| Methodology insights worth codifying | Skill files (or a queue of skill-update candidates) | +| Everything else — session arc, priority queue, in-flight state, next actions | The handoff doc itself | + +Routing correctly keeps the handoff doc focused. A handoff doc that duplicates content living in the plan is noise; a handoff doc that POINTS at the plan and summarizes status is signal. + +### Phase 3: Write + +Write in this order: + +1. Update living artifacts first (plans, coord log, outstanding-items, pitfalls). +2. Create any new artifacts identified in Phase 2. +3. Write the handoff doc LAST, referencing the updated artifacts rather than duplicating their content. + +The handoff doc structure SHOULD include: + +- **Headline state** — branch, tip SHA, pushed?, worktrees live, PRs open +- **What shipped this session** — concrete artifact pointers, not narrative +- **In-flight work** — what's running, where, under whose ownership +- **Ready-to-dispatch** — queued work with prerequisites and where the prerequisites land +- **Not yet started** — items that have been scoped but not worked +- **Deferred items** — each with a semantic description of what needs to happen before the item is pickable + a link to the likely-unblocker artifact (its plan page, its task, its PR — whichever is authoritative per the project's Living Document Contract conventions). Prose condition + link is durable across paraphrases and scope edits; exact-string coordination across multiple agents is not. +- **Operational guardrails accumulated this session** — so a fresh agent doesn't re-discover them +- **Priority queue** — numbered, with dependencies +- **Continuation prompt** — paste-ready prompt for a fresh agent resuming the work + +### Phase 4: Adversarial review (minimum 6 rounds) + +A single-pass handoff author has blind spots the author cannot see. Five canonical perspectives plus one session-specific perspective find them. + +Run these rounds sequentially, documenting findings at each: + +**Round 1 — Naive fresh agent.** Would someone starting from zero context understand what to do? Where are the undefined jargon terms, assumed-context references, or missing glossary entries? Fix every instance. + +**Round 2 — Recency-bias audit.** Re-read with the assumption that recent items are over-represented. What mid-session items are under-documented? What hot-context decisions haven't made it into the handoff? Add them. + +**Round 3 — Seam auditor.** Where do two work units meet? Is the meeting point documented clearly enough that neither side's fresh-agent successor will be surprised? Look at: merge races, upstream-shipped-downstream-still-waiting transitions, cross-agent coord-log entries, rebases that absorbed changes from other branches, deferred-work references that depend on another agent's progress. + +**Round 4 — Operational guardrails auditor.** What operational rules did this session establish or reinforce? Commit discipline, branch rules, merge patterns, dispatch conventions. Are they in a durable place (CLAUDE.md, skill files, pitfalls) or did they only live in the session transcript? If the latter, persist them. + +**Round 5 — Loss-averse auditor.** What would a loss of hot context destroy that the handoff doesn't yet capture? What "oh by the way" items are still only in the transcript? Scan explicitly for the phrase "worth capturing later" or similar in-session markers. + +**Round 6 — Session-specific perspective (agent-chosen).** The canonical rounds 1-5 cover known-in-general failure modes. This session has its own character — security-heavy, perf-critical, cross-platform, methodology-novel, tooling-pioneering, something else — and that character has its own failure modes the canonical rounds won't catch. The agent MUST choose a perspective specifically relevant to what actually happened this session and review from it. + +Requirements for the Round 6 perspective choice: + +- MUST be a perspective not already covered by rounds 1-5. Don't repeat "seam auditor" with a different label. +- MUST be specifically relevant to THIS session — grounded in the session's content, not a generic auditor template. If the session shipped auth code, "security auditor" is legitimate; if the session was pure docs, it isn't. +- MUST be named and described explicitly in the handoff under a heading like `### Round 6 — [chosen perspective] — [N findings applied]` so future readers can see the reasoning. +- SHOULD be concrete enough to produce findings. "General quality pass" is too vague; "cross-platform failure modes I haven't tested on Linux yet" is actionable. + +If the agent genuinely cannot identify a session-specific perspective after trying, that itself is a finding — document "Round 6: no session-specific perspective identified; session content matches canonical rounds 1-5 adequately" with a one-sentence justification. Rare; default to finding one. + +**Additional rounds (7+) — encouraged by default.** 6 is the floor, not a ceiling. If the agent identifies any additional perspective that might catch issues rounds 1-6 didn't, the agent MAY (and often SHOULD) run further rounds. Review is cheap; a handoff mistake ships downstream reconstruction cost that compounds. Err toward an extra round. + +Rules for additional rounds: + +- Each additional round MUST be named + described explicitly like Round 6 — a stated lens that does work. The lens MAY be high-level (e.g., "read top-to-bottom with fresh eyes for overall coherence and framing") if the canonical rounds focused on specific angles and a holistic pass might catch structural issues. What makes a round legitimate is a stated lens, not a specific level of abstraction. +- Rounds MUST NOT be re-labeled duplicates of rounds already run. A Round 7 that's actually Round 3 with a different name doesn't count. Non-redundancy is the bar. + +Sessions that often reward extra rounds beyond the floor: multi-stream or multi-agent coordination cycles, security-sensitive work, technically complex work that crosses multiple layers or runtimes, handoffs into an agent that will operate with significantly reduced tooling or permissions than the current session, or any session where the agent has a nagging sense that something's still off. + +**Loop rule (applies to ALL rounds — canonical + additional).** If any round produces material findings, the agent MUST re-run every round in sequence after applying fixes. Fixes can surface issues that earlier rounds missed, or introduce new issues those rounds would have caught. Exit only when a full pass through every round (1-6 canonical + any additional ones the agent elected to run) produces zero material findings. The cost of an extra clean-pass sweep is cheap; the cost of a handoff shipped with a silently-broken invariant is expensive. + +## Red flags (STOP) + +These mean the handoff is not yet complete: + +- "The PR notes cover it" — PR notes disappear from context for anyone not looking at that specific PR. Move it to the handoff or plan. +- "I'll add it if someone asks" — They won't ask; they'll reconstruct wrong. +- "The commit messages have it" — Commit messages rot into archaeology. Not a substitute. +- "The user already saw this in chat" — User context is also ephemeral. Not a substitute. +- "The plan is accurate enough" — Run the per-phase banner check. If any phase shipped or deferred without its banner being updated, the plan is not accurate enough. +- "Only the headlines matter" — The "little follow-up to-dos" are precisely what gets lost. Headlines aren't enough. +- "One pass is fine" — Single-pass handoffs miss seams. Run 6 rounds including the session-specific one. +- "The canonical rounds covered everything" — They cover known-in-general failure modes, not this session's specific character. Round 6 exists because sessions differ. +- "I'll capture it at the end" — By the end you've forgotten the mid-session discoveries. Capture as you go or re-mine hot context in Phase 1. + +## Common rationalizations (rebuttals) + +| Rationalization | Reality | +|---|---| +| "The handoff is getting long" | Length is not the problem; missing content is. A handoff that captures everything beats one that loses a deferral condition or coordination seam, regardless of line count. Multi-hour sessions routinely produce handoffs well over 1,000 lines — that's fine when each line is earning its place. Trim only when content is redundant, never because the doc "feels big." | +| "This is my final session anyway" | Other agents read handoffs too. And future-you is a different agent. | +| "I'll just tell the next agent verbally" | You won't be there. The next agent will start cold. | +| "Review rounds slow me down" | They do. They also catch seams that cost hours to reconstruct later. ~10 min of review beats 30+ min of downstream archaeology — the asymmetry is ~3x and compounds. | +| "Status report to the user IS the handoff" | No. The user's chat context is ephemeral. Durable artifacts are the handoff. Status report references them. | +| "I already updated the plan" | Did you update ALL the plans that this session touched? Coord log? Outstanding-items? Pitfalls? Usually at least one is missed. | + +## Checklist + +Before declaring the handoff complete, verify: + +- [ ] Phase 1 mining pass produced items at each of the 5 lenses (recent, mid-session, little follow-ups, seams, naive-agent) +- [ ] Every living artifact this session touched has been updated to match current reality +- [ ] Any new durable artifact that should exist (but didn't) has been created +- [ ] Each deferred item has a prose description of its unblock condition + a link to the likely-unblocker artifact (plan, task, PR). No exact-string gate-key coordination — semantic description + live link is resilient to paraphrase and scope change; exact strings break on either. +- [ ] The handoff doc points at updated artifacts rather than duplicating their content +- [ ] The continuation prompt is paste-ready and self-contained +- [ ] At least 6 adversarial review rounds complete (5 canonical + at least 1 agent-chosen session-specific; additional session-specific rounds run as judgment suggested); the final full pass through every round run produced zero material findings +- [ ] Every session-specific round (Round 6 and any 7+ the agent elected to run) is documented by name in the handoff with its findings count; perspective choices are specific to this session's content, not generic templates or re-labels of canonical rounds +- [ ] The handoff is committed to a durable location (not just a chat message) + +## Social proof + +Observed across multi-session coordination cycles: handoffs written with per-phase plan banners + deferred-item prose conditions + route-to-the-right-artifact discipline reduce downstream dispatch prompts from lengthy "figure out what's done" archaeology sessions to short pointers ("see plan.md Phase N banner — upstream condition now holds — execute"). The cost asymmetry favors upstream documentation heavily and compounds across every subsequent dispatch that consumes the handoff. + +Handoffs written without that discipline create the opposite: state scattered across PR notes, commit messages, and session transcripts, with each downstream agent paying the reconstruction cost anew. The compounding works both directions. + +## Related conventions + +- **Plan banner format.** When Phase 2 routing updates a plan that follows a Living Document Contract (per-phase ✅/🚧/⏸/⬜ Execution Status banners plus a top-of-plan summary table), the handoff author MUST preserve that format when writing new banner content. If the project uses `/writing-plans-enhanced` or an equivalent convention for plan structure, that convention governs the shape of plan updates made during handoff; this skill does not redefine it. + +- **Canonical coordination log.** Each project SHOULD designate ONE location for cross-agent coordination state (CHANGELOG, a dedicated coord-log doc, a section of a status doc — whatever the project uses). Phase 2 routing sends cross-agent state there. Handoffs that route to whichever location is canonical for the project stay greppable; handoffs that invent new locations fragment the record. + +## The bottom line + +The handoff is the session's proof of work for the next agent. Hot context costs hours to build and minutes to preserve. Mine lossy, route everywhere it belongs, update what's stale, review adversarially, and commit. + +If a future agent reconstructs state you already knew, the handoff failed. If they resume in 2 minutes instead of 30, it succeeded. diff --git a/.claude/skills/performance-audit-cycle/SKILL.md b/.claude/skills/performance-audit-cycle/SKILL.md new file mode 100644 index 00000000..a40a8a4c --- /dev/null +++ b/.claude/skills/performance-audit-cycle/SKILL.md @@ -0,0 +1,153 @@ +--- +name: performance-audit-cycle +description: Full performance audit cycle — dispatch the sibling performance-audit skill (parallel perf lanes + execution-cost map), cross-validate findings against real code and hot-path reachability, present decisions, and write a fix plan via writing-plans-enhanced with a measurement/verification gate. Use before scaling work, when chasing latency/throughput/resource regressions, or for an audit-and-fix loop rather than just a snapshot. +argument-hint: "<scope, e.g. 'the request pipeline', 'PR 45', 'src/render/'>" +--- + +# Performance Audit Cycle + +## Terminology + +The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC 2119] [RFC 8174] when, and only when, they appear in all capitals, as shown here. + +## Overview + +Running a full performance audit cycle for: **$ARGUMENTS** + +This is a multi-phase workflow. The runner MUST follow each phase in order and MUST NOT skip phases. + +This skill orchestrates sibling workhorses in this plugin: [`performance-audit`](../performance-audit/SKILL.md) for the parallel lane dispatch + synthesis, [`writing-plans-enhanced`](../writing-plans-enhanced/SKILL.md) for the fix plan, and [`plan-review-cycle`](../plan-review-cycle/SKILL.md) for the adversarial plan review. The cycle owns scope research, cross-validation, optional dynamic confirmation, the user-decision loop, and performance-specific plan instructions. It MUST NOT duplicate the subagent-proofing, TDD, Living Document Contract, or plan-review discipline encoded in the delegated skills. + +### Scope validation + +If `$ARGUMENTS` is empty or unclear, the runner MUST ask the user for a scope before Phase 1. Useful shapes: a request/render path, a feature, a directory/package, a PR number, a commit range. The runner MUST NOT guess a scope or default to "everything" — performance lanes perform best on a precise, bounded surface. + +**Whole-repo / oversized scope.** If the user DOES want "everything" — a whole repository, a top-level directory/package set, "all of `<X>`", or any surface materially larger than one run is optimized for (roughly >4k production LOC, or spanning more than one language/package) — do NOT cram it into one run and do NOT silently audit only part of it. Instead follow [`whole-repo-scoping.md`](whole-repo-scoping.md), **starting at its size-router (its first routing step)**, which sub-routes between a lightweight pass (small / ≤2 slices: survey, eyeball, run, 3-line ledger, no formal review round) and the full survey-through-execute + review-gate method (a repo / multi-package / larger surface / >2 languages). The method does the rest: survey + measure production LOC, build a cheap hot-path/reachability map, cut the code into bounded language-homogeneous slices by perf-relevance, calibrate cross-slice call frequency, assign depth tiers (full / reduced / cold sweep(s) / overlay), **adversarially review the partition before executing**, then run this cycle once per slice with a persistent progress ledger. That method turns "audit the whole repo" into a reviewed plan of bounded runs that collectively cover all the code. + +--- + +## Phase 1 — Research scope + +Determine what code falls within **$ARGUMENTS** and give the audit a precise, actionable scope. + +- **Phase/feature:** check `docs/plans/` for a matching plan; `git log --oneline` for the commits; `git diff --stat` for the file list. +- **PR:** the changed files + commit range. +- **Directory/package:** list the files directly. + +Produce a **scope summary** — files/packages, a one-paragraph description of what the code does, the realistic load it sees (request rates, data sizes, concurrency), and any known performance context. Identify **adjacent code** (shared utilities, hot callers) the lanes should be aware of. Realistic load matters: it's how the lanes calibrate Impact (reachability × frequency × per-occurrence cost). + +--- + +## Phase 2 — Dispatch performance-audit + +The runner MUST invoke the sibling [`performance-audit`](../performance-audit/SKILL.md) skill (or, if the framework cannot invoke skills by name, read its `SKILL.md` from the plugin install location), passing the scope summary + adjacent context. Follow it through Phase 0 (detection), Phase 1 (currency brief), Phase 2 (lane dispatch), and Phase 3 (synthesis). This produces raw per-lane reports + a consolidated report (and, if suspected bugs were found, a bug-hunt kickoff prompt) under `docs/perf-audits/`. + +**The runner MUST NOT proceed until all dispatched lanes complete and the consolidated report is written.** + +--- + +## Phase 3 — Cross-validate every finding + +Audit lanes are adversarial and produce false positives, mischaracterize impact, and sometimes flag intentional tradeoffs. Every finding needs verification. + +**COMPLETENESS REQUIREMENT:** The runner MUST account for every single finding from every lane report (not just the synthesis — lanes may carry findings the synthesis merged or missed). Before validating, enumerate all findings. Every finding MUST appear in the validated report as one of: confirmed / design decision / false positive / out-of-scope. **The runner MUST NOT decide what's "too minor" to include — that's the user's decision in Phase 5.** Silently dropping findings defeats the audit. + +For each finding: +1. **Read the actual code** at the cited location. Verify the evidence yourself. +2. **Confirm hot-path reachability** — is the code actually reached under the realistic load from Phase 1? An impressive-looking quadratic over input that's always tiny is not a real finding. Re-rank Impact if the lane over- or under-stated reachability. +3. **Check plan/design/pitfalls docs** — is this an intentional, documented tradeoff? +4. **Verify the impact claim** — is the cost real and on the aggregate-cost path? Cross-reference the Execution Cost Map. +5. **Cross-lane validation** — agreement across lanes strengthens; single-lane findings get extra scrutiny. + +Classify each: **Confirmed** · **Design decision needing user input** · **False positive** (explain why) · **Out of scope / pre-existing** (still document). + +**Blast-radius analysis** for confirmed findings: what else calls this code; would the fix change an API/signature affecting consumers; ordering dependencies; could the optimization alter observable behavior (a correctness risk)? + +Write `docs/perf-audits/<date>-<slug>-validated.md` (Confirmed / Design decisions / False positives / Out-of-scope sections, each finding carrying the finding-model fields + blast radius). **COMPLETENESS CHECK:** re-read every lane report; confirmed + design + false-positive + out-of-scope MUST be ≥ the total unique findings. Add any missing. + +--- + +## Phase 4 — Optional dynamic validation + +If the environment can build and run the project AND a real workload exists (or can be defensibly constructed — never invent load), the runner SHOULD measure the worst confirmed findings to confirm or refute them before presenting. Measurement upgrades a finding's Confidence to `Measured`. If the project isn't runnable or no honest workload exists, skip this phase and state why in the validated report. The runner MUST NOT fabricate benchmark numbers. + +--- + +## Phase 5 — Present to user + +Present the validated findings. Structure: + +1. **Executive summary** — X confirmed (N critical / N major / N minor), Y design decisions, Z false positives, W out-of-scope. Include the **regression delta** from the run metadata (vs the prior same-scope run: N new / N persisting / N resolved) and name the new and resolved findings — that's the trend signal the user most wants. +2. **Confirmed findings** — table (title, impact rank, location, on-cost-map, effort as work magnitude). **The runner MUST NOT omit minors.** The user prioritizes, not the runner. +3. **Execution Cost Map highlights** — the likely time-concentration regions, for architectural awareness. +4. **Design decisions** — each with enough context for an informed call; recommend where you have a well-reasoned opinion. +5. **Out-of-scope findings with larger blast radius** — include in fix plan, or document for later? +6. **Suspected bugs** — note the appendix exists and that a `bug-hunt-cycle` kickoff prompt is ready (suggest running it; do not auto-invoke). +7. **Scope question for the fix plan** — the default is **ALL confirmed findings** (see Phase 6 disposition discipline). Ask the user only which, if any, they want to *opt out*, and surface any agent-recommended substantive deferrals for their decision. + +**The runner MUST wait for the user's input on design decisions and opt-outs before Phase 6.** + +--- + +## Phase 6 — Write fix plan + +After user input, the runner MUST invoke [`writing-plans-enhanced`](../writing-plans-enhanced/SKILL.md) to create the implementation plan. That skill owns subagent-proofing, TDD, pitfall review, cross-task conflict minimization, and the Living Document Contract — the cycle MUST NOT duplicate them. The runner MUST pass these **performance-specific instructions** to layer on top: + +- **Plan file path:** `docs/plans/<date>-<slug>-perf-audit-remediation-plan.md`. The `-perf-audit-remediation-plan.md` suffix distinguishes these from bug-hunt / health-review plans. +- **Source:** the validated findings report at `docs/perf-audits/<date>-<slug>-validated.md`. +- **Traceability + self-contained task titles:** each task MUST cite its originating finding ID (`P1`, `P2`, …) **as a suffix for traceability**, but the task title and description MUST stand on their own — describe what / where / why (e.g. "Batch line-item catalog lookups in `enrich_line_items` — one DB round-trip instead of one per item [perf finding P3]"), never just "Fix P3" or "address the `data-access` lane". This discipline carries into the resulting commit messages, PR text, and code comments. See `finding-model.md` "Referring to findings". +- **Verification gate — every task MUST include:** + - a **baseline** captured *before* the change — a measurement OR an explicit complexity/allocation argument; + - a **post-change demonstration** that it improved — a measurement OR argument; **if it does not improve, revert the change**; + - a **correctness guard** — existing tests pass + a test pinning the behavior the optimization must preserve (per TDD; consult `testing-anti-patterns` so the guard tests real behavior, not mocks). +- **No severity-based deferral (disposition discipline, per `finding-model.md`):** every finding's default disposition is **FIX**. The plan MUST schedule **all** findings by default. A finding may be dropped only when the **user explicitly opted it out** (Phase 5) or the agent gives a **substantive reason naming a specific concrete mechanism** (the exact refactor it collides with; the exact out-of-scope dependency bump; the specific correctness regression + why it outweighs the gain). "Minor / low-priority / might be risky / could be complex" is **forbidden** as a deferral rationale. Deferred items go in the Deferred appendix (below) with their named mechanism or the user's opt-out. +- **Counter over-optimization:** specify the minimum change per task; state what NOT to touch. Performance tasks tempt wholesale rewrites. +- **Advisory:** after remediation, run the auto-generated bug-hunt kickoff over the diff — performance changes are a classic bug source. + +When `writing-plans-enhanced` presents execution options, the runner MUST recommend one with reasoning (context consumed, self-containment, parallelizable vs sequential tasks, risk). + +### Deferred items appendix + +If any findings are deferred, the plan MUST include: + +```markdown +## Appendix: Findings Identified But Not Fixed in This Cycle +### <Title> (finding <Pn>) +**Impact:** <rank> **Location:** <file:line> +**Why deferred:** <user opt-out OR the specific named mechanism — refactor/dependency/regression> +**Recommended approach:** <brief fix for when this is addressed> +``` + +This appendix is the persistent record — written to the plan file, never left in conversation memory. + +--- + +## Phase 7 — Plan review cycle + +Before committing, the runner MUST review the fix plan for subagent-readiness by invoking [`plan-review-cycle`](../plan-review-cycle/SKILL.md). That skill owns the multi-round adversarial review; the cycle MUST NOT duplicate it. After it completes, the runner SHOULD log plan-quality observations to the project's pattern store (key `plan-review-<slug>`). + +--- + +## Phase 8 — Commit reports + +The runner MUST stage and commit all performance audit cycle artifacts: + +```bash +git add docs/perf-audits/<date>-<slug>-* +git add docs/perf-audits/runs.jsonl # the run ledger (historical/regression substrate) +git add docs/perf-audits/cache/ # if a currency brief was refreshed +git add docs/plans/<plan-file> # if the plan was written +git commit -m "docs(perf): <slug> — validated findings and fix plan" +``` + +--- + +## Phase 9 — Whole-repo / multi-slice roll-up (conditional) + +When this cycle was run **once per slice** as part of a whole-repo plan (per [`whole-repo-scoping.md`](whole-repo-scoping.md)), add **one cross-slice roll-up after the last slice**. It is the single highest-value artifact of a whole-repo run: it turns N per-slice reports into systemic themes a per-unit view can't see — e.g. a transport-write-buffering theme spread across four protocol slices is invisible in any one report but obvious across them. + +- **Inputs:** `docs/perf-audits/runs.jsonl` (the run ledger) + every slice's consolidated report. +- **Output:** `docs/perf-audits/<date>-WHOLE-REPO-ROLLUP.md` — shared root causes / repo-wide themes grouped *across* slices, a prioritized cross-slice fix list, and a heat map (slice × tier × severity). **Surface any `frequency-unresolved — assume-hot` findings** (from the method's cross-slice calibration) for the operator to confirm reachability, rather than letting an unverified assume-hot finding ship top-ranked. +- **When:** conditionally REQUIRED when the request was a *posture* question ("how's the repo's performance?"); optional when it was "find and queue fixes". For a **service monorepo** it is **two-level** — a per-service synthesis, then a cross-service meta-roll-up. + +This does not re-audit; it synthesizes already-committed slice reports. Commit it like the slice reports. diff --git a/.claude/skills/performance-audit-cycle/whole-repo-scoping.md b/.claude/skills/performance-audit-cycle/whole-repo-scoping.md new file mode 100644 index 00000000..7404ba76 --- /dev/null +++ b/.claude/skills/performance-audit-cycle/whole-repo-scoping.md @@ -0,0 +1,384 @@ +# Whole-Repo / Oversized-Scope Slicing (method) + +**Load this when:** the requested audit scope is a whole repository, a top-level +directory/package set, "everything", "all of `<X>`", or any surface materially +larger than one `performance-audit-cycle` run is optimized for. A single cycle's +lanes perform best on a **precise, bounded, perf-relevant** surface (one coherent +subsystem — see [Sizing](#sizing-how-big-is-one-slice)). This method turns "audit the whole thing" into a +**reviewed partition** of bounded slices, each fed to its own cycle run, that +collectively cover all the code — avoiding both naïve failure modes: mega-runs +(lane precision collapses on huge scope) and over-fragmentation (Impact +mis-calibrates when a finding's caller lives in another slice). + +> **Provenance / how to read this.** Distilled from a real whole-repo application +> (a ~96k-LOC Rust+TypeScript desktop app) hardened over **five adversarial +> rounds**, then generalized across ecosystems via three further reviews +> (generalizability, robustness, followability). The skeleton — survey → cheap +> hot-path/reachability map → slice into coherent bounded units → cross-slice +> frequency calibration → depth tiers → **review-gate-before-spend** — is the +> durable, ecosystem-agnostic contribution. Numbers and examples are calibrated +> to that case and carry explicit ecosystem-scaling rules; tune them, don't copy +> them blindly. *[case]* marks lessons from the worked example. + +--- + +## TL;DR — minimal ordered checklist + +1. **Route by size.** Named bounded scope → run the cycle directly. ≤2 slices / small + → **lightweight path** (survey + eyeball + run + 3-line ledger, no gate). Else → + full method. +2. **One program, or many?** Monorepo of deployables → one plan+ledger *per + deployable*; shared libs audited once. +3. **Survey & measure production LOC.** Production LOC per unit (`tokei` minus the exclude table); + record raw→prod delta. +4. **Map the hot paths & reachability.** Classify workload (CPU / IO / event-driven); run the HOT/WARM/COLD + checklist; "no hot path" is valid. +5. **Cut the slices.** Coherent subsystems; the split/keep rule; prefer fewer/larger; + complete the disjoint-coverage ledger. +6. **Calibrate cross-slice frequency.** Only for a hot symbol whose caller is in another slice — + ≤1-page frequency map; unknown caller → assume-hot. +7. **Assign depth tiers & verification modes.** HOT→FULL, WARM→REDUCED, COLD→sweep(s), cross-slice→OVERLAY. +8. **Review the partition before executing.** Review depth scales with slice count; skip for 1–2 slices. +9. **Order, persist, execute.** Hottest first; commit per slice; ledger = + resumable. + +--- + +## Route by size (three-way, don't over-ceremony the small case) + +- **Precise bounded scope** (user named a request path / module / package that + fits one run): skip this method — run the cycle directly. +- **Lightweight path** — PRIMARY gate **≤2 natural slices**; LOC is a secondary, + language-scaled check (~<8k for verbose ecosystems, ~<4k for dense ones — see + [Sizing](#sizing-how-big-is-one-slice)) across ≤2 languages: do the survey & measure step, eyeball the 1–2 slices, + run the cycle on each, a 3-line ledger, **no formal review round** (a 5-minute + self-check against the heuristics table is enough). Only build a frequency map + (the cross-slice frequency calibration) if a hot impl's caller sits in the other slice. **Self-check the cross-slice frequency blind + spot:** confirm no hot symbol's frequency is driven from the other slice; if it + is, build the ≤1-page frequency map even on the lightweight path. +- **Full method** — a repo / multi-package / >~8k LOC / >2 languages: the full survey-through-execute method + the + review gate, with review depth scaled by slice count (see the gate). + +## One program, or many? (do this before partitioning) +If the repo holds **multiple deployable units** (a Go monorepo of services, a +multi-module Gradle build, a .NET solution of many `.csproj`, an Nx/Turborepo of +apps), the audit unit is the **deployable service/app, not the repo**. Produce a +service inventory and run a **separate slice plan + coverage ledger + run +history per service**. **Shared libraries** consumed by several services are +audited **once** as their own slice and *referenced* by each service's ledger +(marked `shared` — neither re-sliced per consumer nor dropped). Cross-*service* +frequency is set over the network (see the cross-slice frequency calibration). Only after this do you partition +within a single program. + +- **.NET caveat:** a `.csproj` is usually a **library, not a deployable** — the + deployable is the **entry-point project** (Web/Worker/Api); the rest are shared + libs (audit once, reference). Do not produce one "service" partition per + `.csproj`. +- **Same axis at two scales:** the one-program-or-many step (the deployable split) and the + process-boundary split (the one-primary-ecosystem principle) are the **same axis at two scales** — a + backend+SPA in one repo is a *process boundary* handled by the one-primary-ecosystem principle, **not** a + per-service split. +- **Data/ML repos (notebooks, pipelines):** the audit unit is the **DAG stage / + pipeline step**, not a package or notebook; the hot path is a dataframe/Spark op + or a data-loader; size by **stage + data volume**, not LOC band; partition along + **DAG-stage seams** (cell/stage execution order replaces the call graph). + +--- + +## Survey & measure production LOC (measure the real surface) + +Enumerate before slicing: +- **Build units** (packages/crates/modules/services) from manifests; **languages/ + ecosystems** per area (decides profile packs + lanes). +- **Size on PRODUCTION LOC, not raw LOC** — the #1 trap *[case: a 9.1k-LOC + "module" was 4.5k production; raw-LOC sizing produced a 2× too-granular + partition]*. **How to measure concretely:** + - Baseline with a tool: `tokei --output json` / `scc` per directory. + - **Exclude** tests, generated, vendored, fixtures, non-code. Tells by ecosystem: + + | Ecosystem | Exclude (tests / generated / vendored) | + |---|---| + | Rust | inline `#[cfg(test)]` spans, `tests/`, `benches/`, `target/` | + | Go | `*_test.go`, `*.pb.go`/`*_gen.go` + `// Code generated` banner, `vendor/` | + | Python | `tests/`, `test_*.py`/`*_test.py`, `conftest.py`, `__pycache__/`, `migrations/`, `*_pb2.py`, `.venv/` | + | JS/TS | `*.test.*`/`*.spec.*`, `__tests__`, `*.stories.*`, snapshot dirs, `*.d.ts` gen, `dist/`/`build/`, `node_modules/` | + | Java/Kotlin | `src/test/`, generated sources dir (incl. generated gRPC/proto stubs), `build/` | + | C#/.NET | `*.Tests`, `obj/`/`bin/`, `*.Designer.cs`, `*.g.cs`, `*.AssemblyInfo.cs`, EF Core `Migrations/`, generated gRPC/proto stubs | + - Subtract inline-test line spans (they inflate same-file counts); detect + generated code by header banner (`@generated`, `Code generated … DO NOT EDIT`). + - **Record raw→production delta per unit** (the ratio is non-uniform — 0.1×–6.9× + observed *[case]*; Python/Ruby skew low, Go/Java skew high with test+gen). +- Output a **survey table**: unit → language → production LOC → one-line purpose. + +## Map the hot paths & reachability (cheap, structural) + +**First classify the workload shape — it changes what "hot" means:** +- **CPU-bound / real-time** (desktop, games, codecs, data kernels, DSP): hot path + = inner loops, allocation, per-frame/per-message/callback handlers. Grep for the + loop / hot kernel / real-time callback. +- **IO-bound services** (web, RPC, most microservices): hot path = **DB + round-trips, N+1 / unbatched queries, cache misses, external-call fan-out, + serialization** — sized by request/throughput rate, **NOT** inner loops. Grep + for ORM access in handlers, query-in-loop, `await` fan-out, missing batching. +- **Event-driven / serverless**: entry points live in **config/IaC** (queue/cron/ + HTTP bindings — `serverless.yml`, SAM, function manifests), not the call graph. + Read the manifest to find entry points + their frequency (queue rate, cron). + +Then map where work concentrates **and** classify the calibration hazards: +- **Cold glue** — CRUD, IPC/DTO marshalling, config, string assembly, form + rendering; **JVM/.NET add** DI wiring, annotation glue, getters/mappers (a LOT + of it → the COLD SWEEP is *more* valuable there). Batch it. +- **Latent / dead code** — no live callers; findings are *reachability ≈ 0 today*, + flag "fires once wired in" *[case: a codec crate had zero callers; the live path + bypassed it]*. **Detection (cheap):** grep call sites + imports + manifest + wiring. **CONFIRM before flagging dead** — dynamic dispatch, trait objects, FFI, + plugins, **and especially framework wiring (routers, DI containers, + `@Scheduled`/`@EventListener`/Celery/Sidekiq task names, signals, webhooks, + reflection, cloud event bindings (queue/cron/HTTP triggers declared in IaC), + Python import-time registries (decorators only wire if the module is imported — + import graph ≠ call graph))** defeat grep. "No in-tree caller" is NOT dead in + dynamic/DI/serverless code → treat as **LIVE-uncertain** until you've checked the + framework's wiring. +- **External-process boundaries** — work done in a child process / DB / cache / + queue / GPU / remote service. The audited code there is **I/O + orchestration**, + not the compute → reduced tier *[case: a "DSP" module was TCP plumbing to an + external TNC; the web analogue: an ORM call is orchestration — the query plan + runs in Postgres, so read the query, not just the Python]*. +- **VERIFY hot-path hypotheses against code; never infer from names** *[case: a + "waterfall UI" had no canvas/`requestAnimationFrame` — it was ordinary React]*. +- **"No hot path" is a valid outcome.** A uniformly-flat CRUD app legitimately + partitions into mostly COLD SWEEPS — state that and move on; don't manufacture + an imaginary hot path. + +**Hot / warm / cold checklist (apply per candidate slice):** +A slice is **HOT** if ANY: it sits on the request/render/frame/message path AND +contains a loop/allocation/query that scales with load; it's a real-time/ +deadline path; it's IO-bound with N+1/fan-out under load. **WARM** if it's on a +live path but with bounded/low-frequency work, or a secondary/occasional path. +**COLD** if it's setup/config/glue/CRUD with no load-scaling work. +**Tie-breaker:** if you can't find the loop/handler/query that makes it hot, it is +**not** hot → default **WARM** (never silently assume hot or cold). + +**Slice-tier vs finding-frequency axes (don't conflate them):** these are different axes — an +unverified-hot **slice** is tiered WARM by the hot/warm/cold tie-breaker; a confirmed-hot +**finding** whose cross-slice **frequency** is unresolved is ranked assume-hot by the +frequency fail-safe. Don't apply the frequency fail-safe's optimistic-Impact rule to a whole slice's tier. + +## Cut the slices (principles + crisp rules) + +1. **One primary ecosystem per slice — keep embedded languages with their + driver.** A slice has ONE primary pack (its lanes / idiom index). Embedded + second languages *driven by* the primary code — SQL in an ORM/query layer, a + shader, an inline regex/template — stay **in the slice as adjacent context** + (run the SQL/HTML sub-pack as a sub-lane); do **not** carve them into a separate + slice that would be split from their caller (that is a cross-slice impl/caller split you + *induced*). Carve a separate-language slice only at a **real process/deploy + boundary** (UI↔backend IPC, service↔service, app↔external engine). For a polyglot + *feature* spanning a process boundary, prefer an **OVERLAY** to recover the + end-to-end cost rather than pretending per-language slices capture it. *[case: + Rust↔TS there was a real IPC boundary, so "never mix" happened to coincide with + a seam; in a Django/Spring service the languages interleave in one call stack — + splitting SQL from its Python/Java driver would fracture one perf story.]* +2. **Coherent subsystem + shared data flow** per slice (one pipeline stage / one + feature / one service-triplet), not an arbitrary chunk. +3. **Size to the sweet-spot, by build-unit first, LOC as a sanity check** (see + [Sizing](#sizing-how-big-is-one-slice)). Split larger along **real module/file seams** (name the files per + sub-slice). **God-file with no seams:** synthesize seams by symbol cluster / + call-graph community, run the pieces as an OVERLAY family, and flag the file + itself as a maintainability finding. +4. **Slice by perf-relevance, not raw size** — carve genuine hot paths out; pull a + warm exception out of a cold bucket; batch the rest. **Perf-relevance overrides + LOC for merge/split:** never merge away a hot slice because it's small — an + IO-orchestration layer is small by LOC but large by Impact. +5. **Complete, disjoint coverage** — every code unit in **exactly one** slice; + maintain a coverage ledger reconciled against an actual file listing; list + **out-of-scope** explicitly (tests, `bin/` probes, generated). **OVERLAYs are + analysis-only passes, NOT coverage units** — they do NOT appear in the + disjoint-coverage ledger (their member slices already do) and do not emit a + `runs.jsonl` regression line the same way. **Generated-but- + hot exception:** if generated code is genuinely on a hot path (a generated + parser/codec/serializer), audit it FULL, tag `generated-source`, and target the + **generator/template**, not the emitted file. *[case: a Rust command module was + orphaned by a name collision with a same-named frontend dir; only the ledger + caught it.]* + +**The split/keep rule** (replaces prose judgment): **SPLIT** a +candidate iff its two halves have *different hot-path character* OR *different +primary ecosystems* OR it exceeds the sized band (see [Sizing](#sizing-how-big-is-one-slice)) with a real seam. **KEEP +together** iff they share a data flow AND a frequency driver AND fit the band. +**Tie-breaker: prefer fewer/larger** — over-fragmentation fails *silently* (cross-slice frequency +mis-rank), oversize fails *loudly* (the run tells you it's too big and you +re-slice once; see *Order, persist, execute*). + +## Sizing — how big is one slice? + +**The build-unit / coherent-subsystem is the PRIMARY sizer** — one package/crate/ +module/service-triplet/pipeline-stage, cut along real seams (per *Cut the slices*). Size by *what is +a coherent perf story*, not by hitting a LOC number. + +**Production-LOC band as a sanity check** (per-ecosystem, because verbosity +differs — a number that's "too big" in one ecosystem is mid-band in another): + +| Ecosystem | Per-slice production-LOC band (sanity check) | +|---|---| +| Python / Ruby | ~0.5–2k | +| Rust / TS | ~1–4k | +| Go / Java / Kotlin / C# | ~2–6k | +| C / C++ | by **translation unit**, not a flat band | + +If a build-unit lands outside its band, that's a prompt to look for a seam (split) +or a sibling to merge — not a hard rule. The band is the *check*; the build-unit +is the *sizer*. + +**Note:** "~100k LOC → ~10–20 units" is a **Rust/TS datapoint** *[case]* — count +**features/services, not lines**. Don't port that unit-count to a denser or more +verbose ecosystem without re-deriving it from the band above. + +## Calibrate cross-slice frequency (the subtle one; make it fail-safe) + +Impact = reachability × **frequency** × per-occurrence cost, and the frequency is +often set by a **caller in a different slice** (or **outside the codebase** — see +below). When impl and hot caller are split, the impl's slice can't see how often +it runs and **under-ranks** it. Make this **demand-driven, bounded, and +fail-safe** — never a global whole-program analysis: +- **Only trigger on a detected impl/caller split** (a slice's hot symbol whose + callers aren't in-slice). Do not build a global call map. +- **Bound the traversal:** stop at the first of {a nameable frequency *class* — + per-request / per-frame / per-message / loop-over-N / per-row}, an entry point, + or 3 caller frames. No infinite "what-calls-the-caller" regress. +- **Mitigate (cheapest first):** (a) a ≤1-page **frequency-map pre-artifact** + (`impl symbol → caller file:line → multiplier class → N`) handed to the affected + runs as adjacent context *[case: a compression routine's real driver was an + Outbox-loop in a cold-swept file; a one-page map fixed calibration without + re-tiering]*; (b) **order** runs so the frequency-establishing slice precedes the + impl slice; (c) **merge** the two if small and tightly coupled. +- **Out-of-tree frequency:** for services, frequency is set by request rate / queue + depth / cron cadence / fan-out / **inter-service network calls** (service A calls + B's endpoint N×/request — read API contracts/clients/tracing). Capture these in + the frequency map as first-class inputs (from load context / IaC), not just + in-tree counts. +- **Shared-substrate fan-in:** a shared lib called in-process by N units is a + many-to-one **fan-in** — calibrate its frequency by the **hottest caller** (the + union of caller frequency classes), and tier it by that. +- **Fail-safe:** if the caller is unknown or unaudited, tag the finding + `frequency-unresolved — assume hot` at **optimistic** Impact; the cycle's + Phase-3 cross-validation re-ranks it **if it can resolve the caller** — but if + the caller stays unresolved, **surface the finding in the roll-up for the + operator to confirm reachability**, rather than letting an unverified assume-hot + finding ship top-ranked. (Phase 3 re-reads cited code and re-ranks Impact by + reachability, so it demotes when it CAN reach the caller; the roll-up surface + covers the case where it can't.) Never silently under-rank a real one. + +## Assign depth tiers & verification modes + +- **FULL** (all phases, all core lanes) — HOT slices. +- **REDUCED** (algorithmic/memory/data-access/concurrency; skip idiom-currency/ + payload-startup unless flagged) — WARM slices. +- **COLD SWEEP** (one batched run, ~3 lanes: complexity + allocation + data-access) + over all COLD glue at once — coverage without waste. Batched **up to one run's + capacity (the [Sizing](#sizing-how-big-is-one-slice) band)** — cold glue exceeding + that gets **several** cold sweeps partitioned by build-unit/area, not one. The + economy is *fewer lanes per run*, not *unbounded LOC per run*. +- **OVERLAY** (analysis-only) — a hot pipeline spanning several slices; run after + its members. Same capacity caveat: an OVERLAY spanning more than one run's + capacity (the [Sizing](#sizing-how-big-is-one-slice) band) is split into several + overlay passes, not run as one oversized pass. + +**Map the hot/warm/cold checklist result → tier:** HOT→FULL, WARM→REDUCED, +COLD→the sweep, cross-slice-pipeline→OVERLAY. + +**Verification mode** per slice: can the environment build+run it (dynamic lane / +`Measured` confidence available), or is it static-only / **deferred**? Deferred +covers physical hardware (a device/rig) **and** "needs a load test / production- +like dataset / a staging service that doesn't exist locally." State it so +fix-plans rely on complexity/allocation arguments, **never fabricated numbers**, +where measurement isn't possible *[case: rig-timing findings were unfalsifiable +without radio hardware].* + +## Order, persist, execute (resumable) + +- **Execution order**: hottest first; frequency-establishers before their impl + slices; overlays after members; cold sweep last. Maintain an explicit + **slice-dependency list** (use the project's dep-graph mechanism — e.g. `bd dep` + edges — if it has one). +- **Persistent artifacts (mandatory — the job must survive a context reset / + ephemeral container):** + - **Slice plan** — the partition (per slice: paths, language, production LOC, + tier, verification mode, adjacent-context/frequency-map pointers) + coverage + ledger + out-of-scope list + **the planning commit SHA**. + - **Progress ledger** — a row per slice: `id | tier | scope paths | state + (PENDING/IN-PROGRESS/DONE/SKIPPED) | artifact paths`, plus a "how to resume" + header (read plan + ledger, pick first non-DONE). Commit it; update it per + slice. + - **Run ledger** (`runs.jsonl`) — one line per executed run, for regression. +- **Commit per slice** (consolidated report + ledger update). Never batch a repo's + worth of audit into one commit. +- **Coverage drift:** before each slice, confirm its paths still exist; at the end, + re-diff the coverage ledger against the **current** tree (vs the planning SHA) — + renamed/added files must be re-homed, not dropped. +- **Mid-execution mis-scope:** if a slice's own run reveals it's too big/small, + **re-slice that region at most once**, record it in the ledger, then proceed; if + still wrong, escalate to the user rather than thrash. +- **Repo-level roll-up:** after the slices, a short cross-slice synthesis (shared + root causes, repo-wide themes, heat map). **Conditionally REQUIRED** when the + request was a posture question ("how's the repo's performance?"); optional when + it was "find and queue fixes". For a **service monorepo** the roll-up is + **two-level** — a per-service synthesis, then a cross-service meta-roll-up. + +--- + +## Review the partition before executing — adversarially review the partition *before* executing runs + +The partition is itself a substantive artifact and a single pass misses +cross-slice defects *[case: four hot-path-hunting rounds converged "clean"; the +fifth, a **partition-design** lens, found a cross-slice calibration defect they +all missed]*. **Scale review depth to slice count — the 5-round case is the +CEILING, not the default:** + +| Slices | Review | +|---|---| +| 1–2 (lightweight) | none — 5-min self-check vs the heuristics table | +| 3–5 | 1 general round (fold the partition-design checklist into it) | +| 6–12 | 1 round, **partition-design lens REQUIRED** (its explicit job is cross-slice calibration, not finding-hunting) | +| 13+ / high-stakes | ≥2 rounds, ≥1 dedicated partition-design | + +Each reviewer attacks, grounded in actual code: sizing (production vs raw LOC); +hot-path accuracy (verify / refute imaginary / find missed); mis-tiered slices +(cold-as-warm, warm-as-cold, latent not flagged); **cross-slice frequency splits** +(the class a hot-path-only review reliably misses); coverage gaps / double-counts +/ language mis-bucketing; the fewer-larger vs finer-grained tradeoff. Revise +between rounds; finalize when a round finds only nits. + +## When in doubt +- **Can't tell if code is hot** (no visible entry point, dynamic dispatch) → WARM + + `frequency-unresolved`, let the run's cross-validation sort it; don't guess HOT/COLD. +- **Can't find a seam to split an oversize unit** → synthetic-seam OVERLAY family + + flag the file; don't drop or force one giant run. +- **User disagrees with a slice** → their call on scope; record it and re-slice. +- **Generated/dynamic makes coverage uncertain** → mark `coverage-uncertain` in the + ledger and surface it, rather than claiming false completeness. + +## Heuristics & anti-patterns (quick reference) + +| Trap | Rule | +|------|------| +| Size on raw LOC | Measure **production** LOC; non-uniform ratio; build-unit is the primary sizer. | +| One LOC band for all languages | Scale by verbosity; build-unit is the primary sizer (see [Sizing](#sizing-how-big-is-one-slice)). | +| Infer hot paths from names | **Verify against code**; classify workload shape first (CPU vs IO vs event-driven). | +| "Hot path = CPU loop" everywhere | For services it's DB/N+1/fan-out/serialization, sized by request rate. | +| "No in-tree caller = dead" | Not in dynamic/DI/serverless code — check framework wiring; else LIVE-uncertain. | +| "Never mix languages" absolutely | One primary pack; embedded langs stay with their driver; split only at process/deploy boundaries. | +| One mega-run / one-per-file | Coherent bounded subsystems; the split/keep rule with prefer-fewer tie-breaker. | +| Full cycle on cold glue | Batch into one COLD SWEEP. | +| Latent/external code ranked hot | reachability≈0 "fires once wired in" / external-process = orchestration → reduced. | +| Impl + caller in different slices | Demand-driven, bounded, fail-safe cross-slice frequency calibration (assume-hot on unknown). | +| Promise measurements you can't take | Tag verification mode (hardware OR load-test/staging deferred); complexity argument, never fake numbers. | +| Repo = the audit unit always | For service monorepos the unit is the deployable service; shared libs audited once. | +| Trust a single partition pass | Review depth scaled to slice count; ≥1 partition-design lens at 6+ slices. | + +## What this method produces (hand-off to the cycle) +A **reviewed slice plan** (ordered slices with {paths, language, production LOC, +tier, verification mode, frequency-map pointers}, coverage ledger, out-of-scope +list, planning SHA) + a progress ledger. Each FULL/REDUCED slice → a normal +`performance-audit-cycle` run; the COLD SWEEP → one trimmed `performance-audit` +run; OVERLAYS → analysis passes. The ledger makes the whole-repo job resumable. diff --git a/.claude/skills/performance-audit/README.md b/.claude/skills/performance-audit/README.md new file mode 100644 index 00000000..a6981f16 --- /dev/null +++ b/.claude/skills/performance-audit/README.md @@ -0,0 +1,191 @@ +# performance-audit — maintainer & contributor guide + +**If you are a future agent (or human) here to *extend or maintain* this skill, read this first.** +`SKILL.md` tells an agent how to *run* an audit; this README tells you how the skill is *built*, why +it is shaped the way it is, and how to change it without eroding what makes it work. When the two +disagree, `SKILL.md` wins for runtime behavior and `generic-pack.md` wins for pack-authoring mechanics +— this file orients and states the principles. + +The full rationale for every non-obvious decision lives in the **decisions log**: +[`docs/plans/2026-06-03-performance-audit-decisions-log.md`](../../../../docs/plans/2026-06-03-performance-audit-decisions-log.md) +(Parts A–Z). When you make a substantive change, append to it — that log is how a future you +reconstructs *why*, not just *what*. + +--- + +## What this skill is, in one breath + +A critical, **multi-dimensional** performance review. It detects the stack + versions, loads the right +durable *lenses* (profile packs) and version facts, dispatches **independent lane agents in parallel** +(one per performance dimension), and synthesizes a ranked, calibrated report — no praise, no grades, +just problems with impact. It is a *snapshot*; the sibling `performance-audit-cycle` adds the +verify→decide→remediate loop. + +The eight lanes (slugs): `algorithmic`, `memory`, `data-access`, `concurrency`, `idiom-currency`, +`cost-map` (a map, not findings), `payload-startup` (conditional), `dynamic` (optional, measured). + +--- + +## Guiding principles + +These are load-bearing. Most of the skill's quality comes from holding them; most ways to degrade it +are quiet violations of them. (`generic-pack.md` holds the **canonical, expanded** form of the +pack-authoring principles — edit there and let this list follow; the version here is the orientation +digest, deliberately shorter.) + +- **A lens should sharpen a clever agent, not constrain a strong one.** Every pack is a *reference*, + not a checklist — a **prior, not a worklist; a floor, not a ceiling.** It names what is known to be + worth knowing; it is never the boundary of what is worth finding. The consumer-side framing in + `lane-prompts.md` says exactly this to every lane agent ("if you are a stronger model than the lens + was written for, out-reason it"). Keep the producer side honest too: never write a bullet that boxes + in a better judgment. +- **Write for a reader who may be smarter than the author.** As models strengthen they need *less* + hand-holding on durable fundamentals — so the durable pack is the **most skippable** layer for a + strong model and must degrade gracefully. Encode the *condition* and the *trade-off*; let the agent + decide. Do not encode "do exactly X" prescriptions. +- **Calibration governs *generation*, not post-hoc suppression.** Lanes are told what is NOT a finding + (cold-path micro-nits, style, theoretical big-O on bounded n) so they don't pad — but a surfaced + finding is never dropped as "too minor"; that is the user's call. See `finding-model.md`. +- **Adversarial, not sycophantic.** Lanes find problems; they MUST NOT open with "performance is + generally fine", grade, or soften. (Exception: the `cost-map` lane is descriptive.) +- **Three-tier knowledge, strictly separated.** *Durable* idioms → the **profile pack**. + *Version-pinned* fast-paths/defaults → the **version index** (`version-indexes/<eco>.md`). + *Post-cutoff recency* → the **currency brief** (per-run, see `currency-protocol.md`). Never bake a + version-specific claim into a pack; tag any concrete API/default with "(verify against the currency + brief for your version)". This separation is the real future-proofing: the durable layer stays lean + (what a capable model already knows) while the index/brief carry the **unknowable** facts no model + can self-supply. Weight shifts pack→index/brief as models improve. +- **One point per bullet; length justified by reasoning, not enumeration.** A bullet that lists five + sub-conditions has become a checklist; a bullet that explains one condition and when it does/doesn't + matter is a reference. ~5–9 bullets per lane section. **A mediocre bullet is worse than an omitted + one.** +- **Materiality decides the load, not mere presence.** A module loads when its tech is *central* to + the scope — a stray `import json` / `encoding/json` does not pull in the serialization module. +- **Detection is scoped to the audit scope, not the whole repo.** In a monorepo, walk up from the + scoped files to the nearest governing manifest(s). +- **Pursue durable accuracy.** A wrong-but-confident bullet is worse than none. New packs/modules are + written by research agents and then **reviewed for accuracy by the integrator before they ship**. + +--- + +## Architecture & files + +``` +SKILL.md ← runtime spec: phases 0–3 (detect → currency → parallel dispatch → synthesize) +lane-prompts.md ← the verbatim per-lane dispatch prompts + the shared preamble (the "reference, + not a checklist" framing lives in the shared preamble — highest-leverage text) +finding-model.md ← Impact×Confidence scoring, Effort-as-magnitude, calibration, disposition +currency-protocol.md← how the version-aware currency brief is researched and cached per run +run-schema.md ← versioned run metadata + ledger + finding fingerprints (regression analysis) +profile-packs/ ← the lenses (this is where most maintenance happens) + generic-pack.md ← always-loaded language-agnostic baseline + the canonical "How to add a pack" guide + <ecosystem>.md ← core pack: lane-keyed sections; LARGER/deep-dived ecosystems also add a runtime- + notes section + a module map (see "pack structure" below); smaller ones + (rust/jvm/swift) are a single lane-keyed file with neither — that's fine + <ecosystem>/<module>.md ← load-on-detection deep lenses (web, ORM, RPC, data, caching, …) + sql.md (+ sql/) ← a CROSS-CUTTING companion pack (loads alongside a language pack) for hand-SQL +version-indexes/ ← build-once "API/feature → version → perf benefit" lookups (+ README.md) +test-fixtures/ ← fixtures for validating lane behavior +``` + +### The pack structure you must preserve + +- **A profile pack is lane-keyed.** Its top-level sections use the same lane slugs as + `generic-pack.md`, because the dispatcher pastes *each lane's slice* into *that lane's* agent. Keep + the headings aligned or slices won't route. +- **Core + load-on-detection modules (large ecosystems).** A large pack's core file holds the + always-loaded lanes + a **runtime-notes section** (the durable engine/runtime realities that cut + across every lane — the GIL, V8 hidden classes, Go's GC/GOMAXPROCS). The exact heading varies by + ecosystem and is *the same role under different names*: `## Runtime notes` in Go/Python/JS-TS, + **`## Variant notes`** in `.NET` (its Modern-vs-Framework split, the original name), and + **`## Reading the plan & schema`** in SQL. Tech-specific depth lives in + `profile-packs/<eco>/<module>.md`, selected by a **`## Framework / sub-stack modules (load on + detection)`** map (a `signals → module file` table). A run pastes the core + only the modules whose + signals are *material* to the scope. **Smaller ecosystems** (`rust`, `jvm`, `swift`) are a single + lane-keyed file with no runtime-notes section and no modules — split only when a pack accretes enough + tech-specific bulk to warrant it. +- **Two ways to that structure, same end state** (decisions log Parts T, W, X): + **"relocate"** when the core already carries inline framework bloat (move it out + deepen — .NET, + JS/TS); **"deepen"** when the core is already clean (keep it as quick-hits, add deeper modules — + Python, Go). +- **SQL is special: a content-detected *companion* pack.** It is not selected by a manifest; load + `sql.md` *alongside* the language pack whenever hand-written SQL is material, plus a dialect module + (`sql/postgres.md` / `sql/tsql.md`). Its core has a "Reading the plan & schema" section (its Runtime- + notes analog) and a **"Routines"** section — because the most expensive hand-rolled SQL hides in + stored-procedure / function / trigger bodies invoked by name, easy to miss when reading app code. + +--- + +## How to make common changes + +**Add an ecosystem pack** → follow the canonical numbered steps in `generic-pack.md` +("How to add a profile pack"): lane-keyed core with the same headings, durable-only bullets, the +density/one-point rule, a runtime-notes section *if the ecosystem has cross-cutting runtime realities +worth stating* (a small pack can skip it, like rust/jvm/swift), register detection in `SKILL.md` +Phase 0, build a `version-indexes/<eco>.md` for the version-pinned facts. Add a `## Sources` appendix. +Only split into modules once the pack accretes enough tech-specific bulk to warrant it. + +**Add a sub-stack module** → create `profile-packs/<eco>/<module>.md` as a standalone +`# <Ecosystem> performance module: <Tech>` doc with a load-when banner pointing at the core map; add a +row to the core pack's module map; keep it durable, tight, verify-tagged, and **do not restate the +core lanes** — a module *deepens*. + +**Add version-pinned facts** → put them in `version-indexes/<eco>.md` (not the pack). Bump +`covered_through` only when you've actually reviewed that far; partial coverage misrepresents the +index. See `version-indexes/README.md`. + +**The proven workflow** (used to build the Go/Python/JS-TS/SQL passes): dispatch **parallel research +agents** (each writes one module to its own file — no write conflicts), give each the format reference ++ the density contract + the durable-only rule, then **review every module for accuracy yourself** +before wiring it into the map. Then run a **multi-perspective adversarial review** (≥3 rounds, distinct +lenses: checklist-vs-reference, accuracy, false-positive calibration, coverage, structure) and record +APPLIED/REJECTED findings in the decisions log. Commit and push frequently — the container is +ephemeral; losing work is the only expensive outcome. + +**Validate a change** → `test-fixtures/` holds the evals (see [`test-fixtures/README.md`](test-fixtures/README.md)). +Two kinds: **behavioural/discipline tests** (`test-fixtures/behavioral/` — ecosystem-independent RED/GREEN +scenarios that test the machinery: reference-not-checklist, materiality, calibration, bug-no-chase, +wall-clock-ban) and **per-ecosystem recall/precision fixtures** (`<eco>-sample/` — a small app with +planted issues + an `expected-findings.md` rubric). They are **manual, re-runnable, on-demand evals** +(dispatch a lane subagent against a fixture, score recall/precision), **not a CI gate**. The rubric +deliberately includes a **"beyond-the-pack" issue** the agent must reason to (not pattern-match a +bullet) — finding it rewards out-reasoning the lens; *consistently* missing it across runs is the +warning sign that a pack has drifted toward a checklist. Add a fixture per *ecosystem*, not per module +(a matrix rots and tunes packs into checklists — see decisions log Part Z/DD). + +--- + +## Conventions + +- **Verify-tag** every concrete API/default/version claim in a pack: `(verify against the currency + brief for your version)`. +- **Banners**: a module's line 2 is `> Load when <signals> is detected — see the module map in + `../<eco>.md`. …this file is the <Tech> lens only.` +- **Naming**: refer to lanes by slug/name, never bare number ("the `data-access` lane", not "Lane 3"). +- **Reference discipline extends to NEW content** (an easy trap when drafting a multi-phase method or + doc): the same rule `finding-model.md` enforces for findings — *never a bare opaque label as the sole + referent* — applies to **authored skill content too**. Give phases/steps/sections **descriptive, + self-contained titles** and cross-reference them by those titles or by anchor links, never by opaque + codes (`S0/S0.5/S4`, "the §2 tie-breaker"). A reader landing mid-doc must understand the reference + without decoding a private numbering scheme. (Caught in real use: a drafted scoping method used + `S#` phase codes as cross-refs — exactly what the finding-reference rule forbids.) +- **Decisions log discipline**: every substantive call gets an entry (perspective(s) considered, + options, the choice, and APPLIED/NOTED/REJECTED dispositions). This is the single most useful thing + for a future maintainer — it is how intent survives context loss. +- **Commits**: small, frequent, descriptive; develop on the assigned branch; open a PR only when asked. + +--- + +## Where to look when extending + +- The **design doc**: [`docs/plans/2026-06-03-performance-audit-design.md`](../../../../docs/plans/2026-06-03-performance-audit-design.md) +- The **decisions log** (Parts A–Z): the running rationale — read the parts touching the area you're + changing before you change it. +- `generic-pack.md`: the authoritative pack-authoring guide and the "references, not checklists" + invariant. +- `lane-prompts.md` shared preamble: the highest-leverage text in the skill — it is what keeps a + strong consuming model from treating any pack as a checklist or a ceiling. Touch it with care. +- [`feedback-template.md`](feedback-template.md): a hand-off-ready template + instructions to give an + agent running the skill against a real repo, so field use produces high-signal feedback (blind + discovery, honest non-findings, named workarounds, where-it-would-change pointers). The improvements + in decisions-log Part FF all came from one such field run — keep feeding this loop. diff --git a/.claude/skills/performance-audit/SKILL.md b/.claude/skills/performance-audit/SKILL.md new file mode 100644 index 00000000..a077dfa9 --- /dev/null +++ b/.claude/skills/performance-audit/SKILL.md @@ -0,0 +1,248 @@ +--- +name: performance-audit +description: Run a critical, multi-dimensional performance review with parallel agents across algorithmic complexity, memory/allocation, data access & I/O, concurrency, framework-idiom currency, payload/startup, and an execution-cost map. Use as a performance snapshot, before scaling or optimization work, or when investigating slowness, latency, throughput, or resource usage. +argument-hint: "[optional: specific area/path to focus on, or 'full']" +--- + +# Performance Audit + +## Terminology + +The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC 2119] [RFC 8174] when, and only when, they appear in all capitals, as shown here. + +## Overview + +Running a critical performance review of the project. + +Focus: **$ARGUMENTS** (default: full review across all applicable dimensions) + +This skill detects the stack and version, builds (or reuses) version-specific performance guidance, dispatches independent performance lanes in parallel, and synthesizes a ranked, calibrated report. It is independently invocable as a **snapshot** (no remediation). For the full audit→verify→decide→remediate loop, use the sibling `performance-audit-cycle`. + +Companion references (read as needed, one level deep): +- [`finding-model.md`](finding-model.md) — how findings are scored, calibration, the disposition discipline. +- [`currency-protocol.md`](currency-protocol.md) — version-aware currency-brief research + cache. +- [`lane-prompts.md`](lane-prompts.md) — the verbatim dispatch prompts for every lane. +- [`profile-packs/`](profile-packs/) — per-ecosystem lane lenses (`generic-pack.md` is the always-loaded fallback). +- [`run-schema.md`](run-schema.md) — versioned run metadata + ledger + finding fingerprints for historical/regression analysis. +- [`version-indexes/`](version-indexes/) — shipped, build-once "API/feature → version → perf benefit" lookups; the `idiom-currency` lane consults these before any live research. + +--- + +## Philosophy + +This is an adversarial review. Each lane agent's job is to find performance problems in its dimension — not to praise what's working. + +**Anti-sycophancy rules for all lanes:** +- Lanes MUST NOT open with "performance is generally fine" or soften findings. +- Lanes MUST NOT give scores or grades — just report problems (with impact). +- Lanes MUST NOT pad with cold-path micro-nits. Calibration (`finding-model.md`) governs what to generate; without it, lanes pad. +- If a lane genuinely finds nothing significant, it MUST say "No significant findings" and explain in one sentence what it examined. + +**Exception — the Execution Cost Map (`cost-map`) lane is descriptive, not adversarial.** It produces a map of likely time-concentration, not a problem list, and MUST NOT manufacture problems to fill it. + +--- + +## Phase 0 — Stack & version detection + +Detect languages, frameworks, **and exact versions** from manifests: + +| Manifest | Ecosystem / signals | +|---|---| +| `package.json` + lockfile | Node/JS/TS; React/Angular/Vue/Next versions | +| `pyproject.toml` / `requirements*.txt` / `poetry.lock` | Python; Django/Flask/FastAPI/SQLAlchemy/pandas | +| `go.mod` | Go + module versions | +| `Cargo.toml` / `Cargo.lock` | Rust + crate versions | +| `*.csproj` / `packages.config` / `Directory.Packages.props` | .NET — **modern** (TFM `net8.0`+, `<PackageReference>`) vs **Framework** (TFM `net4x`, `packages.config`) | +| `pom.xml` / `build.gradle(.kts)` | JVM (Java/Kotlin); Spring/Hibernate | +| `Package.swift` / `*.xcodeproj` / `*.xcworkspace` / `Podfile` | Swift; SwiftUI/UIKit, Core Data/SwiftData, SwiftPM/Vapor | +| `Gemfile.lock` / `composer.lock` | (generic fallback) | +| Hand-written SQL — `*.sql`, migration/stored-proc files, embedded query strings, schema DDL | **SQL companion pack** (loads *alongside* the language pack) + dialect: PostgreSQL vs T-SQL/SQL Server | +| HTML documents — `*.html`/`*.htm`, server templates (`*.erb`/`*.jinja`/`*.twig`/`*.blade.php`/`*.cshtml`/`*.njk`), static-site output, `<!DOCTYPE html>` markup | **HTML companion pack** (loads *alongside* the backend that emits the markup) + modules: images-media, fonts | + +**The SQL companion pack** (`sql.md`) is **content-detected, not manifest-detected**: load it in addition to the application's language pack whenever hand-written SQL (not just ORM calls) is *material* to the scope — raw queries, views, stored procedures, functions, triggers, or migrations — and load the matching dialect module (`sql/postgres.md` or `sql/tsql.md`) from the database driver/DSN or dialect syntax. The SQL pack reasons best when the **schema/DDL is in scope** (indexes, types, keys); note reduced confidence when it is not. ORM-generated SQL is covered by the language packs' data modules instead. **Follow routine invocations into their definitions:** an `EXEC`/`CALL`/proc-name reference (or DML on a triggered table) in the application code points at hand-rolled SQL whose body lives in a schema/migration file — pull those routine/trigger definitions into scope and audit their bodies, or the most expensive hand-rolled SQL stays invisible (see `sql.md` "Routines"). + +**The HTML companion pack** (`html.md`) is likewise **content-detected**: load it *alongside* the backend pack whenever rendered HTML markup (static, server-templated, or a JS framework's HTML output) is material to the scope — it owns the **document/rendering/delivery** layer (critical rendering path, render-blocking resources, DOM size, compression/caching, Core Web Vitals) that exists even with little or no JavaScript. Load `html/images-media.md` when the page carries significant imagery/embeds and `html/fonts.md` when it uses web fonts. The **JS bundle** itself (tree-shaking, code-splitting, transpile target) stays with the JS/TS `bundling-build` module — `html.md` is the markup/delivery layer, not the bundler. + +**Detection is scoped to the audit scope, not the whole repo.** In a monorepo the root manifest can misrepresent the area under audit — the runner walks up from the scoped files to the nearest governing manifest(s) and profiles *those*. A `full` audit profiles all of them. + +Output a **stack profile** (`{ecosystem, framework, version}` tuples + source layout). It selects which profile pack(s) to load and seeds the currency brief. If detection is ambiguous or polyglot, load every matching pack plus `generic-pack.md` and note reduced specificity for unmatched parts. + +**Sub-stack modules:** if a matched pack carries a `## Framework / sub-stack modules (load on detection)` map (`dotnet.md`, `go.md`, `python.md`, `javascript-typescript.md`, and `rust.md` all do), load the **core** pack for the project plus only the `<ecosystem>/<module>.md` files whose detection signals appear in the audit scope (e.g. load `dotnet/sql-server-data.md` only when EF/`SqlClient`/Dapper is present; `go/grpc.md` only when `google.golang.org/grpc`/`.proto` is present; `python/orm-database.md` only when Django ORM/SQLAlchemy/psycopg is present; `javascript-typescript/react.md` only when React/JSX is present; `rust/web.md` only when axum/actix-web/hyper is present, `rust/data-parallelism.md` only when rayon/polars is present). This keeps each run pasting only the relevant tech lenses, not the whole pack. Load a module when its technology is **material to the audit scope**, not on an incidental or transitive import — a lone `import json` / `import asyncio` (Python) or a stray `encoding/json` (Go) that is peripheral to the scoped code does not by itself warrant the serialization or async module; load it when that technology is *central* to the code under audit (the scope is serialization-heavy, or built on asyncio). Detection selects *candidates*; materiality decides the load. + +--- + +## Phase 1 — Currency brief (anti-stale-training) + +Follow [`currency-protocol.md`](currency-protocol.md). In brief, per detected framework: + +0. **Shipped version index first (no network):** if `version-indexes/<ecosystem>.md` exists, it covers version-specific perf knowledge up to its `covered_through`; the live steps below only extend past that. This keeps version-history mining a build-once cost, not a per-run one. +1. **Cheap, best-effort** registry check (1-day TTL) for the latest published version. Failure fails *soft* — never blocks the audit. +2. **Reuse** the cached brief at `docs/perf-audits/cache/<ecosystem>/<framework>@<major.minor>.md` if the in-use version matches, no newer version has appeared, and the 180-day fallback hasn't elapsed. +3. Otherwise **refresh** via live web research and rewrite the cache (with sources). +4. **Offline** → emit "currency brief unavailable"; `idiom-currency` findings are LOW confidence; never fabricate version-specific claims. + +The brief is passed to every lane. The consolidated report MUST record which brief (and its `researched_on` date) it used. + +--- + +## Phase 2 — Parallel lane dispatch + +The runner MUST dispatch the lanes as **independent, concurrent agents** (embarrassingly parallel — they share no mutable state; packs and the brief are read-only inputs). Read [`lane-prompts.md`](lane-prompts.md) and, for each lane, paste the shared preamble + that lane's body, filling placeholders with the scope, the matched profile-pack slice for that lane — the lane-keyed section of the core pack, **plus the core pack's cross-cutting Runtime/Variant-notes section** (and any companion pack's equivalent — SQL's *Reading the plan & schema*, HTML's *Rendering path & Core Web Vitals*), which is shared context that applies to every lane, **plus** any loaded sub-stack modules relevant to the lane, per Phase 0 — the currency brief, and the output file path. Each agent MUST write its raw report to `docs/perf-audits/` **immediately** on completion (persist-before-synthesis) and also return findings for consolidation. + +**Before dispatch, ensure the artifact paths exist** — create `docs/perf-audits/` (and `docs/perf-audits/cache/`) and an empty `docs/perf-audits/runs.jsonl` if absent. They are referenced by `git add` in Phase 8 and written by the lanes; on a first run they won't pre-exist. + +**Two equivalent dispatch modes — paste the slice, or have the lane read its own.** Both must deliver each lane the same lens (its lane-keyed slice **+** the cross-cutting Runtime/Variant-notes section **+** the Phase-0 modules + the brief): +- **Runner pastes the slice** (the description above) — best when the runner already holds the packs in context and the lane count is small. +- **Lane reads its own slice** (first-class — and the **common** case when lanes are dispatched as subagents that do *not* share the runner's skill registry, so they can't invoke this skill by name). The runner instead tells each lane the exact files to read for itself from this skill's install location — `…/performance-audit/profile-packs/<ecosystem>.md` + the relevant `<ecosystem>/<module>.md` + the matching `version-indexes/<ecosystem>.md` — passing only the scope, the brief (or its path), and the output path. This avoids the runner holding and re-pasting every pack across 6–8 lanes (a real context cost at scale). Reading-its-own-slice is **not** licence to walk the whole pack as a checklist — the reference-not-checklist rule in the preamble still governs. + +### Lanes + +| Lane | id | Run? | +|------|----|------| +| Algorithmic complexity & data structures | `algorithmic` | always | +| Memory & allocation | `memory` | always | +| Data access & I/O | `data-access` | always | +| Concurrency & parallelization | `concurrency` | always | +| Framework-idiom currency | `idiom-currency` | always (uses brief) | +| Execution Cost Map (a map, not findings) | `cost-map` | always | +| Payload / startup / build | `payload-startup` | conditional — only when the stack has such a surface (frontend / serverless / CLI / mobile) | +| Dynamic profiling & benchmarking | `dynamic` | optional — only when the env can build+run AND a real workload exists/can be defensibly built (never invent load) | + +The six core lanes (`algorithmic`, `memory`, `data-access`, `concurrency`, `idiom-currency`, `cost-map`) always run **for a standalone, bounded (FULL-depth) audit** — the default. The runner MUST decide `payload-startup` and `dynamic` from the stack profile and environment, and MUST state in the report which lanes ran and why any were skipped. **Refer to lanes by these names, never by bare number** — "Lane 4" is meaningless outside this skill (see Rules). + +### Depth tiers (reduced & cold-sweep invocations) + +A FULL audit runs the six core lanes. When this skill is invoked as one slice of a tiered whole-repo plan (see the cycle's [`whole-repo-scoping.md`](../performance-audit-cycle/whole-repo-scoping.md)), it MAY run a **reduced lane subset** matched to the slice's tier — coverage without waste on warm/cold code: + +| Tier | Lanes to run | When | +|------|--------------|------| +| **FULL** | all six core (+ `payload-startup`/`dynamic` as applicable) | HOT slices; any standalone bounded audit | +| **REDUCED** | `algorithmic`, `memory`, `data-access`, `concurrency` (+ `idiom-currency` only where a framework/library idiom surface exists; `cost-map` optional) | WARM slices — live but bounded/low-frequency work | +| **COLD SWEEP** | one batched run, ~3 lanes: `algorithmic`, `memory`, `data-access` | batched cold glue (CRUD/config/DTO marshalling) — many units in one run | + +A reduced or cold-sweep run is a **deliberate, recorded** choice, not silent lane-skipping: the report MUST state the tier and which lanes ran and why. Reduced depth is *not* a licence to under-calibrate — calibration still governs generation, and a warm slice can legitimately come back all-minor (that's the model working, not the depth failing). + +### Agent model selection + +Each subagent SHOULD be invoked using the **latest available Claude Opus model** or **GPT-5 (or successor) at x-high reasoning effort**, unless the user has explicitly instructed otherwise for this run. Performance analysis benefits asymmetrically from maximum reasoning bandwidth, and saving model cost trades poorly against missed regressions that ship to production. If the framework requires a model parameter on dispatch, set it; if it inherits the parent's model, ensure the parent is on the strongest tier before dispatching. + +**Record the request honestly, not a guessed identity.** Some harnesses let you set the subagent *model* but expose **no reasoning-effort knob** (the Claude Code Agent tool, for one). When you can't actually request x-high, record `reasoning_effort: "default (harness exposes no knob)"` in the run metadata rather than claiming x-high — the metadata captures what was *requested*, and an honest "default" is correct where the dial doesn't exist. + +The runner MUST wait for all dispatched lanes to complete before Phase 3. + +--- + +## Phase 3 — Synthesis + +After all lanes complete, compile one consolidated report: + +1. **Deduplicate** across lanes — cross-lane agreement is a **confidence signal, not redundancy**: note which lanes flagged each and **lead the report with the most-agreed findings**. High overlap on a small hot core (the same hot symbol seen through several lane framings — algorithmic "defeats the cache", memory "per-call alloc", idiom-currency "library fast-path") is *expected*, not noise; collapse it to one finding with one fingerprint and record the agreement count. +2. **Rank** by the finding model (`finding-model.md`): Impact × Confidence, Effort sequencing within bands. +3. **Cross-reference the Execution Cost Map** — a finding on a mapped hot region gets its Impact confirmed; one in cold territory is down-weighted (state, per finding, whether it intersects the map). +4. **Group** cross-cutting root causes. +5. **Measurability note** — note whether the identified hot paths can be *observed* in production (metrics/traces present, or would confirming the win require adding instrumentation first?). Flag findings that can't be measured post-fix. +6. **Merge** every lane's "Suspected Bugs" sections into one Suspected Bugs appendix and, if any exist, **auto-write the bug-hunt kickoff prompt** (below). +7. **Capture run metadata** per [`run-schema.md`](run-schema.md): assign a fingerprint to every finding, emit the versioned frontmatter on the report, append one record to `docs/perf-audits/runs.jsonl`, and compute the regression diff against the most recent prior run for the same scope (new / persisting / resolved). Call out **new** and **resolved** findings in the executive summary — that's the regression signal. + +**Lanes may correct the scope brief — adopt the code-grounded value.** A lane's reading of the actual code is primary over the scope summary's load/frequency claims (the shared preamble says so). When a lane corrects a load assumption from source (a re-render briefed as ~1 Hz read as 4 Hz; a "DSP real-time" path found to be batch and off the audio callback), the synthesis MUST adopt the corrected value, re-rank accordingly, and **record the correction** in the report (frontmatter or summary). A wrong scope brief must not silently survive into the ranking — this is what makes the audit robust to an imperfect scope summary. + +The runner MUST account for every finding from every lane in the consolidated report. The runner MUST NOT drop a surfaced finding as "too minor" — that is the user's call (in the cycle). Calibration governs *generation*, not post-hoc suppression. + +**Dispatch lanes blind.** Give the lanes load/scope context only — **not** a list of already-suspected findings. Feeding lanes the answer measures confirmation; withholding it measures *discovery* (in real-world use, blind lanes reproduced a 5-round review's entire hot-path map **and** added findings it missed). The only thing pre-seeded as adjacent context is a *descriptive* frequency/hot-path map when cross-slice calibration needs it (per the cycle's whole-repo method) — never the conclusions. + +### Consolidated report format + +Save raw per-lane reports immediately (`docs/perf-audits/<date>T<HH-MM>-<slug>-<lane>.md`), then: + +```markdown +--- +<run-schema.md frontmatter block — run_schema_version, run_id, date, methodology, + dispatch (model_requested + reasoning_effort), stack, currency_briefs, lanes_run, + finding_counts, regression> +--- +# Performance Audit — <Scope> +**Date:** YYYY-MM-DD HH:MM **Scope:** <full | area> +**Stack:** <ecosystem/framework@version …> +**Currency brief:** <which brief(s), researched_on dates, or "offline"> +**Lanes run:** <list; note any skipped + why> +**Regression vs <prev_run_id|none>:** <N new, N persisting, N resolved> — new/resolved listed below + +## Critical Findings +### P1. <title> +**Lanes:** <which flagged it> **Location:** <file:line or pattern> +**Fingerprint:** `<lane-id>:<file>:<symbol>:<title-slug>` (e.g. `data-access:inventory.py:enrich_line_items:n-plus-1`) **Status:** <new|persisting|resolved> +**Problem:** … **Impact:** <reachability × frequency × per-occurrence cost> +**Confidence:** <Measured|Strong-static|Heuristic> **On cost map:** <yes/no> +**Effort:** <Localized|Contained|Cross-cutting> +**Verification plan:** <benchmark/argument + correctness guard> + +## Major Findings +… +## Minor Findings +… +## Cross-Cutting Themes +… +## Measurability +<can these hot paths be observed in prod? what needs instrumentation?> + +## Execution Cost Map +> Architectural awareness, NOT an optimization to-do list. +### Likely time-concentration regions +- **<region>** — basis: <structural reasoning> — confidence: <High|Med|Low> — <map-only | also Pn> +### Notes for architecture +- … + +## Suspected Bugs (for follow-up — NOT addressed here) +> Correctness bugs noticed during the audit. This audit does not fix or chase them. +> Run bug-hunt-cycle; a ready-to-use kickoff prompt is at +> docs/perf-audits/<date>-<slug>-bug-hunt-kickoff.md. +### SB1. <title> +**Location:** <file:line> **What looks wrong:** … **Why suspected:** … +``` + +Consolidated file: `docs/perf-audits/<date>T<HH-MM>-<slug>-consolidated.md`. + +### Bug-hunt kickoff prompt (auto-written when suspected bugs exist) + +If the Suspected Bugs appendix is non-empty, the runner MUST write `docs/perf-audits/<date>-<slug>-bug-hunt-kickoff.md` containing a paste-ready prompt, and MUST suggest the user run it — but MUST NOT auto-invoke `bug-hunt-cycle`. Template: + +```markdown +# Bug-hunt kickoff — suspected bugs from the <date> performance audit + +Run: `bug-hunt-cycle` with the scope below. + +**Scope:** <the files containing the suspected bugs + one-paragraph context per area> + +**Seed findings (verify, don't trust — surfaced incidentally during a perf audit):** +- <SB1 title> — <file:line> — <what looks wrong, why> +- <SB2 …> + +These were noticed while auditing performance and were NOT investigated. Treat them +as leads for the hunters, not confirmed bugs. +``` + +If there are no suspected bugs, write "None" in the appendix and skip the kickoff file. + +--- + +## Artifacts + +- Per-lane raw: `docs/perf-audits/<date>T<HH-MM>-<slug>-<lane>.md` +- Consolidated: `docs/perf-audits/<date>T<HH-MM>-<slug>-consolidated.md` (with versioned frontmatter per `run-schema.md`) +- Run ledger: `docs/perf-audits/runs.jsonl` (one appended record per run — the regression/trend substrate) +- Bug-hunt kickoff (if any): `docs/perf-audits/<date>-<slug>-bug-hunt-kickoff.md` +- Currency cache: `docs/perf-audits/cache/<ecosystem>/<framework>@<major.minor>.md` + +The runner MUST save each raw lane report as soon as that lane completes — MUST NOT wait for synthesis — so analysis survives interruption. + +--- + +## Rules + +- Lanes MUST read **actual source code**, not just `CLAUDE.md` / `AGENTS.md`. +- Findings MUST be **actionable** and carry the full finding model (Impact/Confidence/Effort/Verification). +- Effort MUST be expressed as work magnitude, never wall-clock (see `finding-model.md`). +- The runner MUST dispatch all applicable lanes; dropping lanes breaks the independence primitive that makes the review work. +- The runner MUST NOT inflate minors to look thorough, nor downgrade criticals to avoid alarm. +- Correctness bugs are recorded and handed off, never chased here. +- **Write for readers without your context.** Lane names and finding IDs (`P1`, fingerprints) are internal scaffolding. In any outward-facing text — commit messages, PR titles/bodies, code comments, remediation-plan task titles, questions to the user — describe the finding in self-contained terms (what / where / why); never use a bare lane name/number or ID as the sole referent ("addresses the `concurrency` lane" / "fixes P3" is meaningless to others). The ID may be appended as a traceability suffix only. See `finding-model.md` "Referring to findings". diff --git a/.claude/skills/performance-audit/currency-protocol.md b/.claude/skills/performance-audit/currency-protocol.md new file mode 100644 index 00000000..cc73b124 --- /dev/null +++ b/.claude/skills/performance-audit/currency-protocol.md @@ -0,0 +1,114 @@ +# Currency Protocol (anti-stale-training) + +**Load this when:** running Phase 1 of `performance-audit` — building or reusing the version-specific +performance guidance ("currency brief") for a detected framework. + +## Contents +- Why this exists +- The version-aware refresh logic +- Registry commands per ecosystem +- Cache file location + format +- Offline / failure degrade + +--- + +## Why this exists + +LLM training data ages. Two failure modes this protocol counters: + +- **(a) Old-fast-now-slow** — recommending a pattern that was fast in an older framework version but + regressed or was deprecated in a newer one. +- **(b) Missed new fast-path** — not knowing about a performance API/feature/default added after the + bulk of training data. + +The brief is a small, sourced, **repo-local** cache of version-specific performance facts that +the `idiom-currency` lane (framework-idiom currency) consults. It lives in the *target repo*, not the plugin, so plugin +updates never wipe it and a team accrues + shares it via git. + +## The version-aware refresh logic + +The expensive operation is *researching* perf implications; the cheap operations are *consulting the +shipped version index* and *asking the registry what the latest version is*. The protocol exploits +that asymmetry — it does not use a flat calendar TTL as the primary trigger, and it does not +re-research a whole version history at runtime when a build-once index already covers it. + +0. **Shipped version index first (no network).** If `version-indexes/<ecosystem>.md` exists (see + `version-indexes/README.md`), it is the primary source of version-specific perf knowledge up to its + `covered_through` version — consult it before any network call. Live research (steps 1–4) then only + needs to **extend past `covered_through`** (or runs in full only when no index exists for the + ecosystem). This keeps the expensive version-history mining a build-once cost, not a per-run one. + +1. **Cheap currency check (1-day TTL), best-effort.** For each detected framework, make one registry + call for the latest published version (table below) and record `latest_available`. If the + registry is unreachable or the tool isn't installed, the check **fails soft**: fall through to the + cached brief if one exists (flag it possibly-stale) and otherwise to offline-degrade. The check + MUST NOT block or fail the audit. + +2. **Cache lookup** at `docs/perf-audits/cache/<ecosystem>/<framework>@<major.minor>.md`. + +3. **Reuse the cached brief** if **all** hold: + - the in-use version still matches the brief's `researched_against_version`, AND + - `latest_available` is **not greater than** the brief's `researched_against_version`, AND + - the long fallback TTL (`fallback_ttl_days`, default 180) has **not** elapsed since `researched_on`. + +4. **Otherwise refresh** (live research, scoped to the gap past the shipped index's `covered_through`): + `WebSearch`/`WebFetch` for the framework + version's recent + performance release notes, changelogs, deprecations, and performance guides. Extract: superseded + patterns (old→new), new fast-path APIs (+ the version that introduced them), changed defaults, and + known perf regressions/fixes by version. Rewrite the cache file (with sources). The brief covers + the **in-use** version's characteristics *and* notes fast-paths a newer version would unlock (feeds + upgrade-opportunity findings). + +5. **Offline / no-network degrade.** Emit "currency brief unavailable (offline)". `idiom-currency` findings are + flagged **LOW confidence** and marked for manual currency check. **Never fabricate** version-specific + claims — absence of a brief is stated, not papered over. + +The consolidated audit report MUST record which brief (and its `researched_on` date) it used, so a +finding derived from a possibly-stale brief can be re-checked. + +## Registry commands per ecosystem + +| Ecosystem | Latest-version check (best-effort) | +|---|---| +| npm (Node/JS/TS) | `npm view <pkg> version` | +| PyPI (Python) | `pip index versions <pkg>` (or `pip install <pkg>==` and read the error) | +| NuGet (.NET) | `dotnet package search <pkg> --exact-match` or query `api.nuget.org` | +| Go modules | `go list -m -versions <module>` | +| crates.io (Rust) | `cargo search <crate>` or query `crates.io/api/v1/crates/<crate>` | +| Maven Central (JVM) | query `search.maven.org/solrsearch/select?q=...` | +| Swift | toolchain/language version drives most perf currency — check `swift --version` locally and swift.org releases; per-package versions are git tags (no central registry version command — `swift package` resolves from git; Swift Package Index for discovery) | + +These require network + the tool installed. All are best-effort per step 1. For Swift, the *language/toolchain* version (not a package registry) is the primary currency axis — the version index is keyed on Swift releases. + +## Cache file format + +Path: `docs/perf-audits/cache/<ecosystem>/<framework>@<major.minor>.md` + +```markdown +--- +schema_version: 1 +framework: <name> +ecosystem: <npm|pypi|nuget|go|crates|maven> +researched_against_version: <x.y.z in use at research time> +latest_known_at_research: <x.y.z latest available at research time> +researched_on: <YYYY-MM-DD> +fallback_ttl_days: 180 +sources: + - <url> + - <url> +--- + +## Superseded patterns (old → new) +- <pattern that regressed/deprecated> → <current recommended pattern> (changed in <version>) + +## New fast-path APIs (and the version that introduced them) +- <API/feature> — introduced <version> — <what it speeds up> + +## Changed defaults +- <setting> default changed in <version>: <old> → <new> — <perf implication> + +## Known perf regressions / fixes by version +- <version>: <regression or fix> — <impact> +``` + +`schema_version` lets the format evolve without misreading old caches; bump it if the structure changes. diff --git a/.claude/skills/performance-audit/feedback-template.md b/.claude/skills/performance-audit/feedback-template.md new file mode 100644 index 00000000..eefd28e4 --- /dev/null +++ b/.claude/skills/performance-audit/feedback-template.md @@ -0,0 +1,140 @@ +# Field-feedback template — `performance-audit` family + +**Purpose.** A hand-off-ready template + instructions to give an agent (or yourself) running +`performance-audit` / `performance-audit-cycle` against a real repo, so the experience produces +**high-signal, actionable feedback** the maintainers can fold back into the skill. The first real-world +run produced exactly this kind of doc; this template generalizes what made it useful. + +It is **loosely structured on purpose** — skip areas that didn't come up, expand the ones that did. The +goal is honest field notes, not a compliance form. + +--- + +## Instructions to the executing agent (read first) + +You are running the `performance-audit` (and/or `-cycle`) skill against a repository. **In addition** to +doing the audit, keep a **running feedback log** as you go and hand it back at the end. The most +valuable feedback comes from writing it *while* you hit friction, not reconstructing it afterward. + +What makes feedback high-quality (do these): + +1. **Tag every item** with the legend below so wins, friction, defects, and ideas are separable. +2. **Record workarounds you had to invent.** If the skill didn't tell you how to do something and you + improvised (a dispatch adaptation, a scoping heuristic, a missing mode), that improvisation is the + single highest-signal datapoint — it marks a real gap. Say what you did and why. +3. **Point at where a fix would go** when you can — the file/phase/section (e.g. "SKILL.md Phase 2", + "finding-model calibration", "the rust version index"). You don't need to be right; it helps triage. +4. **Distinguish a defect from a preference.** A 🐞 is "the skill told me to do X and X was wrong/ + impossible"; a 🟡 is "X was ambiguous or costly"; a 💡 is "X could be better." Don't inflate. +5. **Report both directions of error.** Note false positives (nits the lanes manufactured) **and** false + negatives / blind spots (real issues a lane missed, things the packs/indexes didn't ground). +6. **Be honest about what you couldn't verify** (no hardware, no load test, no network for currency, + harness exposed no reasoning-effort knob). "Couldn't confirm" is a finding, not a gap to paper over. +7. **Capture the environment** — it shapes what's possible (see the context header). A friction that's + really "my harness can't do X" should be labelled as such, not as a skill defect. + +Two methodology asks that make the *audit itself* a better test of the skill: + +- **Run the lanes blind** where you can — give them load/scope context, **not** the findings you already + suspect. Then report whether they *discovered* the hot paths or merely confirmed a prior. (Discovery + is the real signal; the skill is built for it.) +- **Stress the anti-padding discipline on purpose** at least once: point a run at low-value / cold / + glue code and report whether it honestly returned "no significant findings" / "confirmed cold" or + whether it manufactured nits to look productive. + +--- + +## Context header (fill this in once, top of your feedback doc) + +``` +Repo / project: <name + one-line what-it-is> +Scale: <approx production LOC; languages/ecosystems; mono- or single-package> +Stack highlights: <frameworks, runtimes, notable libraries> +Skill(s) + version: <performance-audit / -cycle; plugin_version or "vendored, version per source"> +Harness: <Claude Code web/CLI, Agent-tool dispatch, model + whether an effort knob exists> +Scope run: <bounded module / whole-repo via scoping method / a specific slice> +Depth: <full / reduced / cold-sweep / overlay; lanes run> +Blind run? <yes/no — were lanes given the answers or not?> +``` + +> Legend: 👍 worked well · 🟡 friction / ambiguity · 🐞 likely defect · 💡 suggestion. +> Within each area, newest note first is fine. One line of context per item minimum. + +--- + +## Areas to comment on (skip what didn't come up) + +**1. Setup, onboarding & dispatch harness.** Was skill discovery / invocation by name clean? If lanes +were dispatched as subagents, could they see the skill (read their own pack slice) — or did you adapt? +Could you set the dispatch model / reasoning effort, or did the harness expose no knob? `plugin_version` +findable? + +**2. Scope handling.** Was the bounded-scope guard helpful or in the way? For a whole-repo / oversized +goal, did `whole-repo-scoping.md` route you cleanly (size router → slices → tiers → review gate), or did +you have to invent partition logic? Were the LOC bands / depth tiers right for this ecosystem? Anything +the method didn't cover (a stack shape, a monorepo layout, a slicing call)? + +**3. Detection & pack loading (Phase 0).** Did stack/version detection pick the right packs + modules? +Did **materiality** keep irrelevant modules out (or load junk on an incidental import)? Did the right +sub-stack modules exist — and were any missing for this ecosystem? + +**4. Lane dispatch (Phase 2).** Which dispatch mode did you use (runner-pastes-slice vs lane-reads-own- +slice) and was it the right call at this lane count? Did every lane actually receive its lane-keyed +slice **+ the cross-cutting Runtime/Variant notes + the loaded modules**? Did the blind run discover, or +just confirm? + +**5. The lanes & profile packs (the heart of it).** Did the packs behave as a **reference, not a +checklist** — did any lane *out-reason the pack* and find something it didn't list (good), or did it +walk the pack and pad (bad)? Per lane, note misses (false negatives) and manufactured nits (false +positives). Did `idiom-currency` have a grounded version index / currency brief for this stack, or fall +back to model knowledge? Did the descriptive `cost-map` lane earn its keep (catch a framing error)? + +**6. Synthesis & finding model (Phase 3).** Did dedup + cross-lane agreement read as a confidence +signal? Did **calibration** hold — especially the anti-padding stress test, latent/dead code, +dev-only/external-process code, and bounded-`n`? Did a lane **correct the scope brief from source** (and +did the synthesis record it)? Did the **bug-no-chase** boundary hold (suspected bugs recorded, not +fixed) — including any **co-located** bug in a perf finding's function? Run metadata / regression diff / +`runs.jsonl` sane? + +**7. Cycle phases (if you ran `-cycle`).** Cross-validation, optional dynamic confirmation, present-to- +user loop, **fix-plan generation + plan-review** (did the review catch anything real?), and — for a +multi-slice run — the **whole-repo roll-up** (did it surface cross-slice themes a per-unit view +couldn't, and any `assume-hot` findings needing operator confirmation?). + +**8. Artifacts & ergonomics.** Did output paths exist / get created cleanly (`docs/perf-audits/`, +`runs.jsonl`, `cache/`)? Was the run **resumable** after a context reset (ledger/handoff sufficient)? +Commit cadence workable? Anything that errored on a first run? + +**9. Authoring (only if you extended the skill).** If you wrote new skill content (a method, a module, an +index entry), did you follow the reference discipline (descriptive self-contained titles, no opaque +`S#`/code cross-refs)? Note any convention that was easy to violate. + +**10. Top changes + verdict.** Your **top 3** concrete changes you'd make to the skill, ranked. One-line +**overall verdict**: did it find real, actionable, well-calibrated performance work on this repo? + +--- + +## Minimal quick version (for a small / lightweight run) + +If a full doc is overkill, hand back just this: + +``` +Context: <repo / scale / stack / harness / scope+depth / blind?> +👍 What worked (2–4 bullets): +🟡 Friction / what I had to improvise (2–4 bullets — workarounds are gold): +🐞 Defects (skill said X, X was wrong/impossible): +💡 Top 3 changes I'd make, ranked: +Verdict (1 line): did it find real, well-calibrated perf work? +``` + +--- + +## What "high-quality" looked like (one real example) + +The first field run (a ~96k-LOC Rust+TS app) was valuable because it: ran lanes **blind** and reported +that they *reproduced a 5-round review's hot-path map and added findings it missed*; **stress-tested +anti-padding** on the cold tail and reported the lanes honestly returned "no significant findings" +rather than nits; recorded every **workaround it invented** (a lane-reads-its-own-pack dispatch +adaptation; a whole-repo partition method) — each of which became a skill change; and flagged where +grounding was thin (the version index lacked the DSP/React library APIs it needed). Aim for that: +**blind discovery, honest non-findings, named workarounds, and concrete where-it-would-change pointers.** diff --git a/.claude/skills/performance-audit/finding-model.md b/.claude/skills/performance-audit/finding-model.md new file mode 100644 index 00000000..d7383417 --- /dev/null +++ b/.claude/skills/performance-audit/finding-model.md @@ -0,0 +1,176 @@ +# Performance Finding Model + +**Load this when:** generating, ranking, validating, or planning fixes for performance findings. +This file defines how a performance finding is scored, what is *not* a finding, and the +disposition discipline that governs the remediation plan. + +## Contents +- The four axes (Impact, Confidence, Effort, Verification plan) +- Prioritization rule +- Calibration — what is NOT a finding +- No severity-based deferral (disposition discipline) +- Rationalization table + red flags + +--- + +## The four axes + +Every finding carries all four. + +### Impact = reachability × frequency × per-occurrence cost + +Impact is **expected aggregate cost**, not locality or raw ugliness. + +- **Reachability** — is it on a request path / inner loop / render path / startup? Code that never + runs under realistic load has ~zero impact regardless of how slow it is in isolation. +- **Frequency** — how often it runs (structurally: loop nesting, call-site count, per-item + callbacks over collections that grow with load). +- **Per-occurrence cost** — big-O class, allocations, I/O, CPU per execution. + +A big-O improvement on a provably bounded, small `n` reached once at startup is **low** impact. +A small constant-factor win on the hot path of every request is **high** impact. + +Rank: **Critical** (dominant aggregate cost / scaling wall) · **Major** (clear measurable drag) · +**Minor** (real but small aggregate cost). Severity ranks *order of attention*, never *inclusion* +(see disposition discipline). + +### Confidence + +`Measured` (a profile/benchmark confirms it) > `Strong-static` (the code structure makes it certain) +> `Heuristic` (plausible but unverified). Framework-idiom-currency findings inherit the currency +brief's freshness; **offline ⇒ Low**. + +### Effort = work magnitude ONLY + +Describe the size of the change using exactly these buckets: + +- **Localized** — one function. +- **Contained** — one module + its callers. +- **Cross-cutting** — a signature/abstraction change rippling across packages. + +You MAY add "low-effort" / "high-effort". + +**BANNED vocabulary:** any wall-clock or calendar unit — hours, days, weeks, sprints, +story-points-as-time — and any time-flavored adjective ("quick", "a quick afternoon", "trivial +timewise") used as a basis for sizing or deferral. Time estimates anchor on human calendar-time +training data and are unreliable for an agent; a fabricated duration becomes a stale anchor that +misleads readers. State *what changes and how widely*, not *how long it takes*. + +### Verification plan + +How to prove the fix helps **and** preserves behavior: + +- The **benchmark/profile to run**, OR an explicit **complexity/allocation argument** when + measurement isn't feasible; AND +- A **correctness guard** — a test that pins the behavior the optimization must not change. + +--- + +## Referring to findings (persistent-artifact reference discipline) + +This is the project's standard **persistent-artifact reference discipline** applied to audit +findings — the canonical rule lives in the `claude-agents-md-init` skill's template under +"Cross-references in persistent artifacts" (opaque working-session shorthand like `Option C` / +`Decision F1` — and here `Lane 4` / `P3` — MUST NOT leak into anything that persists outside the +conversation). It distinguishes two cases that apply directly here: + +- **Lane names/numbers are *opaque session identifiers*** — they have no anchor anywhere outside this + skill, so a bare "Lane 4" is a missing legend. Replace it with the plain-English meaning: use the + lane slug at minimum (`concurrency`), and in prose describe the finding itself. +- **Finding IDs (`P1`, fingerprints) are *bare references to a real artifact*** — they do anchor (to + the consolidated report), so they MAY stay, but only as a traceability suffix beside a + self-identifying description, never on their own. + +The operational test (from that template): reading only the inline text, with no link- or +report-chasing, can the reader recognize what the reference points at and decide whether it matters? + +- **MUST NOT** use a bare lane name/number or finding ID as the *sole* referent in any persistent or + outward-facing artifact: commit messages, PR titles/bodies, code comments, remediation-plan task + titles, or questions to the user. +- **MUST** describe the finding in self-contained, human-meaningful terms (what, where, why) wherever + it is referenced outside the report. The ID may be appended as a *traceability suffix*, never used + as the whole reference. + +| Don't | Do | +|-------|-----| +| `fix: address Lane 4 finding` | `perf: run independent widget fetches concurrently in load_dashboard (was serial awaits) [perf finding P5]` | +| `// resolves P3` | `// one batched fetch — the per-item loop here was an N+1 (perf audit P3)` | +| "Should I fix the data-access lane issue?" | "Should I fix the N+1 in enrich_line_items (one DB round-trip per line item)?" | + +The report itself may use lane names and IDs as section structure, but every finding leads with a +descriptive title — so even the report reads correctly without prior context. Lane names (the slugs +above) are always preferable to lane numbers; never write "Lane 4" in prose a human will read. + +## Prioritization rule + +Order findings by **Impact × Confidence**. Use **Effort** to *sequence* within that band — surface +high-impact / high-confidence / low-effort items first ("quick wins"), and high-impact / +high-effort items as deliberate investments. Effort sequences; it never removes a finding. + +--- + +## Calibration — what is NOT a finding + +Do not manufacture these. Reporting them pads the audit and erodes trust: + +- Cold-path micro-optimizations with no argued or measured aggregate impact. +- Readability-destroying optimizations for an unmeasured gain. +- Style / idiom preferences with no performance consequence (that's `project-health-review`'s lane). +- Theoretical big-O improvements on a provably bounded, small `n`. +- Hypothetical scaling concerns far beyond plausible load (note as a design remark only if reachable). +- Correctness bugs — those belong to `bug-hunt-cycle`. Record them in the report's Suspected Bugs + appendix; do not chase them unless the incorrect behavior *is* the performance problem. + - **Co-located bug** (a suspected bug in the *same symbol* as a real perf finding — e.g. an + off-by-one in the function whose hot loop you're flagging): the boundary still holds — record it + in the Suspected Bugs appendix, do **not** fix it in the audit. You MAY note that the perf fix's + eventual task will touch the same code (so the bug should be resolved alongside it), but the audit + *records*, never *fixes*. Don't let proximity to a perf finding pull a correctness bug across the + "audit records bugs, never chases them" line. + +**Calibration governs generation, not post-hoc suppression.** It tells a lane agent what not to +*manufacture*. Once a finding has been surfaced, it MUST NOT be silently dropped as "too minor" — +that decision belongs to the user (see below). Never cite calibration during validation to discard +a real finding. + +--- + +## No severity-based deferral (disposition discipline) + +**Every finding's default disposition is FIX.** The remediation plan MUST schedule **all** findings +by default. Low / minor / moderate impact is **NOT** grounds for deferral — a batch of cheap fixes +is cheap to do, and "defer the minors" leaves them deferred to no one, forever. + +A finding may be dropped from the plan only when **one** of these holds: + +1. The **human reviewer explicitly opts it out**, or +2. The agent states a **substantive, non-severity, non-effort reason that names a specific concrete + mechanism**: + - the exact in-flight refactor it collides with, and where; or + - the exact dependency major-bump it requires, and why that is out of scope; or + - the specific correctness regression it risks, and why that risk outweighs the gain. + +A *vague* gesture — "might be risky", "could be complex", "better to wait", "low priority" — does +**NOT** qualify and is treated as a banned severity/effort deferral. The agent MAY *recommend* a +deferral that meets bar (2); it MUST NOT *self-authorize* deferral on severity or effort grounds. +Deferred items (with their named mechanism or the reviewer's opt-out) go in the plan's Deferred +appendix — the persistent record, never left in conversation memory. + +### Rationalization table + +| Excuse | Reality | +|--------|---------| +| "These are low-severity, I'll list them as future improvements" | Future for whom, when? Cheap fixes are cheap. Put them in the plan as tasks. | +| "Deferring minors keeps the plan focused" | The plan addresses all findings by default. Focus is the reviewer's call, not yours. | +| "A batch of small fixes isn't worth a task" | Group them into one task. Grouping ≠ dropping. | +| "Low impact = not worth fixing" | Impact ranks order, not inclusion. Only the reviewer or a substantive named mechanism removes a finding. | +| "Defer — this might be risky / could be complex" | Name the *specific* mechanism (which refactor, which dependency, which regression + why) or it's a disguised severity/effort deferral. | +| "I'll estimate this is a 2-hour fix so defer it" | Wall-clock is banned and effort is not a deferral ground. State work magnitude; schedule it. | + +### Red flags — STOP + +- "Defer the minors" / "low priority so later" / "nice-to-have, skip for now". +- Any deferral whose only basis is severity or effort. +- Any effort expressed in hours/days/sprints. +- Dropping a surfaced finding during validation by calling it "below the bar". + +All of these mean: schedule the finding, or produce a reviewer opt-out / a named substantive mechanism. diff --git a/.claude/skills/performance-audit/lane-prompts.md b/.claude/skills/performance-audit/lane-prompts.md new file mode 100644 index 00000000..fee9f5eb --- /dev/null +++ b/.claude/skills/performance-audit/lane-prompts.md @@ -0,0 +1,241 @@ +# Lane Prompts + +**Load this when:** dispatching Phase 2 of `performance-audit`. The runner pastes the **shared +preamble** + the relevant **lane body** into each lane agent, filling the `[...]` placeholders. +These prompts live here (not in `SKILL.md`) to keep the SKILL body within budget. + +## Contents +- Shared per-agent preamble (all lanes) +- Algorithmic complexity & data structures (lane `algorithmic`) +- Memory & allocation (lane `memory`) +- Data access & I/O (lane `data-access`) +- Concurrency & parallelization (lane `concurrency`) +- Framework-idiom currency (lane `idiom-currency`) +- Execution Cost Map (lane `cost-map`) — produces a MAP, not findings +- Payload / startup / build (lane `payload-startup`, conditional) +- Dynamic profiling & benchmarking (lane `dynamic`, optional) + +--- + +## Shared per-agent preamble (all lanes) + +``` +You are a performance auditor for ONE dimension. Find performance problems in +your dimension; do not praise, do not summarize, do not grade. + +Stack profile: [paste detected ecosystem/framework/version] +Profile-pack lens for your lane: [paste the relevant lane slice from the matched profile pack(s), PLUS the core pack's cross-cutting Runtime/Variant-notes section — dotnet `Variant notes`; go/python/js-ts/rust `Runtime …notes`; and a companion pack's equivalent (SQL `Reading the plan & schema`, HTML `Rendering path & Core Web Vitals`) — as shared ecosystem context that applies to every lane] +Currency brief (version-specific guidance): [paste brief, or "unavailable — offline"] +Scope: [paste files/area] +Output file: docs/perf-audits/<date>-<slug>-<lane>.md + +Read ACTUAL source code, not just CLAUDE.md / AGENTS.md. Cite file:line for +code-level findings; cite 2-3 representative examples for pattern-level findings. + +THE PROFILE-PACK LENS IS A REFERENCE, NOT A CHECKLIST. It names durable footguns +worth attention in this ecosystem so you recognize patterns faster — it is a +PRIOR, not a worklist, and a FLOOR, not a ceiling. Your own reading of the actual +code is primary. Do NOT walk it item by item; do NOT report an item merely +because the pack lists it; do NOT treat "this pack bullet's absence" as a finding; +and never limit your investigation to what the pack names. Finding something real +the lens didn't list is exactly the goal — the lens encodes what's known to be +worth knowing, not the boundary of what's worth finding. If you are a stronger +model than the lens was written for, out-reason it. + +CALIBRATION — what is NOT a finding (do NOT report these): +- Cold-path micro-optimizations with no argued or measured aggregate impact +- Readability-destroying optimizations for an unmeasured gain +- Style/idiom preferences with no performance consequence +- Theoretical big-O improvements on a provably bounded, small n +- Hypothetical scaling concerns far beyond plausible load (note as a design + remark, not a finding, only if reachable) +- Correctness bugs — DO NOT chase them. If you notice one, record it in the + "Suspected Bugs (for follow-up)" section of your report (file:line, what + looks wrong, why) and move on. Recording is mandatory; chasing is forbidden. + A bug counts as "the performance problem" (in-scope to pursue) ONLY when the + incorrect behavior IS the slowness — e.g., a cache key bug that makes every + lookup miss, or a condition that triggers a retry storm. "This bug is near + slow code" does NOT qualify; record and move on. + +FINDING MODEL (see finding-model.md): +- Impact = reachability × frequency × per-occurrence cost. Rank CRITICAL / + MAJOR / MINOR by expected aggregate cost, not locality. +- Confidence = Measured | Strong-static | Heuristic. +- Effort = work MAGNITUDE ONLY, one of: Localized (one function) / Contained + (one module + callers) / Cross-cutting (signature/abstraction change across + packages). You MAY add low-effort/high-effort. BANNED: any wall-clock or + calendar unit (hours, days, weeks, sprints, story-points-as-time) and any + time-flavored adjective. Time estimates anchor on human training data and + are unreliable for an agent. + +Finding format: +### [CRITICAL|MAJOR|MINOR impact] <title> +**Location:** <file:line or pattern> +**Problem:** <what's slow and why> +**Impact:** <reachability + frequency + per-occurrence cost: big-O class, +allocs/iter, queries/request, or measured ms> +**Confidence:** <Measured | Strong-static | Heuristic> +**Effort (work magnitude, NOT time):** <Localized | Contained | Cross-cutting> + why +**Verification plan:** <benchmark/profile to run OR complexity/allocation +argument> + <correctness guard: the test that pins unchanged behavior> + +NAMING: lead every finding with a self-contained descriptive title (what / where +/ why). Refer to lanes by name (e.g. the `data-access` lane), never "Lane 3". +Do not use a bare lane name or finding ID as the sole referent in any text that +leaves this audit (commit messages, PR text, code comments) — see +finding-model.md "Referring to findings". + +Write your full report to the output file AND return your findings in your +response for consolidation. End the report with a "Suspected Bugs +(for follow-up)" section (or "None"). +``` + +--- + +## Algorithmic complexity & data structures (lane `algorithmic`) + +``` +[shared preamble] + +Your dimension: algorithmic complexity and data-structure choice. Look for: +accidental quadratics (nested scans over inputs that grow with load), repeated +or recomputed work inside loops that could be hoisted or memoized, the wrong +container for the access pattern (linear scan where a hash/set fits), and +recomputation of pure results that could be cached. Estimate the input sizes +that reach this code under realistic load — a quadratic over a bounded handful +is not a finding; a quadratic over request-sized or dataset-sized input is. +``` + +## Memory & allocation (lane `memory`) + +``` +[shared preamble] + +Your dimension: memory and allocation. Look for: allocation on hot paths, +large intermediate collections built and immediately discarded, copies where a +view/slice/borrow would do, unbounded growth (caches without eviction, +accumulating buffers, retained references), and reading whole resources into +memory where streaming would bound peak usage. Use the profile-pack lens for +this ecosystem's specific allocation footguns. +``` + +## Data access & I/O (lane `data-access`) + +``` +[shared preamble] + +Your dimension: data access and I/O. Look for: N+1 access (one query/request +per item in a loop vs one batched call), missing pagination/batching, +over-fetching, synchronous/blocking I/O on hot or latency-sensitive paths, +chatty round-trips that could be coalesced, missing connection pooling, +serialization overhead, missing or misused caching, and query shapes implying a +missing index. Express impact as queries/requests per operation where you can. +``` + +## Concurrency & parallelization (lane `concurrency`) + +``` +[shared preamble] + +Your dimension: concurrency, run BOTH directions. +(a) EXPLOIT — find serial work over independent items, sequential awaits on +independent async operations that could run concurrently, and missing +pipelining/streaming. BEFORE suggesting parallelization you MUST verify the +work is actually independent (no shared mutable state, no ordering or data +dependency) and attach a correctness guard to the finding. A parallelization +suggestion that introduces a race is a regression, not a fix. +(b) DEFEND — find lock contention, critical sections larger than necessary, +blocking calls inside async contexts, false sharing, and pool exhaustion. +``` + +## Framework-idiom currency (lane `idiom-currency`) + +``` +[shared preamble] + +Your dimension: framework-idiom currency. Consult, in order: (1) the shipped +version index for this ecosystem (version-indexes/<ecosystem>.md, provided +above if it exists) — a build-once "API/feature → version → perf benefit" +lookup; then (2) the currency brief above (recency beyond the index). +Flag: patterns the index/brief mark superseded/deprecated that the code still +uses; fast-path APIs/types they list that the code does NOT use (e.g. the code +uses the slow path the index says was superseded as of version X); changed +defaults the code still fights. Cite the index entry or brief line per finding; +Confidence inherits its freshness. If neither is available, report candidate +idiom concerns at LOW confidence flagged for manual currency check, and do NOT +fabricate version-specific claims. +SUPPORT-TRACK RULE: when a fast-path requires upgrading the framework/runtime, +qualify the recommendation by the project's SUPPORT TRACK. Ecosystems with an +LTS cadence — .NET (even major = LTS, odd = STS), Java (LTS releases only), Node +(even major = LTS) — make "upgrade to the latest major" frequently invalid: a +project on an LTS line cannot adopt an STS-only feature without leaving support. +Recommend the best option available *on the project's LTS line*, or surface the +upgrade as a deliberate support-track tradeoff (not an unconditional "just +upgrade"). The index's "Support cadence" section states each ecosystem's tracks. +``` + +## Execution Cost Map (lane `cost-map`) — produces a MAP, not a findings list + +``` +[shared preamble — EXCEPT you are EXEMPT from "report only problems". This lane +is DESCRIPTIVE. Do NOT manufacture problems; some hot regions are inherent and +fine. You do NOT use the finding format; use the map format below.] + +Your job: produce a MAP of where this program most plausibly concentrates time, +for architectural awareness — usable by a human or agent to rethink design or +seed internal "known bottlenecks" docs. Reason about two multiplied dimensions: +- FREQUENCY: small/cheap functions on hot paths (request/render handlers, inner + loops, per-item callbacks, serializers, hashing/equality, logging) that add up. +- UNIT COST: heavy functions (large scans, parsing, crypto, layout, regex + compilation, big allocations) regardless of frequency. + +REASON FROM STRUCTURAL SIGNALS, NOT INVENTED NUMBERS. You cannot know runtime +call counts statically. Build the map from observable structure: loop nesting, +call-site count, recursion, fan-out, per-item callbacks over collections that +grow with load, membership on a request/render/startup path. Label each region +with its BASIS and a CONFIDENCE (High/Medium/Low). These are HYPOTHESES about +hot regions, not measured fact; where dynamic profiling ran, its measurements +supersede your guesses. + +Output (write to the output file and return it): +## Execution Cost Map +> Architectural awareness, NOT an optimization to-do list. Not every region +> here is a problem; some are inherent and fine. + +### Likely time-concentration regions +- **<region/component>** — basis: <structural reasoning> — confidence: + <High|Medium|Low> — <map-only | also flagged by the `<lane-id>` lane> + +### Notes for architecture +- <observations that might suggest a different approach, if any> +``` + +## Payload / startup / build (lane `payload-startup`, conditional) + +``` +[shared preamble] + +Your dimension: payload, startup, and build cost. (Run only when the stack has +such a surface — frontend, serverless, CLI, mobile.) Look for: shipping more +than needed to the consumer (large payloads, unused data, no compression), +expensive work at startup/cold-start that could be lazy or cached, eager +initialization of rarely-used components, bundle size, tree-shaking, and +code-splitting/lazy-loading opportunities. Use the profile-pack lens. +``` + +## Dynamic profiling & benchmarking (lane `dynamic`, optional) + +``` +[shared preamble] + +Your dimension: MEASURED performance. Activate ONLY when (a) the environment can +build and run the project AND (b) a real workload exists (an existing +benchmark/load test/representative entry point) or one can be DEFENSIBLY +constructed from real usage. You MUST NOT invent a workload or fabricate +numbers — a meaningless micro-benchmark is worse than none. If you cannot run +honestly, write "Dynamic lane not run: <reason>" and stop. + +When you can run: capture a profile with the stack's native tooling under the +real workload, report measured hotspots (Confidence = Measured), and explicitly +validate or refute the static lanes' findings where they overlap your measurements. +``` diff --git a/.claude/skills/performance-audit/profile-packs/dotnet.md b/.claude/skills/performance-audit/profile-packs/dotnet.md new file mode 100644 index 00000000..8e980d1a --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/dotnet.md @@ -0,0 +1,312 @@ +# Profile Pack: .NET + +Covers two distinct variants with different performance models: **Modern .NET** (detected by TFM +`net8.0`+ or `netcoreapp*` in `.csproj` / `<PackageReference>`-based restore) and **.NET Framework** +(detected by TFM `net4x` and/or `packages.config`-based restore). + +--- + +## Algorithmic complexity & data structures (lane `algorithmic`) +- LINQ chains that enumerate a sequence multiple times (e.g., calling `.Count()` then iterating); + materialise with `.ToList()`/`.ToArray()` once when re-use is needed. +- `.Contains` on `List<T>` inside loops — O(n) per call yields O(n²) overall; replace with + `HashSet<T>` or `FrozenSet<T>` for read-heavy lookup sets (verify against the currency brief for + your version). +- Repeated/recomputed LINQ projections or sort keys inside loops that could be hoisted. +- Nested loops over entity collections loaded from a database — accidental O(n²) better solved at + the query layer. +- `Dictionary<K,V>` used for a collection that is built once then queried many times: prefer + `FrozenDictionary<K,V>` / `FrozenSet<T>` for lower lookup overhead and better cache locality + (verify against the currency brief for your version). +- `PriorityQueue<TElement, TPriority>` for any "next cheapest item" pattern rather than + sorted lists with O(n log n) re-sort on every insert (verify against the currency brief for your + version). +- Culture-aware string comparison/search where ordinal would do: `==`/`.Equals`/`.IndexOf`/`.Contains` + /`.StartsWith` default to **culture-sensitive** collation (slower, allocates, and locale-dependent) + — pass `StringComparison.Ordinal`/`OrdinalIgnoreCase` for identifiers, keys, and lookups; use + `StringComparer.Ordinal[IgnoreCase]` for `Dictionary`/`HashSet`/sorts; and avoid + `ToUpper()/ToLower()` purely to compare (allocates a throwaway string per call — compare with + `OrdinalIgnoreCase` instead). + +## Memory & allocation (lane `memory`) +- LINQ on hot paths allocates iterators and delegates; prefer `for`/`foreach` with early exit, or + array-based tight loops for throughput-critical code. +- Boxing of value types (`struct` passed as `object`, stored in non-generic collection, used as + `IComparable`/`IEquatable` without constraints). +- Large Object Heap (LOH) pressure: arrays or strings over ~85 KB allocated and discarded + frequently; prefer `ArrayPool<T>.Shared.Rent`/`Return` to pool buffers and + `Microsoft.Extensions.ObjectPool.ObjectPool<T>` for heavier objects (verify against the currency + brief for your version). +- `string` concatenation in loops — use `StringBuilder`, `string.Join`, or interpolated string + handlers (modern .NET); raw interpolation still allocates on every call in tight loops. +- `Span<T>` / `Memory<T>` / `ReadOnlySpan<T>` / `stackalloc` opportunities to slice or work with + buffers without heap allocation or copying (modern .NET; verify against the currency brief for + your version). +- Collection expressions (`[x, y, z]` syntax) let the compiler choose stack- or inline-array- + backed storage rather than a heap allocation — prefer over explicit `new List<T> { … }` where the + declared type allows it (verify against the currency brief for your version). +- Inline arrays (`[InlineArray(N)]` structs) provide fixed-size stack storage exposed as + `Span<T>`; used internally by the runtime and useful in hot-path structs (verify against the + currency brief for your version). + +## Data access & I/O (lane `data-access`) +- EF Core N+1: navigating a collection property inside a loop instead of using `.Include()` + (eager loading) or a projection query; lazy loading makes this easy to trigger accidentally + (verify against the currency brief for your version). +- Per-row saves in loops — use `ExecuteUpdate`/`ExecuteDelete` for bulk server-side mutations + without loading entities into memory; prefer `SaveChanges` batching over per-entity calls + (verify against the currency brief for your version). +- Missing `AsNoTracking()` on read-only queries; the change-tracker allocates and retains entity + snapshots unnecessarily — use `AsNoTrackingWithIdentityResolution()` when de-duplication of + related entities is still needed (verify against the currency brief for your version). +- Over-fetching: full entity materialisation when only a few columns are needed; use projections + (`.Select()`) to pull only what is used. +- Cartesian explosion from multi-level `Include` — use `AsSplitQuery()` to issue separate SQL + statements and avoid row multiplication (verify against the currency brief for your version). +- Hot LINQ-to-EF queries executed repeatedly with identical shapes: pre-compile with + `EF.CompileQuery` / `EF.CompileAsyncQuery` to amortise the LINQ-to-SQL translation cost + (verify against the currency brief for your version). +- Synchronous database calls on async paths; missing connection-pool reuse. +- Offset-based pagination (`Skip(n).Take(m)`) on large tables scans n rows on the DB; prefer + keyset/cursor pagination for production data volumes. + +## Concurrency & parallelization (lane `concurrency`) +- **Sync-over-async:** calling `.Result` or `.Wait()` on a `Task` blocks a thread-pool thread and + causes deadlocks in contexts with a synchronisation context (classic ASP.NET / WinForms). +- Missing `ConfigureAwait(false)` in library code risks deadlock when consumed by a caller with a + synchronisation context (particularly .NET Framework; verify against the currency brief for your + version). +- Sequential `await` over independent async operations — use `Task.WhenAll` to run concurrently + (verify correctness: no shared mutable state, no ordering dependency). +- Thread-pool starvation: long-running synchronous work on pool threads, or too many concurrent + blocking calls; consider `Task.Run` with explicit sizing or dedicated threads. +- Lock contention from coarse-grained `lock` blocks; consider `SemaphoreSlim`, + `ReaderWriterLockSlim`, or lock-free structures for read-heavy paths (verify against the + currency brief for your version). +- `ValueTask` avoids allocations on the common synchronous-completion path; misuse (awaiting + twice, storing in collections, not checking `IsCompleted` before awaiting) is a correctness and + perf hazard (verify against the currency brief for your version). + +## Framework-idiom currency (lane `idiom-currency`) +- Consult the currency brief. Key candidates: source-generated `System.Text.Json` vs reflection- + based serialisation; EF Core query pipeline version and available bulk-op APIs; Regex source + generator vs `new Regex(…)`; `SearchValues<T>` for multi-char search; `HttpClient` lifecycle + (`IHttpClientFactory`); `Parallel.ForEachAsync` for async fan-out work (verify against the + currency brief for your version). +- Offline (no brief): note candidate idiom concerns at LOW confidence, flagged for manual currency + check. +- **.NET LTS/STS cadence — support-track constraint:** .NET even-numbered majors are LTS (3-year + support); odd-numbered are STS (18-month). The current LTS is .NET 10 (Nov 2025). When + recommending a feature that first shipped in an STS release, explicitly flag that adopting it + requires the project to accept STS support terms — enterprise and regulated environments are + typically pinned to the LTS track and cannot act on STS-only features. Always prefer the latest + feature available on the project's LTS line. See the **Support cadence** section of the version + index (`version-indexes/dotnet.md`) for the current LTS/STS table. + +## Payload / startup / build (lane `payload-startup`, conditional) +- Cold-start cost: static constructors, eager DI registration of expensive services, large assembly + loads at startup — consider lazy initialisation or background warm-up. +- AOT compilation and trimming can eliminate JIT overhead but require annotation discipline; + reflection-heavy code silently breaks under trimming — `JsonSerializerIsReflectionEnabledByDefault` + set to `false` forces early detection of missing source-gen coverage (modern .NET; verify against + the currency brief for your version). +- `ReadyToRun` (R2R) pre-compiles assemblies to reduce first-JIT latency; combined with tiered PGO + it enables re-optimisation based on runtime profiles (modern .NET; verify against the currency + brief for your version). +- Publishing self-contained vs framework-dependent affects payload size and update surface. +- Unused NuGet package references pulled into the output; dead code that trimming could remove. + +--- + +## Variant notes + +### Modern .NET (8+/Core) +- Prefer source-generated JSON serialisation (`[JsonSerializable]` on a `partial JsonSerializerContext` + subclass) over reflection-based `JsonSerializer` defaults — eliminates runtime reflection, + reduces startup overhead, and is required for Native AOT (verify against the currency brief for + your version). +- `Regex.GeneratedRegex` source generator compiles patterns at build time; prefer it over + `new Regex(…)` or static `Regex` fields with `RegexOptions.Compiled` on hot paths (verify + against the currency brief for your version). +- `SearchValues<T>` pre-computes search state for repeated `IndexOfAny`/`ContainsAny` operations + across `string` or `Span<char>`; look for inline char-set arguments in search calls that could + be promoted to a cached `SearchValues<char>` or `SearchValues<string>` (verify against the + currency brief for your version). +- `Vector<T>`, `Vector128<T>`, `Vector256<T>`, `Vector512<T>` and hardware intrinsics (via + `System.Runtime.Intrinsics`) enable explicit SIMD; the JIT also auto-vectorises loops over + `Span<T>` when conditions allow — avoid branching and non-unit strides that defeat vectorisation. +- `TensorPrimitives` provides SIMD-backed bulk numerical operations (add, multiply, dot-product, + etc.) over spans; prefer it over manual loops for numeric workloads (verify against the currency + brief for your version). +- `IHttpClientFactory`-managed `HttpClient` instances recycle handlers correctly; a single long- + lived manually-created `HttpClient` can exhaust sockets or hold stale DNS. +- Native AOT / ReadyToRun / tiered PGO / GC-mode options affect startup vs throughput trade-offs, + and their defaults shift between versions (see the version index) — check that project publish + settings are intentional (verify against the currency brief for your version). + +### .NET Framework (4.x) + +> High-value focus: large 4.8 codebases that grew from the 3.5/4/4.5 era. Many of these are +> *conditions to look for* in legacy code where an in-Framework upgrade (no platform migration) +> unlocks a real win. Cross-reference the **`.NET Framework (4.x timeline)`** area of the version +> index for "available since 4.Y" facts. + +#### Runtime & GC configuration +- **Workstation GC running on a multi-core server**: Workstation GC is the default for standalone + (non-hosted) apps — Server GC is **NOT** the default for non-ASP.NET processes. On a multi-core + server, enabling Server GC (`<runtime><gcServer enabled="true"/>`) gives a per-CPU heap + dedicated + collection threads and dramatically cuts pause time / raises throughput for allocation-heavy + services; pair with background/concurrent GC (`<gcConcurrent enabled="true"/>`, the default). + Caveat: don't enable Server GC on machines running many app instances — they contend (verify + against the currency brief for your version). +- **LOH fragmentation with no compaction**: apps that churn large transient buffers/arrays (>85 KB) + fragment the Large Object Heap, which is swept-not-compacted by default; set + `GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce` (4.5.1+) + before a full blocking `GC.Collect()` at a quiet point to reclaim fragmentation (verify against + the currency brief for your version). +- **Quirks / 4.0 compatibility mode on a 4.8 app**: a big overlooked one — an app upgraded to run on + 4.8 but still *targeting* an older framework (no `<httpRuntime targetFramework="4.8"/>` in + web.config for ASP.NET, or an old `TargetFrameworkAttribute`/build target) runs in older-version + *quirks* compatibility mode and silently misses runtime/perf improvements. Confirm the app both + runs on **and targets** 4.8 (verify against the currency brief for your version). +- **Legacy 64-bit JIT instead of RyuJIT**: RyuJIT is the default 64-bit JIT since **4.6** (x64); + check no `<useLegacyJit enabled="1"/>` (or `COMPLUS_useLegacyJit=1` env/registry) is forcing the + slower legacy x64 JIT. Also `<gcAllowVeryLargeObjects enabled="true"/>` (4.5+) is required for + arrays >2 GB on 64-bit (verify against the currency brief for your version). + +#### Memory & allocation +- **Non-generic collections that box value types**: `ArrayList` / `Hashtable` / `Queue` / `Stack` + (non-generic) box every value-type element and lose type safety — migrate to `List<T>` / + `Dictionary<K,V>` / `Queue<T>` to eliminate boxing allocations and per-access casts. +- **`Span<T>` / `Memory<T>` via the `System.Memory` NuGet backport** (4.5+): slice arrays/strings + without copying. This is the portable "slow span" — real and useful, but **without the runtime + fast-path intrinsics** of Core, and ref-struct language features need **C# 7.2+**. Pair with + `System.Buffers` (`ArrayPool<T>.Shared`, 4.5.1+ NuGet) to pool temporary buffers and + `System.Threading.Tasks.Extensions` (`ValueTask`, NuGet) on hot async paths (mark all three as + NuGet backports; verify against the currency brief for your version). +- **`DataSet` / `DataTable` for large reads**: heavy per-cell `object` boxing and bookkeeping + overhead vs streaming a `DataReader` or projecting straight to POCOs; prefer the reader/POCO path + for large result sets and one-way reads. +- **LOH churn from large `MemoryStream`s and unsized `StringBuilder`**: repeatedly allocating large + `MemoryStream` buffers thrashes the LOH — use `Microsoft.IO.RecyclableMemoryStream` (NuGet) to + pool them; preallocate `StringBuilder` capacity when the final size is known; review + `string.Intern` misuse (interned strings are never collected). + +#### Networking & I/O +- **`ServicePointManager.DefaultConnectionLimit` left at 2**: defaults to **2 connections per host** + in non-web apps (10 for ASP.NET-hosted) — a classic outbound-HTTP throughput killer; raise it + early at AppDomain load for services that fan out to a downstream host (verify against the + currency brief for your version). +- **Nagle + Expect100Continue latency on small requests**: `ServicePointManager.UseNagleAlgorithm` + and `Expect100Continue` are **on by default** and add latency to small/chatty requests — disable + both for low-latency outbound calls. +- **`HttpClient` lifecycle**: a `new HttpClient()` per request exhausts sockets (TIME_WAIT); reuse a + single static/long-lived instance — **but** a long-lived `HttpClient` caches DNS, so set + `ServicePoint.ConnectionLeaseTimeout` (via `ServicePointManager.FindServicePoint`) to force + periodic connection recycling and pick up DNS changes (no `IHttpClientFactory` on Framework; + verify against the currency brief for your version). + +#### Async & threading +- **Pre-TAP async patterns**: code still using APM (`Begin*`/`End*`), `ThreadPool.QueueUserWorkItem`, + or raw `new Thread(...)` where `async`/`await` + TAP (**4.5+**) fits — migrate I/O-bound work to + async to free pool threads. +- **Sync-over-async deadlocks**: `.Result` / `.Wait()` / `.GetAwaiter().GetResult()` on a `Task` + blocks a pool thread and deadlocks under the ASP.NET / WinForms `SynchronizationContext`; add + `ConfigureAwait(false)` throughout library code (critical on Framework — the captured context is + the deadlock source). +- **Coarse locks & legacy lock types**: `ReaderWriterLock` (legacy) is slower and more error-prone + than `ReaderWriterLockSlim`; prefer `ReaderWriterLockSlim` / `SemaphoreSlim` for read-heavy paths. +- **ASP.NET thread-pool tuning for burst load**: under bursty load, default `minWorkerThreads` / + `minIoThreads` (`<processModel>` / `ThreadPool.SetMinThreads`) cause 500 ms thread-injection + stalls; tune them and `maxConcurrentRequestsPerCPU` (`aspnet.config`) for spiky workloads (verify + against the currency brief for your version). + +#### Data access (ADO.NET / EF6 / LINQ-to-SQL) +- **Buffering whole `DataSet`s instead of streaming**: prefer `DataReader` for forward-only reads; + add `CommandBehavior.SequentialAccess` for large BLOB/CLOB columns to stream them without buffering + the whole row. +- **Row-by-row inserts**: replace per-row `INSERT` loops with **`SqlBulkCopy`** for bulk load — orders + of magnitude faster for large batches. +- **EF6 / LINQ-to-SQL N+1 & tracking overhead**: lazy-loading a navigation property inside a loop + fires a SQL query per access — use eager `.Include()`; add `AsNoTracking()` (EF6) / + `MergeOption.NoTracking` (LINQ-to-SQL / ObjectContext) for read-only queries to skip change-tracker + snapshots; pre-compile hot query shapes with `CompiledQuery.Compile` (LINQ-to-SQL) — EF6 has an + automatic compiled-query cache but explicit compilation still helps complex queries. EF6 has **no** + `ExecuteUpdate`/`ExecuteDelete`; for bulk mutations use raw SQL (`Database.ExecuteSqlCommand`) or a + stored proc (verify against the currency brief for your version). +- **Connection-pool defeating patterns**: inconsistent connection strings spawn separate pools; + not disposing connections leaks them out of the pool — always `using`/`Dispose` `SqlConnection` + and keep connection strings byte-identical. + +#### Classic ASP.NET (WebForms / MVC5 / Web API 2) +- **`<compilation debug="true">` left on in production**: the classic, huge one — disables JIT + optimisations, disables request timeouts, bloats output, and prevents batched compilation; set + `debug="false"` and add `<deployment retail="true"/>` in machine.config on production servers to + force it regardless of per-app web.config. +- **ViewState bloat (WebForms)**: large serialized ViewState on every postback inflates payload — + disable ViewState on controls that don't need it (`EnableViewState="false"`) or use + `ViewStateMode`. +- **Missing output caching & bundling**: no `OutputCache` directive / `[OutputCache]` on cacheable + pages/actions re-executes expensive handlers; missing ASP.NET bundling+minification ships + unminified, unbundled JS/CSS. +- **Synchronous pages/controllers & `Response.Redirect` overuse**: blocking pages/actions where + async pages (`Page.RegisterAsyncTask`) / `async` MVC/Web API actions fit; `Server.Transfer` avoids + the extra client round-trip that `Response.Redirect` incurs for same-server transfers. + +#### CPU, reflection & serialization +- **`new Regex(...)` per call**: compile-once into a `static readonly Regex` (or use + `RegexOptions.Compiled` for hot, repeatedly-reused patterns — **not** for one-shot matches, where + compilation cost dominates) instead of constructing a `Regex` on every invocation. +- **Uncached reflection in mappers/serializers**: `Type.GetProperties()` / `MethodInfo.Invoke()` per + call in hand-rolled mappers is expensive — cache `MemberInfo`/`PropertyInfo` and prefer compiled + delegates (`Delegate.CreateDelegate` / expression trees) for hot property access. +- **Exceptions for control flow**: throwing/catching as normal flow is expensive on Framework (stack + walks); use `TryParse`/`TryGetValue`/return codes instead. Also `Enum.ToString()` and + `Enum.IsDefined` use reflection — cache results or avoid on hot paths. +- **`XmlSerializer` caching gotcha**: only `XmlSerializer(Type)` and `XmlSerializer(Type, String)` + cache the dynamically generated serialization assembly. Constructors taking `XmlAttributeOverrides` + / extra `Type[]` / `XmlRootAttribute` generate a **new temp assembly per instance that is never + unloaded** — a memory leak + perf cliff if constructed per call; cache these serializer instances + yourself (e.g., in a dictionary). +- **`BinaryFormatter` & per-call serializer settings**: avoid `BinaryFormatter` (slow and a known + RCE security risk — deprecated/removed in modern .NET); cache `JsonSerializerSettings` / + `DataContractSerializer` instances rather than allocating per call. Newtonsoft.Json is the typical + default serialiser; review payload-widening settings (`TypeNameHandling`, + `PreserveReferencesHandling`) (verify against the currency brief for your version). + +--- + +## Framework / sub-stack modules (load on detection) + +Load the core lanes + **Variant notes** above for *every* .NET project. Additionally load the matching +module file when its technology is detected in the audit scope, and include it as ecosystem context in +the relevant lane prompts. (These tech-specific lenses were split out of this pack so a run pastes only +what's relevant — see the version index `../version-indexes/dotnet.md` for version-specific facts.) + +| Detected (signals) | Load module | +|---|---| +| **ASP.NET Core (hosting & pipeline)** — `Microsoft.AspNetCore.*`, Web-SDK `.csproj`, `Program.cs`/`Startup.cs`, controllers/minimal APIs | [`dotnet/aspnet-core.md`](dotnet/aspnet-core.md) | +| **Blazor** — `*.razor`, `Microsoft.AspNetCore.Components.*` | [`dotnet/blazor.md`](dotnet/blazor.md) | +| **WCF (services)** — `System.ServiceModel`, `*.svc`, `[ServiceContract]`, `ChannelFactory` | [`dotnet/wcf.md`](dotnet/wcf.md) | +| **Data access — SQL Server (EF6 / EF Core / ADO.NET / Dapper)** — EF6/EF Core, `System.Data.SqlClient`/`Microsoft.Data.SqlClient`, Dapper, `*.edmx`, `DbContext` | [`dotnet/sql-server-data.md`](dotnet/sql-server-data.md) | +| **WinForms** — `System.Windows.Forms`, `*.Designer.cs`, `OutputType=WinExe` + `net*-windows` | [`dotnet/winforms.md`](dotnet/winforms.md) | +| **WPF** — `*.xaml`, `PresentationFramework`/`System.Windows`, `<UseWPF>` | [`dotnet/wpf.md`](dotnet/wpf.md) | +| **Caching** — `IMemoryCache`/`MemoryCache`/`HttpRuntime.Cache`, `StackExchange.Redis`/`IDistributedCache`, `HybridCache` | [`dotnet/caching.md`](dotnet/caching.md) | +| **Dependency injection (containers)** — `Microsoft.Extensions.DependencyInjection`, Autofac/Unity/Ninject/SimpleInjector/Castle Windsor | [`dotnet/dependency-injection.md`](dotnet/dependency-injection.md) | +| **Native / COM interop (incl. Office automation)** — `[DllImport]`/`[LibraryImport]`, `Microsoft.Office.Interop.*`, `[ComImport]`, `Marshal.`, `ComWrappers` | [`dotnet/interop.md`](dotnet/interop.md) | +| **Object mapping** — `AutoMapper`, `Riok.Mapperly`, `IMapper`, `.Map<`, `.ProjectTo<` | [`dotnet/object-mapping.md`](dotnet/object-mapping.md) | +| **Messaging & realtime** — `Microsoft.AspNetCore.SignalR`/`Microsoft.AspNet.SignalR`, `System.Messaging` (MSMQ), `Azure.Messaging.ServiceBus`, `RabbitMQ.Client` | [`dotnet/messaging-realtime.md`](dotnet/messaging-realtime.md) | + +## Sources + +Durable signals in this pack are grounded in these authoritative sources; **version-specific** facts +and their per-entry citations live in `../version-indexes/dotnet.md` (which carries a full `sources:` +frontmatter list). + +- **Runtime / BCL** — MS Learn .NET docs; devblogs.microsoft.com "Performance Improvements in .NET 6–10" (Stephen Toub); "What's new in .NET 8/9/10/11". +- **EF / data** — EF Core "Performance" docs (efficient querying/updating, tracking); EF6 "Performance Considerations for EF 4/5/6"; ADO.NET (connection pooling, `SqlBulkCopy`, `SqlDataReader`); SQL Server query-processing-architecture guide; Dapper README. +- **ASP.NET Core / Blazor** — release notes 6–10, performance best practices, output caching, Kestrel HTTP/3, Blazor virtualization & render-modes. +- **.NET Framework** — Workstation-vs-Server GC, `gcAllowVeryLargeObjects`, `LargeObjectHeapCompactionMode`, `<useLegacyJit>`, `ServicePointManager.DefaultConnectionLimit`, application-compatibility/quirks, `XmlSerializer` remarks, Framework TLS. +- **WCF** — `ServiceThrottlingBehavior`, "Large Data and Streaming", "Channel Factory and Caching". +- **WinForms / WPF** — "Optimizing WPF Application Performance" series; WinForms `DataGridView` performance & virtual mode. +- **Caching / DI / interop** — "Caching in .NET"; StackExchange.Redis (Basics, Pipelines & Multiplexers); ".NET dependency injection guidelines"; COM interop / Runtime Callable Wrapper / P/Invoke type-marshalling; "Considerations for unattended/server-side Automation of Office". diff --git a/.claude/skills/performance-audit/profile-packs/dotnet/aspnet-core.md b/.claude/skills/performance-audit/profile-packs/dotnet/aspnet-core.md new file mode 100644 index 00000000..18224f9c --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/dotnet/aspnet-core.md @@ -0,0 +1,39 @@ +# .NET performance module: ASP.NET Core (hosting & pipeline) +> Load when `Microsoft.AspNetCore.*` (etc.) is detected — see the module map in `../dotnet.md`. Core lanes + Variant notes live in `../dotnet.md`; this file is the ASP.NET Core (hosting & pipeline) lens only. + +## ASP.NET Core (hosting & pipeline) +- **Hosting model for IIS**: prefer in-process hosting (the default since ASP.NET Core 3.0) over + out-of-process (ANCM reverse-proxy); out-of-process forwards each request over a localhost + loopback adapter, adding a measurable round-trip per request — check `<AspNetCoreHostingModel>` + in the project file or `web.config` `hostingModel` attribute. +- **Middleware ordering**: order cheap, short-circuiting middleware (static files, authentication + short-circuits, health checks) before expensive middleware; placing heavy middleware (logging, + response buffering, authorisation) before short-circuit middleware means they run even on + requests that will be rejected or served from cache — review `app.Use*` ordering in `Program.cs`. +- **Per-request allocations in custom middleware**: middleware that allocates objects (DTOs, + buffers, service resolution via `GetService`) on every request contributes to GC pressure; + use constructor-injected singletons, `ArrayPool<T>`, or `ObjectPool<T>` for reusable state + and avoid per-invocation `new` in the `InvokeAsync` hot path. +- **Missing response compression + output caching**: cacheable endpoints returning JSON, HTML, + or plain text without `AddResponseCompression`/`UseResponseCompression` or `AddOutputCache`/ + `UseOutputCache` miss significant payload savings; output caching also prevents redundant + re-execution of expensive handlers — both should be intentional defaults on public-facing + APIs (verify against the currency brief for your version). +- **Buffering large collections instead of streaming**: returning `IEnumerable<T>` from a + controller/minimal-API handler causes the serialiser to enumerate the full set before + flushing; prefer `IAsyncEnumerable<T>` to stream JSON rows as they arrive from the database, + reducing peak memory and time-to-first-byte for large result sets (verify against the + currency brief for your version). +- **Synchronous I/O in the pipeline**: reading `HttpRequest.Body` or writing `HttpResponse.Body` + synchronously blocks a Kestrel I/O thread; Kestrel does not support synchronous reads by + default (`AllowSynchronousIO` defaults to `false`); synchronous action filters and result + filters similarly stall the pipeline — verify all pipeline code uses async overloads. +- **Static files via the app instead of CDN / with missing cache headers**: `UseStaticFiles` + serves files without an upstream CDN layer and without aggressive `Cache-Control` headers + by default; long-lived assets (versioned JS/CSS) should carry `Cache-Control: max-age` and + ideally be offloaded to a CDN to reduce origin load and round-trip latency. +- **Minimal APIs vs MVC controllers on hot endpoints**: minimal APIs have lower per-request + overhead (no model-binding pipeline, no action-filter chain, no view-engine plumbing) for + simple request/response patterns; consider minimal APIs for throughput-sensitive endpoints + and reserve MVC for endpoints that genuinely use filters, model validation, or view + rendering (verify against the currency brief for your version). diff --git a/.claude/skills/performance-audit/profile-packs/dotnet/blazor.md b/.claude/skills/performance-audit/profile-packs/dotnet/blazor.md new file mode 100644 index 00000000..932326d6 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/dotnet/blazor.md @@ -0,0 +1,47 @@ +# .NET performance module: Blazor +> Load when `*.razor` (etc.) is detected — see the module map in `../dotnet.md`. Core lanes + Variant notes live in `../dotnet.md`; this file is the Blazor lens only. + +## Blazor +- **Render-model choice is the dominant performance decision**: Blazor Server adds a + network round-trip and server-side memory (one SignalR circuit per active user) for every + interaction; Blazor WebAssembly shifts CPU to the client and incurs a large initial payload; + the unified `.NET 8+` Auto render mode uses Server for first load and migrates to WASM once + downloaded — pick the model intentionally based on latency, payload, and scale requirements + (verify against the currency brief for your version). +- **Unnecessary component re-renders**: every parent re-render recursively re-renders children + unless suppressed; override `ShouldRender()` to return `false` when parameters are unchanged + complex types; use primitive or immutable parameters where possible so Blazor's built-in + change-detection skips re-rendering automatically; set `@key` on list items so the differ + matches components to data by identity rather than position (verify against the currency + brief for your version). +- **Large lists without `<Virtualize>`**: rendering thousands of items in a `foreach` loop + materialises every row into the DOM; wrap large lists in `<Virtualize Items="…">` to render + only the visible viewport rows, reducing both render time and DOM node count (verify against + the currency brief for your version). +- **Heavy work in lifecycle methods**: `OnInitialized`/`OnParametersSet` run synchronously + before the first render; expensive synchronous work here blocks the render thread on Server + or the WASM main thread; use `OnInitializedAsync`/`OnParametersSetAsync` with `await` and + cache results that are stable across re-renders to avoid re-executing on every parameter + update. +- **Chatty JS interop**: calling `IJSRuntime.InvokeAsync` inside a render loop, from + `OnAfterRenderAsync`, or once per component instance in a large list adds latency (especially + on Blazor Server, where each call crosses the SignalR wire); batch JS calls where possible, + avoid per-render invocations, and prefer `IJSInProcessRuntime` on Blazor WebAssembly for + synchronous, zero-round-trip JS calls (verify against the currency brief for your version). +- **`StateHasChanged` called too broadly**: calling `StateHasChanged()` unconditionally or + from high-frequency events (scroll, mouse-move, timer) re-renders the entire component + subtree; call it only when state has actually changed, throttle high-frequency sources, and + use `IHandleEvent` or `EventUtil.AsNonRenderingEventHandler` to suppress automatic + re-renders for event handlers that do not change visible state. +- **WebAssembly payload and startup**: large WASM initial download (runtime + assemblies) + directly affects Time-to-Interactive; enable AOT compilation for CPU-intensive apps + (improves runtime speed at the cost of larger download), enable IL trimming, and use + lazy-loaded assemblies (`@attribute [DynamicDependency]` + lazy routing) to defer loading + feature assemblies until their routes are first visited (verify against the currency brief + for your version). +- **Missing prerendering / streaming rendering**: Blazor Server and Blazor Web App (`.NET 8+`) + can prerender components to static HTML for fast first-paint before the circuit connects; + streaming rendering (`[StreamRendering]`) allows long async operations to return a + placeholder immediately and push the final content when ready — omitting both leaves users + watching a blank screen during circuit negotiation or slow data fetches (verify against the + currency brief for your version). diff --git a/.claude/skills/performance-audit/profile-packs/dotnet/caching.md b/.claude/skills/performance-audit/profile-packs/dotnet/caching.md new file mode 100644 index 00000000..3c061793 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/dotnet/caching.md @@ -0,0 +1,52 @@ +# .NET performance module: Caching +> Load when `IMemoryCache`/`MemoryCache`/`HttpRuntime.Cache` (etc.) is detected — see the module map in `../dotnet.md`. Core lanes + Variant notes live in `../dotnet.md`; this file is the Caching lens only. + +## Caching + +> Cross-cutting on **both** runtimes. In-process caching APIs differ by runtime: `IMemoryCache` +> (`Microsoft.Extensions.Caching.Memory`) on modern .NET (and available on Framework via the +> NuGet package), `System.Runtime.Caching.MemoryCache` as the portable Framework option, and +> classic ASP.NET `HttpRuntime.Cache` / `System.Web.Caching.Cache` on Framework web apps. Bullets +> are *conditions to look for*. Note up front: **cache invalidation correctness** (stale/wrong +> values served, missed evictions on writes) is a **bug-hunt concern, not a perf finding** — flag +> the boundary, don't score it as a perf win. + +- **No cache on an expensive idempotent read repeated under load**: an expensive, infrequently- + changing computation or remote/DB fetch recomputed on every request is the canonical caching + opportunity — wrap it in an in-process cache (`IMemoryCache.GetOrCreate`/`GetOrCreateAsync`, + `System.Runtime.Caching.MemoryCache`, or `HttpRuntime.Cache` on Framework) keyed by its inputs. +- **Cache stampede / thundering herd**: on a cold or just-evicted key, many concurrent requests + all miss and recompute the same expensive value simultaneously, amplifying load at the worst + moment. `IMemoryCache.GetOrCreate` does **not** coordinate concurrent factory calls by default — + a per-key lock/`SemaphoreSlim` (single-flight) or `HybridCache` (built-in stampede protection, + **.NET 9+**) is needed so only one caller computes while the rest await the result (verify + against the currency brief for your version). +- **Eviction & expiration not configured**: distinguish **absolute** (`AbsoluteExpiration` / + `AbsoluteExpirationRelativeToNow` — entry dies at a fixed time) from **sliding** + (`SlidingExpiration` — resets on each access, so a hot key can live forever); a sliding-only + policy on a popular key never refreshes and can serve stale data indefinitely. +- **`IMemoryCache` with no size limit grows unbounded**: by default `IMemoryCache` has **no size + limit** and only evicts on expiration or memory pressure — set `SizeLimit` on + `MemoryCacheOptions` and a per-entry `Size` (`SetSize`) so it bounds itself, or a cache of + large/variable entries can drive the process toward OOM. Watch for entries cached with no + expiration *and* no size accounting. +- **Distributed cache connection opened per call**: with `IDistributedCache` over + **StackExchange.Redis**, the `ConnectionMultiplexer` is **expensive to create and fully + thread-safe** — create **one** shared/long-lived instance (singleton) and reuse it; opening a + multiplexer per operation (or per request) is a classic throughput killer. The multiplexer + already pipelines and multiplexes concurrent callers over a single connection, so connection + *pools* are unnecessary (verify against the currency brief for your version). +- **Serialization cost & large/hot keys on distributed entries**: `IDistributedCache` stores + `byte[]`, so every read/write pays serialize/deserialize plus network I/O — large payloads, + chatty per-field caching, and a single hot key funneling all traffic to one Redis node are the + cost centers. Cache coarse, right-sized values; mind the serializer choice (`System.Text.Json` + source-gen vs reflection — cross-reference the idiom-currency lane). +- **N sequential Redis round-trips instead of a batch/pipeline**: a loop issuing one + `StringGet`/`StringSet` per key pays a network round-trip each time. Fire the calls concurrently + (`StringGetAsync` × N then await), use `CreateBatch`, or `MGET`/`MSET`-style multi-key commands + so the multiplexer pipelines them into far fewer round-trips; reserve `CommandFlags.FireAndForget` + for non-critical writes (verify against the currency brief for your version). +- **Missing output/response caching on cacheable endpoints**: re-executing an expensive handler for + identical requests that could be served from a cached response — see the **ASP.NET Core (hosting + & pipeline)** subsection (`AddOutputCache`/`UseOutputCache`, response caching) and the **Classic + ASP.NET** `OutputCache` bullet. diff --git a/.claude/skills/performance-audit/profile-packs/dotnet/dependency-injection.md b/.claude/skills/performance-audit/profile-packs/dotnet/dependency-injection.md new file mode 100644 index 00000000..9da39e65 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/dotnet/dependency-injection.md @@ -0,0 +1,41 @@ +# .NET performance module: Dependency injection (containers) +> Load when `Microsoft.Extensions.DependencyInjection` (etc.) is detected — see the module map in `../dotnet.md`. Core lanes + Variant notes live in `../dotnet.md`; this file is the Dependency injection (containers) lens only. + +## Dependency injection (containers) + +> Cross-cutting on **both** runtimes (MS `Microsoft.Extensions.DependencyInjection` on modern .NET; +> Autofac / Unity / Ninject / StructureMap / SimpleInjector / Castle Windsor on Framework). Bullets +> are *conditions to look for*. Lifetime terms below use MS DI names (Singleton / Scoped / +> Transient); other containers have equivalents. + +- **Slow container on deep graphs resolved per request**: resolving a deep object graph on every + request has real cost, and **container choice matters** — reflection/expression-heavy containers + (Ninject, older Unity) are markedly slower than fast ones (MS DI, SimpleInjector, DryIoc, Lamar). + Flag a hot path resolving a large graph through a known-slow container (verify against the + currency brief / benchmark for the specific container and version). +- **Lifetime misconfiguration — Transient/Scoped where Singleton fits**: registering an expensive- + to-build, stateless, thread-safe object (mapper/`MapperConfiguration`, serializer settings, + compiled regex, a configured `HttpClient`/typed client) as Transient or Scoped rebuilds it on + **every resolve** instead of once. Promote genuinely shareable, expensive objects to Singleton. +- **Captive dependency**: a longer-lived service capturing a shorter-lived one — e.g. a **Singleton + injecting a Scoped/Transient** — pins the short-lived instance for the captor's whole lifetime + (a leak *and* a correctness bug: per-request state shared across requests, e.g. a captured + `DbContext`). Enable scope validation (`validateScopes: true` / dev default) to catch "Cannot + consume scoped service from singleton". +- **Transient `IDisposable` tracked by the container**: MS DI **tracks** transient and scoped + services that implement `IDisposable` and only disposes them when their scope (or the root + container) is disposed. Transient disposables resolved from the **root/long-lived container** are + never released until shutdown — an accumulating leak. Don't register `IDisposable` as transient + resolved at root; use a factory / explicit scope (`IServiceScopeFactory.CreateScope`) instead. +- **Service-locator / resolving inside loops on the hot path**: calling `GetService`/ + `GetRequiredService` (or injecting `IServiceProvider`/a factory and resolving at runtime) inside + a request loop pays repeated lookup/allocation cost and hides the dependency — prefer + constructor injection so the graph is built once per scope. +- **Container build/warm-up not amortized at startup**: first-resolve compilation (expression-tree + /reflection registration) adds cold-start latency — build the provider once at startup and warm + expensive singletons during initialization rather than on the first user request (see the + payload-startup lane). +- **Property / reflection-based activation vs constructor injection**: property injection and + convention/reflection-based registration are slower to resolve and harder to validate than + constructor injection; the built-in MS DI container doesn't support property injection (a reason + to reach for a third-party container) — prefer constructor injection where the container allows. diff --git a/.claude/skills/performance-audit/profile-packs/dotnet/interop.md b/.claude/skills/performance-audit/profile-packs/dotnet/interop.md new file mode 100644 index 00000000..d512bb92 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/dotnet/interop.md @@ -0,0 +1,49 @@ +# .NET performance module: Native / COM interop (incl. Office automation) +> Load when `[DllImport]`/`[LibraryImport]` (etc.) is detected — see the module map in `../dotnet.md`. Core lanes + Variant notes live in `../dotnet.md`; this file is the Native / COM interop (incl. Office automation) lens only. + +## Native / COM interop (incl. Office automation) + +> Generalizes to **any** native/COM interop — Office automation is the most common offender but the +> same costs apply to any P/Invoke or COM library. Windows-centric; COM interop is .NET Framework +> plus .NET Core 3.0+/.NET 5+ on Windows (the `ComWrappers` API arrived in .NET 6, a COM source +> generator in .NET 8). Bullets are *conditions to look for*. + +- **Chatty cross-boundary calls in a loop**: every managed↔native or managed↔COM transition has + fixed overhead (marshaling, RCW dispatch, security checks). A loop making one P/Invoke or COM + call per item multiplies that overhead — **batch** into a single coarse call that moves all the + data at once (verify against the currency brief for your version). +- **COM apartment marshaling (STA/MTA)**: calls that cross apartment boundaries are **proxied and + serialized** through the COM marshaler rather than direct vtable calls — a hidden per-call cost. + An STA object touched from MTA/thread-pool threads (or vice versa) pays this on every call; keep + COM objects on a compatible apartment and avoid cross-apartment chatter. +- **COM RCWs not released deterministically**: the runtime holds one RCW per COM object and only + releases the underlying COM reference when the RCW is garbage-collected — relying on the GC + orphans server processes (the classic leftover `EXCEL.EXE` / `WINWORD.EXE`). Release RCWs + deterministically with `Marshal.ReleaseComObject` (decrements the ref count) or + `Marshal.FinalReleaseComObject` (zeros it), releasing **every** intermediate object you touch + (no two-dot expressions like `book.Worksheets[1]` that create an unreleased RCW). +- **Office automation — per-cell access**: reading/writing an Excel `Range` cell-by-cell makes one + cross-process COM call per cell. Read/write the **whole `Range` in one call via an `object[,]` + array** (`Range.Value` / `Range.Value2`) — orders of magnitude fewer round-trips. +- **Server-side Office automation is unsupported by Microsoft**: automating Office apps (Excel, + Word, Outlook) from a service/ASP.NET/unattended process is explicitly **unsupported** — Office + assumes an interactive desktop (modal dialogs hang the process), is not reentrant or scalable for + concurrent server use, has session/identity and stability issues, and can run untrusted macros. + Use a document library instead: the **Open XML SDK** (`DocumentFormat.OpenXml`) for `.xlsx`/ + `.docx`/`.pptx`, or a third-party reporting/spreadsheet library — no Office install, faster, and + supported. +- **Late-bound `dynamic`/IDispatch COM vs early-bound interop**: late binding through `IDispatch` + (C# `dynamic` over COM, or `Type.InvokeMember`) resolves members by name at runtime and is much + slower than early-bound calls through a generated interop assembly / typed interface — prefer + early-bound interop (a referenced Primary Interop Assembly or typed wrapper) on hot paths. +- **P/Invoke marshaling cost**: prefer **blittable** types (integers, pointers, blittable structs + with `LayoutKind.Sequential`) which need no conversion; **non-blittable** parameters (`string`, + `bool`, non-blittable structs, arrays) allocate and **copy** on every call. Avoid tiny P/Invoke + calls in tight loops; note `SetLastError = true` adds per-call overhead (capturing the OS error); + on modern .NET prefer the `[LibraryImport]` source generator over `[DllImport]` for AOT-friendly, + lower-overhead marshaling (verify against the currency brief for your version). +- **Native handles not wrapped in `SafeHandle`**: raw `IntPtr` handles from native APIs leak on + exceptions and race with finalization/`P/Invoke`; wrap them in a `SafeHandle`-derived type (or + `CriticalHandle`) for reliable, deterministic release of OS resources. + +--- diff --git a/.claude/skills/performance-audit/profile-packs/dotnet/messaging-realtime.md b/.claude/skills/performance-audit/profile-packs/dotnet/messaging-realtime.md new file mode 100644 index 00000000..033292d4 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/dotnet/messaging-realtime.md @@ -0,0 +1,53 @@ +# .NET performance module: Messaging & realtime (SignalR / MSMQ / queues) +> Load when `Microsoft.AspNetCore.SignalR`/`Microsoft.AspNet.SignalR`, `System.Messaging` (MSMQ), `Azure.Messaging.ServiceBus`, `RabbitMQ.Client` is detected — see the module map in `../dotnet.md`. Core lanes + Variant notes live in `../dotnet.md`; this file is the Messaging & realtime (SignalR / MSMQ / queues) lens only. + +## Messaging & realtime (SignalR / MSMQ / queues) + +> Spans **both** runtimes: ASP.NET Core SignalR (`Microsoft.AspNetCore.SignalR`) and legacy ASP.NET +> SignalR (`Microsoft.AspNet.SignalR`) for realtime hubs; **MSMQ** (`System.Messaging`) on Framework; +> and message brokers — **Azure Service Bus** (`Azure.Messaging.ServiceBus`) and **RabbitMQ** +> (`RabbitMQ.Client`). Bullets are *conditions to look for*. The recurring themes are connection +> reuse, batching to cut round-trips, payload sizing, and async over blocking I/O. + +- **SignalR scaleout needs a backplane**: SignalR tracks connection state **per server process**, so + in a server farm a hub on one node is unaware of connections on the others — `Clients.All` / + group broadcasts from one node never reach clients on the others. This is a **correctness** problem + first (messages silently lost) and a single-node bottleneck second. A multi-server deployment needs + a backplane — the **Redis backplane** or the **Azure SignalR Service** (which also offloads the + persistent connections off your servers); sticky sessions / session affinity are still required + except with Azure SignalR Service (verify against the currency brief for your version). +- **Chatty hub calls / many small frequent messages**: each invoke is framed and dispatched; very + frequent tiny messages waste framing and dispatch overhead. Batch updates where the UX allows, and + prefer the **MessagePack hub protocol** (`Microsoft.AspNetCore.SignalR.Protocols.MessagePack`, + added via `AddMessagePackProtocol`) over the default JSON protocol — it is a compact binary format + producing smaller, faster-to-(de)serialize payloads (verify against the currency brief for your + version). +- **SignalR fan-out cost**: broadcasting to very large groups or `Clients.All` multiplies one logical + send into N transmissions; large per-connection state multiplies memory across every persistent + connection. Scope broadcasts to the smallest necessary group, and keep per-connection state lean. +- **SignalR streaming vs buffering large results**: returning one big buffered payload blocks and + spikes memory; prefer hub streaming (`IAsyncEnumerable<T>` / `ChannelReader<T>`) to push results + incrementally and bound memory on big result sets (verify against the currency brief for your + version). +- **MSMQ per-message transactions**: wrapping every `Send`/`Receive` in its own + `MessageQueueTransaction` is expensive — batch many messages into **one** transaction to amortize + the commit cost. Also weigh **recoverable** (disk-persisted, durable) vs **express** (in-memory) + delivery — express trades durability for throughput — and note that large message bodies serialize + slowly (the default `XmlMessageFormatter` is reflection-heavy; a leaner formatter or pre-serialized + `byte[]` body is faster). +- **Broker connection / client reused, not opened per message**: for Azure Service Bus, a + `ServiceBusClient` (and its `ServiceBusSender`/`ServiceBusReceiver`/`ServiceBusProcessor`) is + **expensive to establish** and fully thread-safe — register it as a **singleton** / long-lived and + reuse it; do **not** create or dispose one per message. The same holds for RabbitMQ's `IConnection` + (share one long-lived connection, use per-thread `IModel`/channels). Opening a connection per + message is a classic throughput killer (verify against the currency brief for your version). +- **Round-trips not cut with prefetch / batching**: receiving one message per round-trip leaves + throughput on the table — set a sensible **prefetch** (`ServiceBusReceiver.PrefetchCount`, or + RabbitMQ `BasicQos`) so the client pulls a batch into a local cache, and use **batch send/receive** + (`SendMessagesAsync` with a batch, `ReceiveMessagesAsync`) to amortize network cost. Right-size + message bodies; **sessions / ordering guarantees add per-message overhead** — only enable them when + ordering is actually required (verify against the currency brief for your version). +- **Blocking synchronous send/receive on request paths**: synchronous broker/queue calls on a + request thread block a thread-pool thread and invite starvation — use the async APIs + (`SendMessageAsync`/`ReceiveMessageAsync`, processor callbacks) and don't sync-over-async with + `.Result`/`.Wait()` (cross-reference the core **Concurrency & parallelization** lane). diff --git a/.claude/skills/performance-audit/profile-packs/dotnet/object-mapping.md b/.claude/skills/performance-audit/profile-packs/dotnet/object-mapping.md new file mode 100644 index 00000000..02afa402 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/dotnet/object-mapping.md @@ -0,0 +1,44 @@ +# .NET performance module: Object mapping (AutoMapper / Mapperly) +> Load when `AutoMapper`, `Riok.Mapperly`, `IMapper`, `.Map<`, `.ProjectTo<` is detected — see the module map in `../dotnet.md`. Core lanes + Variant notes live in `../dotnet.md`; this file is the Object mapping (AutoMapper / Mapperly) lens only. + +## Object mapping (AutoMapper / Mapperly) + +> Cross-cutting on **both** runtimes. Two distinct models: reflection-based **AutoMapper** +> (`IMapper.Map<>` / `ProjectTo<>`, configured via `MapperConfiguration` / `Profile`s) and +> source-generated **Mapperly** (`[Mapper]` partial classes, compile-time, zero reflection). +> Bullets are *conditions to look for* — the recurring theme is reflection/config cost paid in hot +> loops or over large collections, and missed query-side projection. + +- **Configure once, reuse the mapper**: building a `MapperConfiguration` or `new Mapper(...)` per + call re-scans `Profile`s by reflection and rebuilds the type maps — a real per-call cost. Build + the config **once** and register `IMapper` as a **singleton**, then reuse it (it is thread-safe); + resolving or constructing it per request defeats AutoMapper's internal plan caching + (cross-reference the **Dependency injection (containers)** module — this is the canonical + "expensive, stateless, thread-safe object registered as Transient/Scoped" case). +- **`ProjectTo<TDto>()` over `IQueryable` instead of `Map<>` after materializing**: `ProjectTo` + emits the projection into the SQL `SELECT` so the database returns **only the mapped columns** and + EF never materializes or change-tracks full entities. The anti-pattern is `.ToList()` (or + `.ToListAsync()`) **then** `.Map<List<TDto>>(...)`, which pulls whole entities into memory first + and maps in-process — far more I/O, allocation, and tracking overhead (cross-reference the + **Data access — SQL Server** module: over-fetching and missing `AsNoTracking()`). +- **Reflection cost of complex / nested / conditional maps**: custom `ITypeConverter`, + `MapFrom`/`ConvertUsing` resolvers, `AfterMap`/`BeforeMap` hooks, and deep member-by-member + mapping run per element — in a hot loop or over a large collection this dominates. Measure before + assuming the map is cheap; the cost scales with map complexity × element count. +- **Mapping very large collections element-by-element**: even a well-configured map allocates and + invokes per item. On the hottest paths a hand-written projection (`Select(x => new TDto { ... })`) + — pushed into the query via `ProjectTo`/`Select` where the source is `IQueryable` — is often + measurably faster; reserve the generic mapper for cooler paths where developer ergonomics win. +- **Source-generated mapping (Mapperly) for hot paths / AOT**: `Riok.Mapperly` generates the + mapping code at **compile time** with **zero runtime reflection** and **no runtime configuration**, + making it trimming-safe and Native-AOT-friendly and typically several times faster (and + lower-allocation) than reflection-based AutoMapper on the same map. Prefer it for hot paths and + AOT/trimmed apps; the generated code is plain readable C# and accepts hand-written partial methods + for custom cases (verify against the currency brief for your version). +- **Over-mapping**: mapping fields the consumer never reads wastes work on every call — map only the + members the DTO actually exposes, and right-size the DTO to the screen/endpoint that consumes it. +- **Deep graph mapping triggering lazy loads**: mapping a navigation property that isn't eagerly + loaded fires a lazy-load query per access during the map — a classic accidental N+1 hidden inside + the mapper (cross-reference the **Data access — SQL Server** module N+1 bullet). `ProjectTo` + sidesteps this by projecting the whole graph in one query; in-memory `Map<>` over partially-loaded + entities does not. diff --git a/.claude/skills/performance-audit/profile-packs/dotnet/sql-server-data.md b/.claude/skills/performance-audit/profile-packs/dotnet/sql-server-data.md new file mode 100644 index 00000000..1f2d82e3 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/dotnet/sql-server-data.md @@ -0,0 +1,215 @@ +# .NET performance module: Data access — SQL Server (EF6 / EF Core / ADO.NET / Dapper) +> Load when EF6/EF Core (etc.) is detected — see the module map in `../dotnet.md`. Core lanes + Variant notes live in `../dotnet.md`; this file is the Data access — SQL Server (EF6 / EF Core / ADO.NET / Dapper) lens only. + +## Data access — SQL Server (EF6 / EF Core / ADO.NET / Dapper) + +> High-value focus for database-driven enterprise apps. Bullets are *conditions to look +> for* in application/query code (not DBA tasks). Be precise about API attribution: +> `AsNoTracking` exists in **both** EF6 and EF Core, but `AsSplitQuery` / +> `AsNoTrackingWithIdentityResolution` / `ExecuteUpdate` / `ExecuteDelete` / +> `AddDbContextPool` / compiled models / automatic `SaveChanges` batching are **EF Core +> only**. EF6 has none of those — its bulk story is third-party (EFCore.BulkExtensions is +> EF Core; for EF6 use **EFUtilities**, **EntityFramework.Extended**, or drop to +> `SqlBulkCopy` / TVPs / stored procs). Cross-reference the ORM and data-access entries in +> the version index. + +### N+1 & loading strategy +- **Lazy-loading a navigation property inside a loop** fires one SQL query per iteration + (the classic N+1). Both EF6 and EF Core: replace with eager `.Include()` or a projection. + EF6 lazy loading is on by default for `virtual` navigations + a proxy-enabled context; + EF Core requires the `Microsoft.EntityFrameworkCore.Proxies` package + `UseLazyLoading` + (or a lazy-loading service injection) — but explicit `.Load()` in a loop reproduces N+1 + in either (verify against the currency brief for your version). +- **Explicit loading (`.Entry(e).Collection/Reference(...).Load()`) inside a loop** is N+1 + by another name; batch the parent keys and load related data with a single query or + projection instead. +- **Cartesian explosion from multiple collection `.Include()`s**: each one-to-many `Include` + multiplies rows (each blog row duplicated per post, etc.), inflating the result set and + network/materialisation cost. EF Core: `AsSplitQuery()` issues one SQL statement per + collection instead of a join (**EF Core 5.0+**; note it round-trips per query and + buffers all-but-last result set unless MARS is on). EF6 has **no** split-query API — + break the load into multiple explicit queries (verify against the currency brief for + your version). +- **Materialising full entities when only a few columns are used**: project to a DTO with + `.Select(...)` so EF emits a narrow `SELECT` and skips entity tracking/fixup. A projection + that pulls only the needed columns also lets a covering index satisfy the query. +- **Loading a whole graph to read one related value**: prefer projecting the single value + (`.Select(b => b.Posts.Count)` etc.) over `Include`-ing the whole collection. + +### Change tracking & SaveChanges +- **Tracking on read-only queries**: the change tracker snapshots every materialised entity + (memory + CPU). Add `AsNoTracking()` (both EF6 and EF Core) for queries whose results are + never modified+saved. EF Core only: `AsNoTrackingWithIdentityResolution()` (**EF Core + 5.0+**) when you need no-tracking speed but still want related entities de-duplicated in + the result graph (verify against the currency brief for your version). +- **`DetectChanges` is O(n) over all tracked entities** and is triggered implicitly by + `Add`/`Remove`/`Find`/`Entry`/`SaveChanges`. In a large insert/update loop this becomes + O(n²). Set `Configuration.AutoDetectChangesEnabled = false` (EF6) / + `ChangeTracker.AutoDetectChangesEnabled = false` (EF Core) around the loop and re-enable + after — or use `AddRange`/`RemoveRange`, which pay the `DetectChanges` cost once for the + whole set instead of per entity. +- **A long-lived / accumulating `DbContext`**: the more entities tracked, the slower every + `DetectChanges` and the larger the memory footprint. Use a short, per-unit-of-work / + per-request `DbContext` lifetime; do not cache a context across requests. (EF Core: + `AddDbContextPool` reuses *cleared* instances to skip per-request model init — **EF Core + 2.0+** — but does not change the per-context tracking-accumulation rule; ensure no + request-scoped state leaks between pooled instances.) +- **EF6 `SaveChanges` issues one server round-trip per affected row** — no statement + batching. For large writes this is a major latency sink; use `SqlBulkCopy`, table-valued + parameters, a stored proc, or a third-party EF6 bulk library (EFUtilities / + EntityFramework.Extended) instead of a per-row `Add` + `SaveChanges` loop. +- **EF Core batches `SaveChanges` automatically** into multi-statement round-trips (default + cap ~42 statements/batch for SQL Server; batching is skipped when <4 statements as it + isn't a win there). Tune with `MinBatchSize`/`MaxBatchSize` on the SQL Server options + only with measurement. Still, even batched, EF Core sends one `UPDATE`/`DELETE` per + entity — see the bulk-mutation bullet below (verify against the currency brief for your + version). +- **Load-mutate-`SaveChanges` for bulk mutations (EF Core)**: replace with `ExecuteUpdate` / + `ExecuteDelete` / `ExecuteUpdateAsync` / `ExecuteDeleteAsync` (**EF Core 7.0+**) — a + single server-side `UPDATE`/`DELETE` over a predicate, no entity loading, no tracking. + **EF6 has no equivalent** — use `Database.ExecuteSqlCommand` with parameterised raw + SQL/stored proc (verify against the currency brief for your version). + +### Query translation & plan reuse +- **Client-side evaluation of a predicate EF can't translate**: in **EF Core** an + untranslatable `Where`/`OrderBy` in the server-evaluable part of a query **throws by + default** (since EF Core 3.0) — but a predicate moved after `AsEnumerable()`/`ToList()` + silently filters in memory after pulling all rows. **EF6 silently degrades**: it pulls + rows and filters client-side without warning. Flag any LINQ predicate using a method EF + can't translate (custom C# methods, non-mapped properties) feeding a large table. +- **Ad-hoc / string-concatenated SQL pollutes the SQL Server plan cache**: SQL Server + matches cached plans **character-for-character**, so each distinct literal string forces + a fresh compile and a new (low-value, evictable) ad-hoc plan entry, bloating the cache + and starving reusable plans. EF, Dapper, and `sp_executesql` parameterise automatically; + **raw `SqlCommand` built by string concatenation must use `SqlParameter`s** (also closes + SQL injection). Flag `"... WHERE x = '" + value + "'"`-style command text. +- **Varying IN-clause / parameter-list length generates distinct cached plans**: + `.Where(x => ids.Contains(x.Id))` produces a different parameter count per call, so each + list size is a separately-compiled plan (cache churn). EF6 is especially affected (it + also can't cache `Contains` over an in-memory collection at all — the values are treated + as volatile and the query recompiles every call, slower with larger lists). EF Core 8/9 + used `OPENJSON`; **EF Core 10** parameterises the IN-list with EF-side padding to bound + plan proliferation. Prefer a **TVP** or a temp-table join for large/variable sets (verify + against the currency brief for your version). +- **`Skip`/`Take`/`Contains`/`DefaultIfEmpty` inline their arguments as constants (EF6)** — + not parameters — so otherwise-identical paged queries pollute both the EF and SQL Server + plan caches per distinct value. A known EF6 plan-cache pitfall; prefer parameterised + shapes where possible (verify against the currency brief for your version). +- **Dynamically-built LINQ with a constant Expression node** recompiles every call and + pollutes the DB plan cache; build the dynamic expression with a **parameter** node so the + tree shape (and SQL) is stable. (EF Core query-cache hit rate staying below ~100% after + warm-up is the diagnostic signal.) +- **Hot, identically-shaped queries**: pre-compile to skip the cache lookup. EF Core: + `EF.CompileQuery` / `EF.CompileAsyncQuery` (**EF Core 2.0+**, scalar params only, single + model). LINQ-to-SQL: `CompiledQuery.Compile`. **EF6 auto-caches** LINQ-to-Entities plans + ("autocompiled queries", since EF5) so explicit `CompiledQuery` gives little extra and is + **ObjectContext-only** (not `DbContext`) — rarely worth it on EF6 (verify against the + currency brief for your version). + +### SQL Server sargability & implicit conversions (app-side, high-ROI) +- **The classic EF6 `nvarchar`-vs-`varchar` implicit conversion**: EF6 maps `string` to + **`nvarchar`** by default, so a `Where(x => x.Code == s)` against a `varchar`-typed, + indexed column sends an `nvarchar` parameter → SQL Server applies an **implicit + conversion that defeats the index seek and forces a scan**. Fix by mapping the property + non-Unicode: `[Column(TypeName = "varchar")]` / Fluent `.IsUnicode(false)` (EF6 and EF + Core both honour this). One of the highest-ROI, easily-missed findings on legacy EF6 + schemas with `varchar` keys (verify against the currency brief for your version). +- **Non-sargable predicates built in LINQ that wrap the column in a function**: e.g. + `Where(x => x.Date.Year == 2025)` → `WHERE YEAR(col) = …`, `Where(x => x.Name.ToUpper() + == v)` → `WHERE UPPER(col) = …`, or any computed expression on the column. The function + on the column side prevents an index seek (full scan instead). Rewrite as a range + (`x.Date >= start && x.Date < end`) or rely on a case-insensitive collation rather than + `ToUpper`/`ToLower`. +- **Leading-wildcard `LIKE '%term'`** (from `Contains`/`EndsWith`) cannot use a B-tree + index seek — full scan. Flag on large tables; consider full-text search or a redesigned + predicate. (`StartsWith` → `LIKE 'term%'` *is* sargable.) +- **Parameter type/length mismatch generally**: a parameter whose CLR/SQL type or length + differs from the column (e.g. wider `nvarchar(4000)` parameter vs `varchar(50)` column, + `int` vs `bigint`) can trigger an implicit conversion and a scan. Verify EF mappings and + hand-written `SqlParameter` types/sizes match the column definition. + +### Round-trips, sets & paging +- **Row-by-row (RBAR) operations** — a loop issuing one `INSERT`/`UPDATE`/`DELETE` per row — + vs a single set-based statement. Flag per-row DML loops; prefer set-based SQL, + `ExecuteUpdate`/`ExecuteDelete` (EF Core 7+), or `SqlBulkCopy`/TVP for writes. +- **Table-Valued Parameters (TVPs)** pass an entire set to the server in **one round-trip** + (as a `SqlDbType.Structured` parameter / EF Core raw SQL) — prefer over many individual + calls or huge/variable IN-lists. TVPs also give the optimiser real cardinality and a + stable plan shape. +- **Missing pagination pulling whole tables**: any unbounded query that could grow + unboundedly should page. Offset paging (`Skip(n).Take(m)` → `OFFSET … FETCH`) re-scans + `n` rows per page and degrades deep into the set; prefer **keyset/cursor pagination** + (`WHERE key > @last ORDER BY key`) for production volumes. +- **`SELECT *` / over-fetching** materialises columns you don't use and **defeats covering + indexes** (the engine can't satisfy the query from a narrow index and must look up the + base rows). Project only needed columns. +- **`MultipleActiveResultSets=True` (MARS)** lets multiple readers share one connection + (and EF Core relies on it to avoid buffering all-but-last result set in split queries), + but it adds overhead and has interleaving/transaction gotchas — enable intentionally, not + reflexively. +- **Multiple separate round-trips that could be one batch**: Dapper `QueryMultiple` (and + raw `SqlDataReader.NextResult()`) return several result sets from a single command — + batch related reads instead of N separate `Query` calls. + +### ADO.NET & connections +- **Buffering a whole `DataSet`/`DataTable` for a large read** vs streaming a forward-only + `SqlDataReader` (the reader is unbuffered — data isn't cached in memory). For large + BLOB/CLOB columns add `CommandBehavior.SequentialAccess` so wide columns stream via + `GetBytes`/`GetChars` rather than buffering the whole row. +- **Row-by-row inserts** → use **`SqlBulkCopy`** for bulk load (orders of magnitude faster + for large batches; works on Framework and modern .NET). +- **Connection-pool fragmentation / defeat**: a pool is keyed by the **exact connection + string** — strings that differ even slightly (different `Application Name`, integrated- + security identity, or per-database `master`-then-`USE` patterns) spawn **separate pools** + and waste connections. Keep connection strings byte-identical. Default `Max Pool Size` is + 100 and a connection request blocks up to ~15 s when the pool is exhausted, then throws — + a leaked (un-disposed) connection silently shrinks usable pool capacity. +- **Not disposing connections/commands/readers**: a `SqlConnection` not closed via + `using`/`Dispose` is not returned to the pool; under load this exhausts the pool and + causes timeout exceptions. Always `using` connections, commands, and readers. +- **Holding connections open longer than needed / opening early**: open the connection as + late as possible and close (return to pool) as early as possible; don't open a connection + then do CPU work or call other services while holding it. +- **Synchronous DB calls on async request paths**: use `OpenAsync`/`ExecuteReaderAsync`/ + `ExecuteNonQueryAsync` to free the thread during I/O. EF6 async exists **since EF6.0** (on + .NET 4.5+); flag sync EF6 calls on async paths. +- **Missing `CommandTimeout`**: relying on the default (30 s) for a heavy report query + causes spurious failures; for a query that should be fast, a too-long timeout masks a + runaway plan — set intentionally. + +### Transactions & isolation +- **Long-running transactions hold locks and block other sessions**: keep transactions + short; never wrap user think-time, external HTTP calls, or large client-side processing + inside an open transaction. Flag a `TransactionScope`/`BeginTransaction` that spans + network I/O or a long loop. +- **Default `READ COMMITTED` lock-based blocking** under write contention: read queries + block behind writers' locks. **Read Committed Snapshot Isolation (RCSI)** serves readers + from row-versions (no shared-lock blocking) — a database-level setting, but worth + flagging from app code that shows reader/writer blocking; do not silently rely on it being + on. +- **`TransactionScope` silently escalating to MSDTC (distributed transaction)**: when more + than one connection (or another resource manager) enlists in the same ambient + `TransactionScope`, it promotes to a **distributed transaction via MSDTC** — a large, + easily-overlooked latency and locking cost, and a frequent prod failure when MSDTC isn't + configured. Flag a `TransactionScope` that opens two `SqlConnection`s (even to the same + server on older clients). (Modern SqlClient supports local→distributed promotion only when + truly needed; keep it to a single connection to stay local.) +- **`NOLOCK` / `READ UNCOMMITTED` used "for performance"**: gives dirty reads, missing/ + duplicated rows, and read-skew — a **correctness hazard, not a perf technique**. Flag its + presence (table hints in raw SQL, `IsolationLevel.ReadUncommitted` scopes); do not + recommend it. The right fix for reader/writer blocking is RCSI, not `NOLOCK`. + +### Dapper +- **Buffered by default**: `Query<T>` materialises the entire result set into a `List<T>` + before returning. For very large streams pass `buffered: false` to stream rows lazily + (lower peak memory; keeps the reader/connection open while enumerating). +- **Parameterise — never concatenate**: pass parameters via anonymous objects / + `DynamicParameters` so commands are parameterised (plan reuse + injection-safe). Flag + interpolated/concatenated SQL passed to Dapper. +- **IN-list expansion**: Dapper expands `IEnumerable<int>` parameters into + `(@p1,@p2,…)` — convenient, but a different collection size yields a different SQL string + and thus a distinct cached plan (same plan-churn caveat as EF). Prefer a TVP for + large/highly-variable sets. +- **`QueryMultiple` for batching**: read several result sets from one command instead of + several separate round-trips; combine with multi-mapping (`splitOn`) to hydrate related + objects in a single query. diff --git a/.claude/skills/performance-audit/profile-packs/dotnet/wcf.md b/.claude/skills/performance-audit/profile-packs/dotnet/wcf.md new file mode 100644 index 00000000..3555fb86 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/dotnet/wcf.md @@ -0,0 +1,102 @@ +# .NET performance module: WCF (services) +> Load when `System.ServiceModel` (etc.) is detected — see the module map in `../dotnet.md`. Core lanes + Variant notes live in `../dotnet.md`; this file is the WCF (services) lens only. + +## WCF (services) + +> .NET Framework-only. Many enterprise 4.x apps still expose or consume WCF endpoints and the perf +> issues below are routinely missed in audits. (The modern successor is **CoreWCF** on .NET 6+, a +> separate package with the same programming model — note it as the migration target, but the +> conditions here apply to in-Framework WCF.) Cross-reference the **`.NET Framework (4.x timeline)`** +> area of the version index for the throttling-defaults and async-contract "available since" facts. + +### Client: channel / proxy lifecycle +- **`ChannelFactory<T>` (or `ClientBase<T>` proxy) created per call**: constructing a channel + factory parses endpoint config and builds the whole channel stack — expensive to repeat. Cache and + reuse one `ChannelFactory<T>` per (contract, endpoint, binding, credentials) at AppDomain scope and + create lightweight channels from it. Generated `ClientBase<T>` proxies cache the factory + automatically **only** if you avoid the `Binding`-taking constructors and don't touch the public + `ChannelFactory`/`Endpoint`/`ClientCredentials` properties before first use; otherwise caching is + silently disabled. `ClientBase<T>.CacheSetting` (`AlwaysOn`/`Default`/`AlwaysOff`) controls this and + is immutable once the first proxy of that type is created (verify against the currency brief for + your version). +- **Re-doing security negotiation per call**: with message security / federation the initial + handshake is costly; reusing the same proxy/channel amortises it. Look for new-proxy-per-request + patterns on secured endpoints especially. +- **Abort-vs-close on faulted channels**: calling `Close()`/`Dispose()` on a channel in the + **Faulted** state throws `CommunicationObjectFaultedException` (and `using(proxy)` hides this — the + implicit `Dispose` can throw and mask the real exception). Look for a try/`Close`/catch→`Abort` + pattern; raw `using` over a WCF proxy is a smell. A faulted channel must be re-created, not reused. +- **Reusing a channel across threads when not safe / leaking sessions**: datagram (sessionless) + channels are generally callable concurrently, but sessionful channels and any per-channel state are + not freely thread-safe — look for shared mutable proxies under concurrency, and for channels never + closed (leaks a session/instance on the server until idle timeout). + +### Server: throttling, instancing & concurrency +- **`ServiceThrottlingBehavior` on old/low defaults**: pre-4.0 defaults were very low — + `MaxConcurrentCalls=16`, `MaxConcurrentSessions=10`, `MaxConcurrentInstances=26` (flat, not + per-CPU) — and silently cap throughput under load (excess requests queue, then time out). 4.0 + raised them and made them per-processor (≈`16*CPU` calls / `100*CPU` sessions / `116*CPU` + instances); 4.5 carried these higher dynamic defaults. Flag explicit low `maxConcurrentCalls`/ + `maxConcurrentSessions`/`maxConcurrentInstances` values, and self-hosted services on a framework + target old enough to inherit the flat pre-4.0 defaults. Diagnose with the "Percent of Max + Concurrent *" performance counters (verify the exact numbers/applicability against the currency + brief for your version). +- **`InstanceContextMode` mismatched to workload**: `PerSession` (the default for sessionful + bindings) holds a service instance and resources per client for the session lifetime — expensive at + scale and a memory/leak risk for many idle clients; `PerCall` releases the instance after each call + (best for scalability and stateless ops); `Single` shares one instance across all callers (a + serialization bottleneck unless combined with `ConcurrencyMode.Multiple`). Flag `PerSession`/ + `Single` on high-fan-in stateless services. +- **`ConcurrencyMode` bottlenecks**: the default `Single` serialises all calls into one instance — + a throughput wall for `Single`/`PerSession` services; `Multiple` allows concurrent calls but + **requires the operation/shared state to be thread-safe** (look for unsynchronised shared fields); + `Reentrant` is for callback/re-entrant patterns. Mismatched instancing+concurrency is a classic + hidden serialisation point. +- **Sessionful bindings used where not needed**: reliable sessions / security sessions add + per-session setup, state, and keep-alive overhead; if the contract is effectively stateless + request/response, a sessionless binding (or `[ServiceContract(SessionMode=SessionMode.NotAllowed)]`) + removes that cost. + +### Bindings, payloads & serialization +- **Heavier binding than requirements need**: `WSHttpBinding` defaults to message-level security + + WS-* (and supports reliable sessions) — significant per-message crypto/handshake overhead vs + `BasicHttpBinding` (plain SOAP, transport security). For intra-org/back-end calls prefer + `NetTcpBinding` (binary encoding, faster, connection-oriented) or `NetNamedPipeBinding` + (same-machine, lowest overhead). Pick the lightest binding that meets the security/interop/ + transport requirement; flag `WSHttpBinding` with message security + reliable sessions used for + simple internal traffic (verify against the currency brief for your version). +- **Default `TransferMode.Buffered` on large payloads**: buffered mode holds the **entire** message + in memory before send/receive (LOH pressure, latency, OOM risk for large files/blobs) and is bounded + by `maxReceivedMessageSize` (default 65,536 bytes). For large file/stream transfer use + `TransferMode.Streamed` (or `StreamedRequest`/`StreamedResponse`) with an operation that takes/ + returns a single `Stream`; keep a sane `maxReceivedMessageSize` even when streaming (headers are + always buffered — a DoS/OOM vector otherwise). Note streaming is unavailable on MSMQ bindings and + disables features that need the whole message (signatures, reliable sessions). Also review + `readerQuotas` raised blindly to `Int32.MaxValue` — that removes a memory safety bound rather than + fixing a design (verify against the currency brief for your version). +- **`NetDataContractSerializer` in use**: it embeds full CLR type names in the wire payload and is + slower and tightly coupled (and a known deserialization-security risk) — prefer the default + `DataContractSerializer`. With `DataContractSerializer`, member order matters (alphabetical / + explicit `Order=`) and a mismatch forces extra work; `[DataContract(IsReference=true)]` and large + `[KnownType]` sets add graph-tracking and type-resolution cost — flag cyclic/large object graphs and + long `[KnownType]`/`[ServiceKnownType]` lists serialised on hot paths. `[XmlSerializerFormat]` + switches an operation to `XmlSerializer` (needed for precise XML/legacy schema control) but is + slower and carries the `XmlSerializer` per-instance temp-assembly caching gotcha — see the CPU/ + serialization bullets above. + +### Interface shape, async & per-call overhead +- **Chatty service interface**: fine-grained operations (a call per property/row) multiply network + round-trips and per-call serialization/dispatch overhead; an N+1 pattern across service calls (one + coarse call followed by a loop of per-item calls) is the service-tier analogue of EF N+1. Prefer + coarse, DTO-returning operations that batch the data a caller needs in one round-trip. +- **Sync-over-async / blocking inside operations**: blocking on I/O (DB, downstream service, file) in + a service operation ties up a dispatcher/thread-pool thread per concurrent call and, combined with + throttling limits above, caps concurrency. Use `Task`-returning async operation contracts (TAP + server-side support is **4.5+**) for I/O-bound work; avoid `.Result`/`.Wait()` inside operations + (verify against the currency brief for your version). +- **Per-call behaviors / inspectors / metadata overhead**: custom `IDispatchMessageInspector` / + `IParameterInspector` / message-formatter behaviors and verbose message logging run on **every** + message — audit what each call actually executes. Leaving the MEX endpoint and + `serviceMetadata httpGetEnabled` on in production exposes metadata and adds surface; `includeExceptionDetailInFaults` + left enabled is a perf and information-disclosure smell. Flag heavy/duplicated behaviors in the + dispatch path. diff --git a/.claude/skills/performance-audit/profile-packs/dotnet/winforms.md b/.claude/skills/performance-audit/profile-packs/dotnet/winforms.md new file mode 100644 index 00000000..24690d95 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/dotnet/winforms.md @@ -0,0 +1,69 @@ +# .NET performance module: WinForms +> Load when `System.Windows.Forms` (etc.) is detected — see the module map in `../dotnet.md`. Core lanes + Variant notes live in `../dotnet.md`; this file is the WinForms lens only. + +## WinForms + +> Windows desktop UI on **both** .NET Framework and modern .NET / Windows Desktop +> (`net8.0-windows`+). The performance model — a single STA UI thread pumping a Win32 +> message loop, GDI/GDI+ painting, handle-backed controls — is essentially identical across +> runtimes, so these are *conditions to look for* on any WinForms target unless noted. The +> async/await idioms below are richer on modern .NET; `BackgroundWorker` is the Framework-era +> fallback that still works everywhere. + +- **Long synchronous work on the UI thread**: any blocking I/O, database query, web call, or + heavy computation run directly in an event handler freezes the message pump (the app stops + repainting and responding, shows "Not Responding"). Move it off-thread via `async`/`await` over + truly-async APIs, `Task.Run` for CPU-bound work, or `BackgroundWorker` (Framework-era but + portable); never `.Result`/`.Wait()` it back on the UI thread (sync-over-async deadlocks under + the WinForms `SynchronizationContext`). +- **Cross-thread results marshaled per-item**: UI controls may only be touched on the thread that + created their handle — worker results must come back via `Control.Invoke`/`BeginInvoke` (or + `IProgress<T>`/`await`, which capture the UI context for you). `Invoke` is **synchronous** (blocks + the worker until the UI thread runs the delegate); `BeginInvoke` is async (fire-and-forget). A + per-item `Invoke` inside a tight loop floods the message queue and serializes the worker against + the UI thread — batch results and marshal once per chunk. +- **Bulk list/tree/combo population without batching**: adding many items to `ListView`/`TreeView`/ + `ComboBox`/`ListBox` one-by-one repaints (and re-sorts) per item. Wrap the loop in + `BeginUpdate()`/`EndUpdate()` to suppress repaint, or prefer `AddRange` (which applies internal + batching/optimizations for you). Calls nest: `EndUpdate` must balance every `BeginUpdate`. + (Handle-creation timing also matters: populate a `ListView` *after* its handle exists — e.g. in + `Load`/`Shown` — but a `TreeView` populates fastest *before* handle creation or via `AddRange`.) +- **Bulk layout changes without `SuspendLayout`/`ResumeLayout`**: mutating many child controls' + `Bounds`/`Size`/`Location`/`Visible`/`Text` (especially on `AutoSize` controls, especially in + `Form.Load` where handles already exist) fires a `Layout` event per change. Bracket bulk changes + with `SuspendLayout()`/`ResumeLayout()` — and call them on the **container actually receiving the + children** (e.g. the panel), not the parent form. Note `SuspendLayout` only suppresses the managed + `OnLayout`; it does not stop Win32 size messages, so set the property carrying the most info at + once (`Bounds` over separate `Size`+`Location`). +- **`DataGridView` over large data without VirtualMode**: a `DataGridView` materializing thousands + of rows holds a cell object per cell and is slow to scroll/resize. Set `VirtualMode = true` and + serve cells from your own cache via the `CellValueNeeded` events for very large/just-in-time data + sets; enable double-buffering; avoid recomputing per-cell/per-row styling on every paint (share + `DataGridViewCellStyle` objects, avoid `AutoSizeColumnsMode`/`AutoSizeRowsMode` that re-measure + all rows). Use shared rows where possible. +- **Missing double-buffering on custom/heavy-painted controls**: progressive redraw of a + drawing-intensive surface flickers and feels slow. Enable `DoubleBuffered = true`, or for custom + controls `SetStyle(ControlStyles.OptimizedDoubleBuffer | ControlStyles.AllPaintingInWmPaint, true)` + so painting happens off-screen and blits once. (`DataGridView` exposes double-buffering only via a + protected member / reflection.) +- **Heavy work inside `OnPaint`**: `Paint` fires often; creating `Font`/`Brush`/`Pen` objects, + measuring strings, or doing expensive computation per paint is a hot-path cost. Hoist object + creation to construction/`Resize` and cache it; avoid `TextFormatFlags.WordBreak` on single-line + measurement and use `TextRenderer` overloads that don't take an `IDeviceContext` (they reuse a + cached memory DC). +- **`Application.DoEvents()` misuse**: pumping the message queue mid-operation to "keep the UI + responsive" invites re-entrancy bugs and burns CPU (it busy-pumps). Use async/await or + `BackgroundWorker` with `ProgressChanged` instead of `DoEvents`. +- **Leaked GDI/GDI+ objects and handles**: `Font`, `Brush`, `Pen`, `Bitmap`, `Graphics`, + `Region`, `Icon` wrap native GDI handles — not disposing them leaks handles (the process has a + finite GDI handle quota; exhaustion degrades then breaks rendering). Wrap them in `using`/dispose + deterministically; cache long-lived ones rather than recreating per paint; handle large images + with care (dispose source bitmaps, watch LOH for big `Bitmap` buffers). +- **Event-handler / component leaks keeping forms alive**: subscribing a long-lived publisher to a + form/control handler (timers, static events, parent-to-child wiring) roots the form so it (and its + whole control tree + GDI handles) never collects after close. Unsubscribe on `FormClosed`/`Dispose`; + dispose `Timer`/`BackgroundWorker`/components. +- **Heavy data binding on large/complex bindings**: deep or large `BindingSource`/`DataGridView` + bindings, especially with `IBindingList` change notifications firing per row, can dominate populate + time; suspend binding/notifications during bulk loads (`BindingSource.RaiseListChangedEvents = false`, + or `SuspendBinding`) and resume once. diff --git a/.claude/skills/performance-audit/profile-packs/dotnet/wpf.md b/.claude/skills/performance-audit/profile-packs/dotnet/wpf.md new file mode 100644 index 00000000..bcb0a703 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/dotnet/wpf.md @@ -0,0 +1,101 @@ +# .NET performance module: WPF +> Load when `*.xaml` (etc.) is detected — see the module map in `../dotnet.md`. Core lanes + Variant notes live in `../dotnet.md`; this file is the WPF lens only. + +## WPF + +> WPF runs on **both** .NET Framework and modern .NET / Windows Desktop (`net8.0-windows`+); the +> retained-mode composition model, layout system, binding engine, and `Freezable`/rendering-tier +> behavior are the same across runtimes, so these are *conditions to look for* on any WPF target. +> Microsoft's "Optimizing WPF Application Performance" series is the canonical source for the items +> below. APIs are durable; verify exact members against the currency brief for your version. + +### Collections & virtualization + +- **Large `ItemsControl`/`ListBox`/`DataGrid`/`TreeView` without UI virtualization**: by default the + layout system creates a container for *every* item and measures/arranges it, even off-screen. + UI virtualization defers container generation to visible items only; it is **on by default** for + `ListBox`/`ListView` data-bound, but `TreeView` and custom `ItemsControl`s need it turned on + (`VirtualizingStackPanel.IsVirtualizing="True"`, set `ItemsPanel` to `VirtualizingStackPanel` for + controls like `ComboBox`). Add `VirtualizationMode="Recycling"` to reuse containers instead of + churning them while scrolling (verify against the currency brief for your version). +- **Virtualization silently defeated**: wrapping the items host in a `ScrollViewer`/`StackPanel`, or + placing the list inside an `Auto`-sized / unbounded-height container, gives the panel infinite + available space so it realizes every item; `ScrollViewer.CanContentScroll="False"` (pixel scrolling) + and grouping without `VirtualizingPanel.IsVirtualizingWhenGrouping="True"` also disable it. Confirm + the dedicated scrollbar belongs to the control's own virtualizing panel and isn't bypassed. +- **`ObservableCollection<T>` bulk updates raising `CollectionChanged` per item**: it has no + `AddRange`; adding N items in a loop fires N change notifications, each walking bindings and + re-running layout. Build the data first and assign/replace the collection (or `Reset` once), use a + collection type that supports range operations, or suspend notifications during the load. +- **Binding `IEnumerable` instead of `IList`/`IList<T>` to an `ItemsControl`**: forces WPF to wrap it + in a generated `IList`, an avoidable second object and indexing overhead — bind an `IList<T>` + directly. Prefer `ObservableCollection<T>` over a plain `List<T>` when the UI must reflect + add/remove (a plain list forces full regeneration on change). + +### Binding + +- **Silent binding failures are a real perf cost**: each failed binding walks the visual tree + searching for a source and logs a `System.Windows.Data Error` to the trace output — repeated over + many elements / on every layout this is measurable. Treat trace-window binding errors as bugs to + fix; use `PresentationTraceSources.TraceLevel` to locate noisy ones (verify against the currency + brief for your version). +- **Binding to sources without `INotifyPropertyChanged`**: a plain CLR source forces the engine + through reflection/`TypeDescriptor` to resolve and to *poll* for changes — the costliest path. + Implement `INotifyPropertyChanged` (cheaper) on bound view models; for values that never change, + use `Mode=OneTime` so no change-tracking machinery is set up at all. +- **Converters / `StringFormat` on hot, frequently-updated bindings**: an `IValueConverter` or + `MultiBinding` runs on every update and every re-evaluation; keep them cheap, avoid allocation, and + prefer pre-computed view-model properties for values updated at high frequency. +- **Noisy two-way inputs without throttling**: `UpdateSourceTrigger=PropertyChanged` on a `TextBox` + pushes to the source (and re-runs validation/converters/dependent bindings) on every keystroke; + use `Delay` on the binding, or `UpdateSourceTrigger=LostFocus`, for chatty inputs (verify against + the currency brief for your version). + +### Visual tree & layout + +- **Deep / over-nested visual trees**: layout is a recursive measure+arrange pass whose cost scales + with element count and depth; gratuitous nested panels, redundant `Border`/`Grid` wrappers, and + heavyweight templates multiply per-frame `Measure`/`Arrange` work. Flatten the tree, reduce element + count, and build trees **top-down** (adding a node invalidates its parent and all children, so + bottom-up construction re-validates repeatedly). +- **Wrong panel for the job**: panel cost tracks functionality — `Canvas` is cheapest, `Grid`/ + `StackPanel`/`DockPanel` do more measuring. Don't pay for a `Grid` where a `Canvas` or simple + `StackPanel` suffices; avoid `StackPanel` for large lists (it doesn't virtualize unless it's the + virtualizing variant). +- **Layout-invalidation storms**: animating or repeatedly setting properties flagged + `AffectsMeasure`/`AffectsArrange` (size, margin, alignment) on elements high in the tree forces + whole-subtree relayout each frame; prefer transforms (which don't invalidate layout) over + layout-affecting property changes for movement/scaling. +- **`Visibility.Hidden` vs `Collapsed`**: a `Hidden` element is still measured and arranged (it + occupies layout space, just isn't drawn); use `Collapsed` to remove it from layout entirely when + it shouldn't participate. For frequently toggled large subtrees, collapsing avoids the relayout + cost of an invisible-but-measured tree. + +### Rendering + +- **Unfrozen `Freezable`s on the hot path**: brushes, pens, geometries, transforms, and animations + are `Freezable`s that, while unfrozen, maintain `Changed`-event machinery and cannot be shared + across threads. Call `.Freeze()` on ones that never change — it drops the change-notification + overhead, lowers working set, and makes them thread-safe to create off the UI thread. (Unfrozen + `Freezable` `Changed` handlers also keep listeners alive — a subtle leak; remove the brush from + the property to detach.) +- **Software-rendered effects on subtrees**: `DropShadowEffect`/`BlurEffect` (and other bitmap + effects) are expensive and can force a software/temporary-surface render over a whole subtree; + apply sparingly, scope them tightly, and consider `BitmapCache` (cache the rendered result) for + static decorated content. Set `Brush.Opacity` rather than an element's `UIElement.Opacity` + (element opacity can spawn a temporary surface). +- **Ignoring the render tier**: WPF classifies the GPU into rendering tiers (0 = software, 1/2 = + increasing hardware acceleration); on Tier 0 / RDP / VMs much falls back to the CPU-bound software + rasterizer where fill-rate (overdraw, transparency layering) dominates. Query + `RenderCapability.Tier` and degrade gracefully (drop effects, reduce overdraw) on low tiers; for + bitmaps being animated/scaled, `RenderOptions.SetBitmapScalingMode(..., LowQuality)` trades + resampling quality for frame rate. +- **Large opacity/transform animations over big subtrees & `Dispatcher` flooding**: animating + opacity or transforms over a large visual subtree re-composites a lot of pixels per frame; + similarly, posting high-frequency or low-priority work to the `Dispatcher` (per-tick UI updates, + chatty `BeginInvoke`) starves input/layout. Throttle/coalesce dispatcher work, animate the + smallest possible subtree, and prefer cached or transformed rendering over per-frame relayout. +- **Per-instance resources instead of shared**: defining brushes/geometries in a custom control's + own `ResourceDictionary` allocates a fresh copy per control instance; hoist shared, + performance-intensive resources to `Window`/`Application` level (or the control's default theme) + so instances share them — large working-set savings when many instances exist. diff --git a/.claude/skills/performance-audit/profile-packs/generic-pack.md b/.claude/skills/performance-audit/profile-packs/generic-pack.md new file mode 100644 index 00000000..8efeefd1 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/generic-pack.md @@ -0,0 +1,123 @@ +# Profile Pack: Generic (language-agnostic fallback) + +**Always loaded.** Used alone when no language-specific pack matches, and alongside a matched pack +otherwise. A profile pack specializes the generic performance lanes with stack-specific signals so a +lane agent knows what to look for in *this* ecosystem. + +**Packs encode durable, version-independent idioms only.** Volatile, version-specific guidance lives +in the currency brief (see `currency-protocol.md`), never here. Where a pack names a concrete API or +default, it MUST add "verify against the currency brief for your version" so an aging claim doesn't +silently mislead. + +The dispatcher pastes the slice for each lane into that lane's agent. Sections are keyed by lane. + +--- + +## Algorithmic complexity & data structures (lane `algorithmic`) +- Nested loops over inputs that grow with load (accidental O(n²)): membership tests, de-dup, joins + done by scanning. +- Repeated/recomputed work inside loops that could be hoisted or memoized. +- Wrong container for the access pattern: linear scan where a hash/set lookup fits; list where a + queue/deque fits; re-sorting already-sorted data. +- Recomputing pure results instead of caching them. + +## Memory & allocation (lane `memory`) +- Allocation on hot paths; building large intermediate collections that are immediately discarded. +- Copies where a view/slice/reference would do. +- Unbounded growth: caches without eviction, accumulating buffers, retained references that prevent + reclamation. +- Reading a whole resource into memory when streaming would bound it. + +## Data access & I/O (lane `data-access`) +- N+1 access: one query/request per item in a loop instead of one batched call. +- Missing pagination/batching; fetching more columns/fields/rows than used (over-fetching). +- Synchronous/blocking I/O on a hot or latency-sensitive path. +- Chatty round-trips that could be coalesced; missing connection pooling/reuse. +- Serialization/deserialization overhead; missing or misused caching layers (cache that never hits, + or is bypassed). +- Query shapes implying a missing index (filtering/sorting on unindexed fields). + +## Concurrency & parallelization (lane `concurrency`) +- **Exploit:** serial loops over independent work; sequential waits on independent async operations + that could run concurrently; missing pipelining/streaming between producer and consumer. + *Before suggesting parallelization, verify the work is actually independent (no shared mutable + state, no ordering dependency) and attach a correctness guard.* +- **Defend:** lock contention; critical sections larger than necessary; blocking calls inside async + contexts; false sharing; thread/connection pool exhaustion. + +## Framework-idiom currency (lane `idiom-currency`) +- Consult the currency brief. Flag patterns the brief marks superseded/deprecated; flag fast-path + APIs the brief lists that the code doesn't use; flag changed defaults the code still fights. +- Offline (no brief): note candidate idiom concerns at LOW confidence, flagged for manual currency check. + +## Payload / startup / build (lane `payload-startup`, conditional) +- Shipping more than needed to the consumer (large payloads, unused data, no compression). +- Expensive work at startup/cold-start that could be lazy or cached. +- Eager initialization of rarely-used components. + +--- + +## How to add a profile pack (for future ecosystems) + +1. Create `profile-packs/<ecosystem>.md` with the **same lane headings** as this file (`algorithmic`, + `memory`, `data-access`, `concurrency`, `idiom-currency`, plus `payload-startup` where the + ecosystem has such a surface). +2. Under each lane, list the ecosystem's *durable* performance signals — the idioms and footguns that + are true across versions. **Size contract (avoid overload):** ~5–9 high-signal, ecosystem-specific + bullets per lane section, each phrased as a *condition to look for* (not a tip or tutorial). Do NOT + restate the generic bullets above — a pack SPECIALIZES. The per-lane slice is pasted into a lane + agent's prompt, so density matters: an over-long lens becomes a checklist the agent walks and pads + to "cover", which fights calibration. A mediocre bullet is worse than an omitted one. + **One point per bullet, tight.** Length is justified only by *reasoning* (the trade-off, the + judgment a strong reader needs), never by *enumeration* — do not staple several distinct footguns + into one bullet (split or cut them). A bullet that lists five sub-conditions has become a checklist; + a bullet that explains one condition and when it does/doesn't matter is a reference. Prefer the + latter. +3. For any concrete API/default you name, append "(verify against the currency brief for your version)". + Do NOT bake version-specific claims into the pack — durable idioms here; version-pinned fast-paths + go in a `version-indexes/<ecosystem>.md` lookup (see `../version-indexes/README.md`); live recency + goes in the currency brief. +4. Register the manifest signatures that select this pack in `SKILL.md` Phase 0 (detection). +5. If the ecosystem has distinct major variants with different perf models (e.g., legacy vs modern + runtime), give each its own clearly-separated subsection. +6. **Framework / sub-stack modules (for large ecosystems).** When an ecosystem accretes many + *tech-specific* lenses (web framework, ORM, desktop UI, RPC, caching, interop) that only apply when + that technology is present, keep `<ecosystem>.md` as the **core** (lanes + a runtime-notes section) + and move each tech lens into `profile-packs/<ecosystem>/<module>.md`. The core pack then carries a + **`## Framework / sub-stack modules (load on detection)`** map — a table of `detection signals → + module file`. The runner loads the core pack for every project of that ecosystem and additionally + loads only the modules whose signals appear in scope, so a run pastes only the relevant lenses + instead of one monolith. (`.NET` is the reference: core `dotnet.md` + `dotnet/{aspnet-core, blazor, + wcf, sql-server-data, winforms, wpf, caching, dependency-injection, interop}.md`.) Each module is a + standalone `# <Ecosystem> performance module: <Tech>` doc that pairs with the core pack. Two ways to + arrive there, same end state: **"relocate"** when the core already carries inline framework-specific + bloat (move it out + deepen — `.NET`, JS/TS), **"deepen"** when the core is already clean and + language-level (keep it as always-loaded quick-hits, add deeper modules — Python, Go). Either way: + core = always-loaded lanes + a **runtime-notes section** (the durable engine/runtime realities that + cut across every lane); modules = load-on-detection depth. The heading is the *same role under + different names*: `## Runtime notes` in Go/Python/JS-TS, `## Variant notes` in `.NET` (its + Modern-vs-Framework split — the original name), `## Reading the plan & schema` in SQL. **Materiality, not mere presence, decides a load** (see `SKILL.md` Phase 0): a module loads + when its technology is *central* to the scope, not on an incidental/transitive import. + +--- + +## The packs are REFERENCES, not checklists — a floor, not a ceiling + +This is a design invariant, not a style note. A pack exists to help an agent *recognize patterns +faster and reason about trade-offs* — it is a prior, not a worklist, and the consumer-side framing in +`lane-prompts.md` says so to every lane agent. Keep the producer side honest too: + +- **Never imply completeness.** A pack names what is *known to be worth knowing*; it is never the + boundary of what is worth finding. A finding the lens didn't list is the goal, not an exception. +- **Write for a reader who may be smarter than the author.** As models strengthen they need *less* + hand-holding on durable fundamentals (they already know them) — so the durable pack is the **most + skippable** layer for a strong model, and it must degrade gracefully: a stronger agent should lean + on it lightly and out-reason it where it can, never be boxed in by it. Do not encode "do exactly + this" prescriptions that a better judgment would override; encode the *condition* and the *trade-off* + and let the agent decide. +- **The unknowable-facts layers age better than the pack.** The three-tier split is deliberate: the + **version index** (post-training, version-pinned fast-paths) and the **currency brief** (post-cutoff + recency) carry what *no* model can self-supply, while the durable pack carries what a capable model + largely already knows. As models improve, weight shifts from the pack toward the index/brief — which + is why version-specific claims must live there, not be baked into the pack. Keeping the pack durable + and lean is itself the future-proofing. diff --git a/.claude/skills/performance-audit/profile-packs/go.md b/.claude/skills/performance-audit/profile-packs/go.md new file mode 100644 index 00000000..10829f79 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/go.md @@ -0,0 +1,144 @@ +# Profile Pack: Go + +Go-specific performance signals for the audit lanes. Use alongside `generic-pack.md`, which covers +language-agnostic patterns; this pack sharpens each lane for Go idioms and footguns. + +This is the **core** Go pack (lanes + Runtime & GC notes). Tech-specific lenses (HTTP servers, +databases, gRPC, serialization, caching, messaging) live in load-on-detection modules under +`profile-packs/go/` — see **`## Framework / sub-stack modules`** at the bottom. Load the core for +every Go project; add a module only when its signals appear in scope. + +--- + +## Algorithmic complexity & data structures (lane `algorithmic`) +- Linear membership test inside a loop (`for _, v := range slice { if v == x }`) where the slice can grow — replace with a `map` lookup; maps have O(1) average lookup vs O(n) linear scan. +- Using `map[K]bool` (or `map[K]struct{}`) as a set when keys are dense integers or sequential IDs — a plain `[]bool` or `[]int` indexed by the key is faster and uses far less memory (maps carry ~100 bytes of overhead per entry). +- Calling `regexp.MatchString` or `regexp.Compile` inside a loop — compile the pattern once at package scope or in a `sync.Once` and reuse the `*regexp.Regexp`. +- Re-sorting a slice on every iteration when incremental insertion into a sorted structure (e.g., `sort.Search` + insert) or a heap (`container/heap`) would maintain order at lower cost. +- Recomputing derived values (hash, length, formatted string) on every iteration rather than computing once before the loop or storing alongside the source data. +- Using `map[string]map[string]T` (nested maps) when a single map with a struct key (`map[Key]T`) is clearer and cheaper — struct keys avoid two hash operations and two allocations per access. +- String built with `+=` in a loop is O(n²) in allocation and copying; use `strings.Builder` (with `Grow` to pre-size) or `bytes.Buffer` for multi-step construction. + +## Memory & allocation (lane `memory`) +- Interface boxing on hot paths: passing a concrete value where an `interface` is expected forces heap escape; store the concrete type and pass the interface only at the call boundary, or restructure to avoid the interface on the critical path. +- `[]byte` ↔ `string` conversions that force a copy; in read-only contexts the `unsafe` package exposes zero-copy conversions — use only after profiling confirms the cost, and tag with `(verify against the currency brief for your version)`. +- Slice growth without preallocated capacity: `append` into a nil or empty slice causes repeated doublings; use `make([]T, 0, n)` when n is known or estimable. +- Retaining a large backing array via a small sub-slice (e.g., returning `bigSlice[2:4]` from a function) — the full array cannot be GC'd; copy the needed portion: `out := make([]T, len(sub)); copy(out, sub)`. +- Missing `sync.Pool` (verify against the currency brief for your version) for reusable short-lived buffers (e.g., `bytes.Buffer` for serialization scratch space); always call `Reset()` on retrieval — pool items are cleared by the GC without notice, so `New` must supply a valid zero-value object. +- `defer` inside a tight inner loop: each `defer` records a stack entry that runs at function return, not loop exit; restructure the loop body into a helper function or remove the defer. +- High pointer density in frequently allocated structs: the GC must trace every pointer in the live heap; prefer index-based linking (`NextIdx int`) over pointer chaining (`Next *Node`) in hot allocation paths; the GC stops scanning a struct at its last pointer field, so place non-pointer fields at the end. +- Combining small related allocations into a single struct value rather than separate `new` calls (e.g., embedding a `[16]byte` array and using `buf = arr[:0]` avoids a second allocation for the backing array). + +## Data access & I/O (lane `data-access`) +- Per-row DB queries inside a loop (N+1 pattern) — prefer batch queries, `IN (...)` clauses, or multi-row inserts; N round-trips dominate latency regardless of query speed. +- Missing prepared statements for queries executed in tight loops or under concurrent load — repeated parse/plan overhead accumulates (verify against the currency brief for your version). +- Unbuffered `io.Reader`/`io.Writer` on file or network I/O: each small `Read`/`Write` becomes a syscall; wrap with `bufio.Reader`/`bufio.Writer` (default 4 KB buffer) or use `bufio.Scanner` for line-oriented input (verify against the currency brief for your version). +- Forgetting `bufio.Writer.Flush()` — buffered writes are silently dropped if the writer is not flushed before close. +- `json.Marshal`/`json.Unmarshal` on hot paths: both allocate and use reflection; for streaming HTTP responses prefer `json.NewEncoder(w).Encode(v)` (writes directly to the `ResponseWriter`); for ingest prefer `json.NewDecoder(r).Decode(&v)`; for highest throughput consider code-generated marshalers (verify against the currency brief for your version). +- Unmarshaling into `map[string]any` or `any` instead of a concrete struct — forces full reflection on every field and prevents compiler optimizations. +- Missing or misconfigured connection pool settings (`MaxOpenConns`, `MaxIdleConns`, `ConnMaxLifetime`) leading to either exhaustion under load or idle connection churn (verify against the currency brief for your version). +- `SELECT *` or reading an entire response body when only a subset of fields/bytes is needed — over-fetch inflates network I/O, deserialization work, and GC pressure. + +## Concurrency & parallelization (lane `concurrency`) +- Goroutine leaks: goroutines launched without a `context.Context` cancellation path (or a `done` channel) accumulate silently — each retains at least a 2–8 KB stack that grows on demand; always `defer cancel()` when creating a context and propagate cancellation down the call chain. +- Unbounded goroutine spawn (`go f()` inside a loop with no cap) — use `errgroup.Group` with `g.SetLimit(n)` (verify against the currency brief for your version) or a fixed worker pool receiving from a channel; unbounded spawn exhausts memory under load. +- `sync.Mutex` critical sections that span I/O or computation: hold the lock only around the shared-state read/write, not around the work that produced the value; consider `sync.RWMutex` for read-heavy workloads. +- Single shared channel used as a global bottleneck — the channel serializes all senders/receivers; consider sharding across N channels or switching to a worker-pool pattern when profiling shows channel contention. +- Shared mutable buffers accessed by multiple goroutines (e.g., a package-level `[N]byte` used as a scratch buffer in concurrent `ReadFrom` calls) — give each goroutine its own buffer or use `sync.Pool`. +- Independent sub-tasks executed serially that could be fanned out — use `errgroup.WithContext` so the first error cancels remaining work; verify tasks have no shared mutable state and no ordering dependency before parallelizing. +- Goroutines in `syscall` state consume OS threads (M in the scheduler); goroutines blocked on Go channels do not — distinguish blocking profiles (`runtime.SetBlockProfileRate`) from `GODEBUG=schedtrace` output to identify the correct fix. +- `time.After(d)` inside a long-lived `for { select {...} }` loop: each iteration allocates a `*time.Timer` that (on older runtimes) is not reclaimed until `d` fires, so a hot loop where another case keeps firing leaks timers for the whole duration — prefer one reusable `time.NewTimer`/`time.NewTicker` with `Reset`, or a `context` deadline; the leak is reduced on newer runtimes but the reusable-timer idiom is still the durable fix (verify against the currency brief for your version). + +## Framework-idiom currency (lane `idiom-currency`) +- Consult the currency brief/index for the framework in use (stdlib `net/http`, gRPC, Gin, Echo, etc.). Flag superseded middleware patterns, changed default timeouts or buffer sizes, and fast-path APIs the code bypasses (verify against the currency brief for your version). +- Offline (no brief): note candidate idiom concerns at LOW confidence, flagged for manual currency check. + +## Payload / startup / build (lane `payload-startup`) +- Heavy work in `init()` functions (file I/O, network calls, large allocations, regexp compilation) runs before `main` and inflates cold-start time; prefer lazy initialization via `sync.Once` or explicit setup calls. +- Large numbers of `init()` registrations or eagerly constructed global singletons add latency on every cold start in serverless or container environments — sequence matters; profile with `GODEBUG=inittrace=1` (verify against the currency brief for your version). +- Eager construction of rarely-used subsystems at startup (opening DB connections, loading remote config) instead of on first use — use `sync.Once`-guarded lazy init. +- `runtime.SetFinalizer` on hot-path objects: finalized objects survive their first GC cycle and delay reclamation; chains of finalized objects require N GC cycles to free; prefer explicit `Close()` methods or `runtime.AddCleanup` (verify against the currency brief for your version). +- Shipping debug symbols or enabling CGo dependencies that are not needed bloats binary size and cold-start time; verify build flags strip appropriately (`-ldflags="-s -w"`) (verify against the currency brief for your version). + +--- + +## Runtime & GC notes (load for every Go project) + +Go has no legacy-vs-modern runtime split the way some ecosystems do — every Go program shares one +runtime whose garbage collector, scheduler, and build pipeline expose durable tuning levers. These +cut across all the lanes above (and every module below); treat them as the Go analog of a "variant +notes" section. They are *how the runtime is configured and measured*, not code-pattern signals. + +- **`GOMAXPROCS` unaware of the container CPU limit**: the runtime historically sets `GOMAXPROCS` to + the number of host logical CPUs, which in a CPU-limited container (Kubernetes `limits.cpu`, cgroup + quota) over-provisions the scheduler — too many runnable Ps cause CPU throttling, scheduling + latency, and GC-assist contention. Look for the absence of `go.uber.org/automaxprocs` (or an + explicit `runtime.GOMAXPROCS` set from the cgroup quota) in containerized services; newer Go + runtimes are becoming cgroup-aware, so confirm the behavior for the toolchain in use (verify + against the currency brief for your version). +- **GC tuning levers left at defaults for the workload**: `GOGC` (default 100 — collect when the + heap doubles) trades GC CPU for memory; raising it reduces GC frequency for throughput-bound, + memory-rich services, lowering it caps memory at higher GC cost. `GOMEMLIMIT` is a *soft* heap + ceiling the GC respects even with `GOGC=off` — essential for memory-capped containers to avoid OOM + kills; leave 5–10% headroom below the container limit and pair with a higher `GOGC` (verify against + the currency brief for your version). Flag services that fight OOM kills or GC-thrash with neither + knob set. +- **cgo on a hot path**: every `cgo` call crosses a boundary that pins the calling goroutine to its + OS thread for the call, cannot be inlined, blocks escape analysis across the boundary, and adds + fixed per-call overhead; a `cgo` call in a tight loop or per-request path is a recurring footgun. + Prefer a pure-Go implementation where one exists; batch work across the boundary when cgo is + unavoidable; check whether `CGO_ENABLED=0` is viable (also smaller, faster-starting static + binaries) (verify against the currency brief for your version). +- **Optimizing without a profile, or shipping without PGO**: Go ships first-class profiling — flag + changes justified by intuition rather than `pprof` (CPU/heap/block/mutex) or + `go test -bench -benchmem`. For CPU-bound services, **Profile-Guided Optimization** (commit a + representative `default.pgo` next to `main`) lets the compiler inline and devirtualize hot calls + for a few percent throughput at no code cost — its absence on a hot service is a missed lever + (verify against the currency brief for your version). +- **Avoidable heap escapes the compiler will show you**: `go build -gcflags='-m'` reports which + values escape to the heap (returned pointers, values stored behind an interface, closures captured + by reference, slices whose size the compiler can't bound). Escapes on hot paths drive GC work; + the escape report and inlining decisions (`-m -m`) are the durable way to confirm a suspected + allocation rather than guessing (cross-reference the **Memory & allocation** lane above). + +## Framework / sub-stack modules (load on detection) + +Load the core lanes + **Runtime & GC notes** above for *every* Go project. Additionally load the +matching module when its technology is detected in the audit scope, and include it as ecosystem +context in the relevant lane prompts. (These tech-specific lenses are split out so a run pastes only +what's relevant — see the version index `../version-indexes/go.md` for version-specific facts.) + +| Detected (signals) | Load module | +|---|---| +| **HTTP servers & web frameworks** — `net/http` servers, `github.com/gin-gonic/gin`, `github.com/labstack/echo`, `github.com/gofiber/fiber`, `github.com/go-chi/chi` | [`go/net-http-servers.md`](go/net-http-servers.md) | +| **Database access** — `database/sql`, `github.com/jackc/pgx`, `gorm.io/gorm`, `github.com/jmoiron/sqlx`, `sqlc`, `github.com/lib/pq` | [`go/database-sql.md`](go/database-sql.md) | +| **gRPC** — `google.golang.org/grpc`, `google.golang.org/protobuf` (`.proto` / `*.pb.go`) | [`go/grpc.md`](go/grpc.md) | +| **Serialization** — `encoding/json`, `google.golang.org/protobuf`, `github.com/json-iterator/go`, `github.com/mailru/easyjson`, `github.com/goccy/go-json`, `github.com/vmihailenco/msgpack` | [`go/serialization.md`](go/serialization.md) | +| **Caching** — `github.com/dgraph-io/ristretto`, `github.com/allegro/bigcache`, `github.com/coocood/freecache`, `github.com/patrickmn/go-cache`, `github.com/redis/go-redis`, `golang.org/x/sync/singleflight` | [`go/caching.md`](go/caching.md) | +| **Messaging & streaming** — `github.com/segmentio/kafka-go`, `github.com/IBM/sarama`, `github.com/confluentinc/confluent-kafka-go`, `github.com/nats-io/nats.go`, `github.com/rabbitmq/amqp091-go`, `cloud.google.com/go/pubsub` | [`go/messaging.md`](go/messaging.md) | + +--- + +## Sources + +Durable signals in this pack are grounded in these authoritative sources (version-specific facts and +their per-entry citations live in `../version-indexes/go.md`): + +- go.dev — blog/pprof, wiki/Performance, blog/slices-intro, blog/strings, doc/effective_go, doc/gc-guide +- pkg.go.dev — `sync.Pool`, `strings.Builder`, `bufio`, `encoding/json`, `golang.org/x/sync/errgroup` +- **Runtime & GC** — go.dev/doc/gc-guide (`GOGC`/`GOMEMLIMIT`), go.dev/blog/pgo, `runtime.GOMAXPROCS` docs, cgo command docs, `go build -gcflags=-m` (escape analysis), `go.uber.org/automaxprocs`. + +**Sub-stack modules** carry their own grounding; key sources per module: + +- **HTTP servers** (`go/net-http-servers.md`) — `net/http` `Server`/`Transport` docs, gin/echo/chi + routing docs, gofiber/fasthttp context-reuse caveats. +- **Database access** (`go/database-sql.md`) — `database/sql` (`SetMaxOpenConns` etc., `Rows`), + pgx/`pgxpool` (`Batch`, `CopyFrom`), GORM performance docs (`Preload`/`Joins`/`Select`). +- **gRPC** (`go/grpc.md`) — grpc-go docs (`ClientConn`, `MaxRecvMsgSize`, `keepalive`, + load-balancing/resolver), protobuf Go API. +- **Serialization** (`go/serialization.md`) — `encoding/json` (`Encoder`/`Decoder`, `RawMessage`, + `UseNumber`), protobuf Go API, easyjson/goccy-go-json/jsoniter/msgpack READMEs. +- **Caching** (`go/caching.md`) — `golang.org/x/sync/singleflight`, ristretto/bigcache/freecache + READMEs, `sync.Map` docs, redis/go-redis (pooling, `Pipelined`). +- **Messaging** (`go/messaging.md`) — segmentio/kafka-go, IBM/sarama, confluent-kafka-go, nats.go + (JetStream), rabbitmq/amqp091-go (`Qos`/`Channel` thread-safety), cloud.google.com/go/pubsub. diff --git a/.claude/skills/performance-audit/profile-packs/go/caching.md b/.claude/skills/performance-audit/profile-packs/go/caching.md new file mode 100644 index 00000000..7e1a084d --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/go/caching.md @@ -0,0 +1,61 @@ +# Go performance module: Caching (in-process: ristretto/bigcache/freecache — distributed: go-redis/memcache) +> Load when `github.com/dgraph-io/ristretto`, `github.com/allegro/bigcache`, `github.com/coocood/freecache`, `github.com/patrickmn/go-cache`, `github.com/redis/go-redis`, `github.com/bradfitz/gomemcache`, or `golang.org/x/sync/singleflight` is detected — see the module map in `../go.md`. Core lanes + Runtime & GC notes live in `../go.md`; this file is the Caching lens only. + +## Caching (in-process: ristretto/bigcache/freecache — distributed: go-redis/memcache) + +> Scope: in-process caches (ristretto, bigcache, freecache, go-cache, sync.Map) and distributed +> caches (go-redis, gomemcache). The recurring themes are **bounded eviction** (cap memory before +> the process OOMs), **stampede control** (single-flight prevents goroutine pile-ons at miss time), +> **GC-friendly storage** for huge caches (off-heap byte slices vs pointer-rich maps), **connection +> reuse** (one long-lived client, not one per request), and **batching** (pipelines/MGET instead +> of serial round-trips). Bullets are *conditions to look for*. + +- **Cache stampede / thundering herd on a hot miss**: on a cache miss, many goroutines launching + the same expensive fetch or computation concurrently — wrap the fill with + `golang.org/x/sync/singleflight` (`Group.Do` / `Group.DoChan`) so exactly one call executes per + key and all waiters share its result; this is especially critical at startup or after a TTL + expiry wave (verify against the currency brief for your version). + +- **Unbounded in-process cache → memory growth / OOM**: a bare `map` or `sync.Map` used as a + cache with no eviction policy and no size cap grows without bound; replace with a cache that + enforces limits — ristretto's cost-based admission/TinyLFU eviction, or freecache/bigcache's + fixed-size ring-buffer — rather than a hand-rolled map; cross-reference the core **Memory & + allocation** lane and the **payload/startup** notes for init-time allocation cost. + +- **GC pressure from a huge pointer-rich in-process cache**: Go's GC scans every pointer in the + live heap, so a cache holding millions of entries backed by pointers (e.g., `map[string]*T`) + lengthens stop-the-world and concurrent mark phases; **bigcache and freecache store entries as + `[]byte` serialized off the GC's pointer-scanning path** specifically to avoid this overhead — + prefer them when the working set is very large (cross-reference Runtime & GC notes in + `../go.md`). + +- **`sync.Map` misuse for a balanced-read/write or high-churn cache**: `sync.Map` is optimized + for **read-heavy / write-once** (or disjoint-key) workloads; using it for a cache with frequent + updates or high key churn is slower than a sharded `map`+`sync.RWMutex` because dirty-map + promotions and key-set rebuilds dominate; match the structure to the measured access pattern + (verify against the currency brief for your version). + +- **go-redis `Client`/`ClusterClient` created per request**: a `redis.Client` is itself a + connection pool and is designed to be **long-lived and shared**; constructing one per request + or per goroutine exhausts file descriptors and TCP connections; create a single client at + startup, tune `PoolSize`, `MinIdleConns`, `DialTimeout`, `ReadTimeout`, and `WriteTimeout` for + the expected concurrency, and inject it as a dependency (verify against the currency brief for + your version). + +- **Redis serial round-trips instead of pipelining or multi-key commands**: issuing many + sequential `Get`/`Set` calls each pays a full network round-trip; use `client.Pipelined` (or + `client.Pipeline()`) to batch commands, `MGET`/`MSET` for bulk key access, and `TxPipelined` + where atomicity is needed; N serial round-trips dominate latency even on a local Redis instance + (cross-reference the core **Data access & I/O** lane). + +- **Over-large or serialization-heavy cache values in Redis**: caching large serialized blobs + inflates network bandwidth, (de)serialization CPU, and Redis memory on every cache hit; right- + size cached values to what callers actually consume, avoid caching entire documents when a + projection suffices, and consider compression only after measuring that it pays; also avoid + caching values cheaper to recompute than to fetch and deserialize. + +- **TTL & invalidation gaps causing stale growth or expiry-wave stampedes**: no TTL on cache + entries produces indefinite stale growth; identical TTLs on a large batch of keys causes a + synchronized expiry spike and a recompute stampede — add per-key jitter (e.g., base TTL ± + random fraction); also apply **negative caching** (caching a sentinel for missing keys) to + prevent repeated backend misses for non-existent entries that are queried at high rate. diff --git a/.claude/skills/performance-audit/profile-packs/go/database-sql.md b/.claude/skills/performance-audit/profile-packs/go/database-sql.md new file mode 100644 index 00000000..b43a81f4 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/go/database-sql.md @@ -0,0 +1,86 @@ +# Go performance module: Database access (database/sql / pgx / GORM / sqlx / sqlc) +> Load when `database/sql`, `github.com/jackc/pgx`, `gorm.io/gorm`, `github.com/jmoiron/sqlx`, +> `sqlc`, or `github.com/lib/pq` is detected — see the module map in `../go.md`. Core lanes + +> Runtime & GC notes live in `../go.md`; this file is the Database access lens only. + +## Database access (database/sql / pgx / GORM / sqlx / sqlc) + +> Scope: all patterns that touch `*sql.DB`, `pgxpool.Pool`, GORM's `*gorm.DB`, sqlx's `*sqlx.DB`, +> or the generated code from sqlc. The recurring themes are: **pool reuse** (the pool is the unit of +> connection management — open it once, share it everywhere), **batching to cut round-trips** (N+1 +> is the dominant latency killer), **scanning only what's needed** (over-fetch inflates I/O and GC +> pressure), and **context cancellation** (every query should be cancellable so a dropped client +> doesn't hold a DB connection open). Bullets are *conditions to look for*; cross-reference the +> core **Data access & I/O** lane for the generic analogues and the **Concurrency** lane for +> pool-exhaustion and goroutine-leak interactions. + +- **`*sql.DB` opened per request instead of shared**: `*sql.DB` is a goroutine-safe connection pool + meant to be constructed once at startup and shared across the application for its lifetime. + Opening a new `sql.Open` (or `pgxpool.New`) per request or per handler bypasses the pool + entirely, pays connection-establishment overhead on every call, and leaks file descriptors if + `Close` is forgotten (cross-reference the **Concurrency** lane: each leaked connection holds an + OS-level socket and a goroutine waiting on it). + +- **Pool defaults left unconfigured — exhaustion or idle-churn**: `*sql.DB` defaults leave + `MaxOpenConns` unlimited (runaway connection count under burst load) and `MaxIdleConns` at a + small value (idle connections closed and re-opened on the next request, incurring TCP + TLS + + auth overhead). Look for missing calls to `SetMaxOpenConns`, `SetMaxIdleConns`, + `SetConnMaxLifetime`, and `SetConnMaxIdleTime`; through a proxy or PgBouncer, stale conns with + no lifetime cap cause silent errors. Set all four explicitly for any production workload + (verify against the currency brief for your version). + +- **N+1 queries — per-row `Query` inside a `range` loop**: issuing a separate `db.QueryContext` + per item (e.g., loading each user's profile inside a `for _, id := range ids` loop) multiplies + round-trips linearly with the result set. Replace with a single batched query (`WHERE id = + ANY($1)` with a `pgtype`/pq array arg, or `IN (...)`) for reads; use `pgx.Batch` for + heterogeneous statements; use `pgx.CopyFrom` for bulk inserts (cross-reference the **Data + access & I/O** lane N+1 bullet). With GORM, look for `Find` or `First` inside a loop and for + missing `Preload` on associations that trigger a query per parent row. + +- **`rows.Close()` not deferred — connection leak under errors**: a `*sql.Rows` holds its + underlying connection until `Close` is called. If the calling code returns early on an error + without closing (or without fully iterating to `io.EOF`), that connection is stuck until the + `ConnMaxLifetime` expires or the pool is exhausted. Always `defer rows.Close()` immediately + after checking the `Query` error, and always check `rows.Err()` after the iteration loop — an + interrupted scan leaves `rows.Err()` set. The same applies to `pgx.Rows` (verify against the + currency brief for your version). + +- **Queries without context — uncancellable DB work**: `db.Query` / `db.Exec` without a context + keep the query running on the server even after the HTTP handler's `ResponseWriter` has + returned, the client has disconnected, or the service is shutting down. Prefer + `db.QueryContext(ctx, ...)` and `db.ExecContext(ctx, ...)` threaded from the request context + (`r.Context()` or a derived context with a deadline), so the DB driver can cancel the in-flight + statement when the context is cancelled (cross-reference the **Concurrency** lane: context + propagation is the canonical Go cancellation contract). + +- **GORM over-fetch and missing `Select` / `Preload` vs `Joins` confusion**: GORM's `Find` with + no `Select` fetches all columns, inflating I/O and scan work on wide tables. `Preload` issues a + *second* query for each association (one `IN (...)` per level), which compounds to N+1 across + nested or repeated associations; `Joins` folds the association into a single SQL `JOIN` but + returns only the root model columns unless `Select` is explicit. GORM also runs hooks and does + reflection per row — on paths called at high QPS, switch to raw `database/sql`/pgx or sqlc- + generated code (verify against the currency brief for your version). + +- **Prepared statement churn vs reuse**: `db.QueryContext` re-parses and re-plans the query on + every call in many drivers. For queries executed at high frequency, `db.PrepareContext` amortises + the parse/plan cost — but with `database/sql`, a `*sql.Stmt` is re-prepared on each connection + in the pool transparently, so pool size × prepare overhead matters. With pgx native (`pgxpool`), + the extended query protocol and statement cache differ; understand the cache-hit behaviour before + assuming prepare is free. sqlc-generated code uses `$N` placeholders and pairs well with pgx + statement caching (verify against the currency brief for your version). + +- **`lib/pq` instead of pgx on Postgres — missing binary protocol and batch support**: `lib/pq` + is in maintenance mode and uses the text wire protocol; `github.com/jackc/pgx` uses the binary + protocol (no text encode/decode round-trip for numerics, timestamps, UUIDs), supports `pgx.Batch` + for sending multiple statements in a single round-trip, and `pgx.CopyFrom` for high-throughput + bulk inserts. For greenfield Postgres work or any hot path, prefer pgx native (`pgxpool.Pool`) + or the pgx `database/sql` adapter; audit remaining `lib/pq` imports as candidates for migration + (verify against the currency brief for your version). + +- **Transactions held open across network I/O or user latency**: a `*sql.Tx` (or `pgx.Tx`) holds + one connection from the pool for its entire duration and acquires row locks on the database. + Long-held transactions caused by performing HTTP calls, user prompts, or unbounded computation + between `BeginTx` and `Commit`/`Rollback` drain the pool (cross-reference the **Concurrency** + lane: pool exhaustion manifests as goroutines blocked on `db.BeginTx`). Look for transactions + that span more than pure DB work, and for missing `defer tx.Rollback()` guards that leave + transactions uncommitted on error paths (verify against the currency brief for your version). diff --git a/.claude/skills/performance-audit/profile-packs/go/grpc.md b/.claude/skills/performance-audit/profile-packs/go/grpc.md new file mode 100644 index 00000000..738cb2cc --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/go/grpc.md @@ -0,0 +1,87 @@ +# Go performance module: gRPC (grpc-go / protobuf) +> Load when `google.golang.org/grpc` or `google.golang.org/protobuf` (`.proto` files / generated `*.pb.go`) is detected — see the module map in `../go.md`. Core lanes + Runtime & GC notes live in `../go.md`; this file is the gRPC lens only. + +## gRPC (grpc-go / protobuf) + +> Covers **grpc-go** (`google.golang.org/grpc`), its protobuf runtime +> (`google.golang.org/protobuf`), and **connect-go** as a sibling transport where relevant. +> Bullets are *conditions to look for*. The recurring themes are `ClientConn` reuse across +> calls, streaming or batch RPCs to cut per-call round-trips, message sizing relative to +> compression and the default receive limit, and per-RPC deadline discipline. + +- **`grpc.ClientConn` created per-call or per-request**: a `ClientConn` negotiates TLS, runs + HTTP/2 connection setup, starts background health-check and keepalive goroutines, and + multiplexes many concurrent RPCs on a single TCP connection — it is expensive to establish + and fully goroutine-safe. Creating one per RPC (or per inbound request) serializes connection + setup, blows the goroutine budget, and prevents HTTP/2 multiplexing gains. Reuse a + long-lived singleton (or a small keyed pool for distinct targets) and let the channel + manage its own subchannels (verify against the currency brief for your version). + +- **Unary RPC in a loop instead of streaming or a batch message**: calling a unary RPC once + per item pays per-call framing, header compression, and a full round-trip each iteration. + Use **client/server streaming** (or a repeated-field batch request) to amortize that cost — + the stream establishes call state once and pipelines messages without re-incurring the RPC + handshake per item. In connect-go, the same tradeoff applies via `Connect` streaming + handlers (verify against the currency brief for your version). + +- **Single `ClientConn` behind an L4 load balancer with no resolver/balancer configured**: a + single HTTP/2 connection pins all RPCs to one backend TCP connection, bypassing the LB + entirely — all traffic lands on one server. Configure a proper gRPC resolver and a + client-side balancer (e.g., `roundrobin` via `grpc.WithDefaultServiceConfig`) so each + subchannel can reach a distinct backend, or use a look-aside LB. Verify what resolver the + target URI scheme maps to and whether `round_robin` is the right policy for the deployment + (verify against the currency brief for your version). + +- **Message size bumped past the default receive limit, or large payloads not streamed**: + `MaxRecvMsgSize` defaults to 4 MiB (verify against the currency brief for your version); + silently hitting it produces an error rather than a performance degradation, but the common + "fix" of raising it masks the real problem. Large payloads should stream in chunks rather + than be buffered as a single proto message — this bounds memory on both ends and avoids + forcing the GC to reclaim one giant allocation per call (cross-reference the core **Memory & + allocation** lane). Also look for repeated marshal/unmarshal of the same proto value in the + same request path — proto marshal allocates; reuse message objects where the code flow + allows. + +- **Compression applied indiscriminately or absent for large payloads**: gRPC gzip + (`grpc.UseCompressor(gzip.Name)` on the call, or `grpc.WithDefaultCallOptions` on the + client) compresses every message — beneficial for large text-heavy protos over WAN but + wastes CPU on small messages or already-compressed binary content. Conversely, leaving + compression off for multi-KB payloads over metered or high-latency links wastes bandwidth. + Match compression to median payload size and link characteristics; the `zstd` compressor + (if registered) often gives a better speed/ratio tradeoff than gzip (verify against the + currency brief for your version). + +- **Keepalive parameters mistuned for the network environment**: absent keepalive, idle + `ClientConn`s through NAT or cloud LBs silently drop — the next RPC fails with a transport + error instead of probing and reconnecting. Conversely, `keepalive.ClientParameters` with + a very short `Time` or `Timeout` trips the server's `keepalive.EnforcementPolicy` + (minimum ping interval) and causes GOAWAY / ENHANCE_YOUR_CALM, churning connections. + Look for `keepalive.ClientParameters` / `keepalive.ServerParameters` absent or with + `Time` shorter than the server's `MinTime` enforcement (verify against the currency brief + for your version). + +- **RPCs launched without a `context` deadline or without deadline propagation**: a unary or + streaming RPC started with `context.Background()` (no deadline attached) can block its + goroutine indefinitely if the server stalls — the goroutine is leaked until the process + exits. Always derive a per-call context with `context.WithTimeout` or + `context.WithDeadline`, and propagate an inbound deadline downstream rather than + substituting a fresh one. A missing deadline also means the server cannot detect + client-side cancellation and may do wasted work (cross-reference the core **Concurrency & + parallelization** lane). + +- **Heavy per-RPC interceptor allocations or deep interceptor chains**: unary and stream + interceptors run on every RPC. Interceptors that allocate a `map`, `[]string`, or log + buffer per call add steady GC pressure at high QPS. Order matters too — auth interceptors + that reject unauthenticated calls placed *after* expensive tracing interceptors do work + that will be discarded. Look for interceptors that marshal/unmarshal the full message for + logging, or that call `fmt.Sprintf` / structured-log functions constructing transient + objects on every RPC (cross-reference the core **Memory & allocation** lane). + +- **Unbounded per-RPC goroutine work with no concurrency cap**: grpc-go spawns one goroutine + per inbound RPC stream; `MaxConcurrentStreams` (verify against the currency brief for your + version) caps streams per connection but not total across connections. Expensive + synchronous work inside a server handler (DB queries, downstream RPCs, heavy CPU) with no + semaphore or worker-pool limit lets high inbound RPS exhaust goroutine memory and + downstream connection pools simultaneously. Apply a semaphore or bounded worker pool for + downstream fan-out, and propagate context cancellation so work is shed when the caller + has already given up (cross-reference the core **Concurrency & parallelization** lane). diff --git a/.claude/skills/performance-audit/profile-packs/go/messaging.md b/.claude/skills/performance-audit/profile-packs/go/messaging.md new file mode 100644 index 00000000..7a1b54a1 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/go/messaging.md @@ -0,0 +1,84 @@ +# Go performance module: Messaging & streaming (Kafka / NATS / RabbitMQ / Pub/Sub) +> Load when `github.com/segmentio/kafka-go`, `github.com/IBM/sarama`, `github.com/confluentinc/confluent-kafka-go`, `github.com/nats-io/nats.go`, `github.com/rabbitmq/amqp091-go`, or `cloud.google.com/go/pubsub` is detected — see the module map in `../go.md`. Core lanes + Runtime & GC notes live in `../go.md`; this file is the Messaging & streaming lens only. + +## Messaging & streaming (Kafka / NATS / RabbitMQ / Pub/Sub) + +> Covers **Kafka** via `github.com/segmentio/kafka-go`, `github.com/IBM/sarama`, and +> `github.com/confluentinc/confluent-kafka-go`; **NATS** (incl. JetStream) via +> `github.com/nats-io/nats.go`; **RabbitMQ/AMQP** via `github.com/rabbitmq/amqp091-go`; +> and **Google Pub/Sub** via `cloud.google.com/go/pubsub`. Bullets are *conditions to look +> for*. The recurring themes are connection/client reuse, producer and consumer batching to +> cut round-trips, bounded concurrency on message handlers, and right-sized payloads. + +- **Connection or client constructed per message or per request**: a kafka-go `Writer`/`Reader`, + sarama `Client`, confluent `Producer`/`Consumer`, NATS `Conn`, AMQP `Connection`, or Pub/Sub + `Client` negotiates TCP, TLS, and broker handshake on construction — each is expensive to + establish and designed to be **long-lived and shared**. Creating one per message (or per + inbound HTTP request) serializes connection setup and destroys throughput. For AMQP specifically, + share one long-lived `Connection` and multiplex via per-goroutine `Channel`s — AMQP channels are + **not goroutine-safe**, so each goroutine needs its own `Channel`, but all can share the + underlying `Connection` (verify against the currency brief for your version). + +- **Producer batching absent or disabled**: publishing one message per network round-trip (e.g., + kafka-go `Writer` with `BatchSize` of 1 or `BatchTimeout` at zero, sarama sync producer called + in a tight loop without async batching, confluent producer flushed after every produce) throttles + throughput to the round-trip latency of the broker. Configure `BatchSize` / `BatchTimeout` (or + the equivalent `linger.ms` / `batch.size` beneath the confluent C library) so the producer + accumulates a batch before sending. Separately, choose the `RequiredAcks` / `acks` durability + level deliberately — `acks=all` maximises durability but adds ISR-synchronisation latency; for + high-throughput pipelines that can tolerate potential loss, a lower acks setting may be + appropriate (verify against the currency brief for your version). + +- **Consumer fetch sizing too small**: Kafka consumers with `MinBytes` / `FetchMin` set to 1 byte + or `MaxBytes` / `FetchMax` at a very low value issue a round-trip to the broker for each + message rather than pulling a batch into a local buffer. Raise `MinBytes` (kafka-go) or + `Consumer.Fetch.Min` (sarama) so the broker waits until enough data is available before + responding, amortising round-trip cost across many messages. Similarly, an AMQP channel + `Qos` prefetch of 1 (`ch.Qos(1, 0, false)`) forces a broker ack-and-send cycle per message — + raise the prefetch count to match actual handler concurrency (cross-reference the core + **Data access & I/O** lane) (verify against the currency brief for your version). + +- **Synchronous publish or blocking ack on a request-serving goroutine**: calling a synchronous + produce (sarama `SyncProducer.SendMessage`, kafka-go `Writer.WriteMessages` with no timeout + context, NATS `Conn.Publish` followed by a blocking `Conn.Flush`) on the goroutine handling an + inbound request blocks that goroutine for the full broker round-trip and invites pileup under + load. Publish asynchronously — use sarama `AsyncProducer` and drain its `Errors` / + `Successes` channels in a background goroutine, or hand messages off to a buffered worker + channel; process consumes off the request path entirely (cross-reference the core **Concurrency + & parallelization** lane) (verify against the currency brief for your version). + +- **Unbounded per-message goroutine spawn in the consumer loop**: launching `go handle(msg)` for + every delivered message with no concurrency cap lets a slow downstream (DB, external service) + accumulate an unbounded number of in-flight goroutines, exhausting memory and overloading the + downstream. Bound concurrency with a fixed worker-pool receiving from a channel, or use + `errgroup.SetLimit(n)` (verify against the currency brief for your version) to cap concurrent + handlers; size the limit to what the downstream can actually absorb (cross-reference the core + **Concurrency & parallelization** lane). + +- **Per-message offset commit or ack (commit strategy not batched)**: committing a Kafka offset + (or acking a RabbitMQ delivery, or acknowledging a Pub/Sub message) synchronously after every + individual message adds a broker round-trip per message. Batch commits — commit the highest + processed offset periodically or after N messages; ack RabbitMQ deliveries with `multiple=true` + (`ch.Ack(tag, true)`) to acknowledge all deliveries up to that tag in one round-trip; use + Pub/Sub's `ReceiveSettings.MaxOutstandingMessages` to control flow rather than acking one at a + time. The trade-off is a larger duplicate-on-crash window vs throughput — accept it deliberately + rather than defaulting to per-message commits (verify against the currency brief for your + version). + +- **Message payload size and missing compression**: large message bodies inflate broker storage I/O, + network transfer, and Go GC pressure (each message body is a heap allocation). Right-size + messages — prefer normalised references or event identifiers over embedding full entity payloads. + When messages are unavoidably large and text-heavy, enable Kafka producer compression (`Codec` + in kafka-go, `Producer.Compression` in sarama, `compression.codec` in confluent) — snappy gives + low CPU overhead, lz4 good throughput, zstd the best ratio for CPU cost. Avoid re-serializing the + same payload once per partition or once per retry; marshal once and reuse the `[]byte` (cross- + reference the `serialization` module) (verify against the currency brief for your version). + +- **Partition count or key distribution bottlenecking consumer parallelism**: Kafka throughput + scales with partition count — a topic with too few partitions caps consumer-group parallelism + regardless of how many consumer instances are deployed (one partition can only be consumed by + one group member at a time). Equally, a poorly chosen message key can hash the majority of + traffic to one or a few partitions (hot-partition skew), leaving most consumer goroutines idle + while one is overloaded. Look for partition counts set at deployment defaults that were never + sized for the target throughput, and for key fields (user ID, tenant ID) whose cardinality or + distribution is badly skewed (verify against the currency brief for your version). diff --git a/.claude/skills/performance-audit/profile-packs/go/net-http-servers.md b/.claude/skills/performance-audit/profile-packs/go/net-http-servers.md new file mode 100644 index 00000000..7c515369 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/go/net-http-servers.md @@ -0,0 +1,95 @@ +# Go performance module: HTTP servers & web frameworks (net/http / gin / echo / fiber / chi) +> Load when `net/http` HTTP servers, `github.com/gin-gonic/gin`, `github.com/labstack/echo`, +> `github.com/gofiber/fiber`, or `github.com/go-chi/chi` is detected — see the module map in +> `../go.md`. Core lanes + Runtime & GC notes live in `../go.md`; this file is the HTTP servers +> & web frameworks lens only. + +## HTTP servers & web frameworks (net/http / gin / echo / fiber / chi) + +> Scope: the stdlib `net/http` server and four popular routers/frameworks — gin (radix-tree, +> `net/http`-compatible), echo (radix-tree, `net/http`-compatible), fiber (fasthttp-based, +> NOT `net/http`-compatible), and chi (stdlib-compatible lightweight router). The recurring +> theme is unset production-safety defaults, per-request allocation in hot handlers, and +> blocking work that holds a goroutine — and for fiber, unique lifecycle rules on its pooled +> context and `[]byte` values that have no equivalent in the other frameworks. + +- **`http.Server` with no timeouts set**: an `http.Server` literal with `ReadTimeout`, + `WriteTimeout`, `ReadHeaderTimeout`, and `IdleTimeout` all at their zero values never times + out slow or stalled clients; this admits Slowloris-style resource exhaustion and lets + goroutines pile up indefinitely — look for `http.ListenAndServe(addr, handler)` or a bare + `http.Server{}` struct without timeout fields set (verify against the currency brief for + your version). + +- **`http.Client` or `http.Transport` created per request**: constructing a new `http.Client` + or `http.Transport` per call bypasses connection pooling entirely — each request opens a + fresh TCP connection and performs a new TLS handshake; the correct pattern is one long-lived + client reused across goroutines. Also check `MaxIdleConnsPerHost` on the shared transport: + its default is low relative to the concurrency typical production backends need, leaving + keep-alive slots underutilised under high fan-out (verify against the currency brief for + your version). Additionally, look for handlers that read `resp.Body` but do not drain and + close it — undrained bodies prevent the connection from returning to the pool + (cross-reference the **Data access & I/O** lane in `../go.md`). + +- **Per-request allocation in hot handlers**: handlers that re-compile a regexp, re-parse a + template, or re-construct a heavy struct on each invocation pay a fixed per-call cost that + compounds under concurrency — hoist the work to package scope or a `sync.Once`. Middleware + chains that allocate (per-request loggers, per-request UUID generators writing to an + allocated string) add GC pressure on every request; reuse buffers via `sync.Pool` where the + allocation is bounded and short-lived (cross-reference the **Memory & allocation** lane in + `../go.md`). + +- **Reading `r.Body` fully into memory vs streaming**: handlers that call `io.ReadAll(r.Body)` + or `ioutil.ReadAll(r.Body)` buffer the entire request body before processing, which bounds + throughput by available memory and raises peak allocation under concurrent load; prefer + `json.NewDecoder(r.Body).Decode(&v)` for JSON ingest or `io.Copy` to forward the body + downstream — both stream without materialising the full body (cross-reference the + **Memory & allocation** lane in `../go.md`). + +- **Buffering the response instead of streaming**: handlers that `json.Marshal` into a `[]byte` + and then call `w.Write(b)` allocate an intermediate buffer and delay the first byte to the + client; `json.NewEncoder(w).Encode(v)` writes directly to the `ResponseWriter` and is both + lower-allocation and lower-latency for large payloads. For large file responses, use + `http.ServeContent` or `io.Copy` rather than reading the file into a buffer first. When + true streaming is needed (SSE, chunked JSON arrays), verify that `w.(http.Flusher).Flush()` + is called and that no buffering middleware wraps the writer (cross-reference the + **Data access & I/O** lane in `../go.md`). + +- **Blocking work on the request goroutine without context propagation**: handlers performing + DB queries, outbound HTTP calls, or any other I/O without forwarding `r.Context()` to the + downstream call cannot be cancelled when the client disconnects — the goroutine (and any + held resources) run to completion regardless; pass `r.Context()` (or a child derived from + it) into every blocking call so client-disconnect cancellation propagates + (cross-reference the **Concurrency & parallelization** lane in `../go.md`). + +- **Middleware ordering and blanket cost**: expensive middleware applied globally — per-request + body logging with allocation, gzip compression on every response regardless of payload size, + per-request tracing spans on non-instrumented routes — runs even on requests that exit early + (health checks, 404s); for gin/echo/chi, scope heavy middleware to the route groups that + need it rather than mounting at the root. For gzip specifically, compression is harmful on + already-compressed payloads (images, video, pre-compressed static assets) and on tiny + payloads where CPU cost exceeds transmission savings — check that a minimum-size threshold + and a content-type allowlist are configured (verify against the currency brief for your + version). + +- **gin `Context` retention past the handler; fiber `*fiber.Ctx` and `[]byte` retention past + the handler**: gin pools `*gin.Context` — retaining a pointer to it (e.g., in a goroutine + launched inside the handler, or in a closure stored on a struct) causes a data race when the + pool recycles the context for the next request; copy any needed values out before the handler + returns or call `c.Copy()` for a heap-allocated snapshot. fiber is built on fasthttp and has + a fundamentally different lifecycle: `*fiber.Ctx` and all `[]byte` values it exposes + (`c.Body()`, `c.Params(...)` as bytes, header byte slices) are reused by the fasthttp + allocator after the handler returns — retaining any of them across the handler boundary or + in a launched goroutine corrupts data silently; copy to a `string` or a separately allocated + `[]byte` before the handler exits. fiber's API is also NOT compatible with `net/http` + middleware or `context.Context` propagation patterns used by gin/echo/chi — stdlib-ecosystem + middleware cannot be reused directly (verify against the currency brief for your version). + +- **`MaxHeaderBytes` unset and HTTP/2 / h2c not intentionally configured**: the default + `MaxHeaderBytes` on `http.Server` is permissive; leaving it unset allows clients to send + very large header blocks that consume memory before the handler runs — set it explicitly for + public-facing servers. For services behind a proxy that already terminates TLS, evaluate + whether `h2c` (cleartext HTTP/2 via `golang.org/x/net/http2/h2c`) is appropriate to regain + multiplexing and header compression on the internal leg; and for TLS servers, confirm that + HTTP/2 is enabled (it is by default when using `ListenAndServeTLS` with a compatible + handler, but custom `tls.Config` can inadvertently disable it) (verify against the currency + brief for your version). diff --git a/.claude/skills/performance-audit/profile-packs/go/serialization.md b/.claude/skills/performance-audit/profile-packs/go/serialization.md new file mode 100644 index 00000000..c6a4493c --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/go/serialization.md @@ -0,0 +1,83 @@ +# Go performance module: Serialization (encoding/json / protobuf / msgpack) +> Load when `encoding/json`, `google.golang.org/protobuf`, `github.com/json-iterator/go`, +> `github.com/mailru/easyjson`, `github.com/goccy/go-json`, or +> `github.com/vmihailenco/msgpack` is detected — see the module map in `../go.md`. Core +> lanes + Runtime & GC notes live in `../go.md`; this file is the Serialization lens only. + +## Serialization (encoding/json / protobuf / msgpack) + +> Scope: stdlib `encoding/json` (reflection-based `Marshal`/`Unmarshal`), the emerging +> `encoding/json/v2` direction (verify the milestone in the currency brief / version index), +> code-generated `easyjson`, drop-in faster replacements +> (`goccy/go-json`, `jsoniter`), protobuf (`google.golang.org/protobuf`), msgpack, plus +> `encoding/gob` and `encoding/xml`. The recurring theme is reflection and allocation cost +> on hot paths, streaming vs whole-buffer trade-offs, decoding into concrete types rather +> than dynamic maps, and matching the wire format to the interop need. + +- **Reflection cost of `json.Marshal`/`json.Unmarshal` on hot paths**: the stdlib encodes + and decodes via reflection on every call — it caches per-type field metadata, but the + reflective walk and per-call allocations remain; look for calls inside request handlers, + tight loops, or per-message processing that accumulates under load. For the hottest paths + consider a code-generated marshaler (`github.com/mailru/easyjson`) or a faster drop-in + replacement (`github.com/goccy/go-json`, `github.com/json-iterator/go`) that retains + the stdlib API surface (verify against the currency brief for your version). + +- **Whole-buffer vs streaming encode/decode**: `json.Marshal(v)` builds the complete + `[]byte` in memory before returning; `json.NewEncoder(w).Encode(v)` writes directly to + an `io.Writer` — for large payloads or HTTP response bodies this avoids the intermediate + allocation and reduces time-to-first-byte. Conversely, `json.NewDecoder(r).Decode(&v)` + streams from an `io.Reader` rather than requiring `io.ReadAll` first. Know the semantics + differences: `Encoder.Encode` appends a trailing newline; a `Decoder` over a connection + may leave unconsumed bytes if the stream contains multiple values (cross-reference the + **HTTP servers & web frameworks** module in `net-http-servers.md` and the **Data access + & I/O** lane in `../go.md`). + +- **Decoding into `map[string]any` or `any` instead of a concrete struct**: unmarshaling + into a dynamic map or bare `interface{}` forces full per-field reflection, boxes every + value into an `interface{}`, and allocates for every key string and value; it also blocks + any compiler analysis of field access. Decode into a typed struct instead. Where only a + sub-tree is needed, decode the surrounding message into a struct that holds a + `json.RawMessage` field and decode the sub-tree lazily or not at all. + +- **Struct shape and tag hygiene inflating payload or work**: exported fields with no + `json:"-"` tag that the consumer never reads are marshaled on every call — adding `"-"` + eliminates the work; missing `omitempty` on optional fields sends zero-value noise over + the wire and through the decoder on the other side; very deep or wide nested structs + multiply the reflective walk proportionally. Audit the struct against the actual wire + contract, not just the Go representation. + +- **`[]byte` fields encoded as base64 and buffer allocation on hot serialize paths**: the + JSON encoder represents `[]byte` as base64, which is both larger and costlier than the + raw binary; large blob fields are particularly expensive. Repeated `[]byte(s)` / + `string(b)` conversions on hot paths each copy the backing array. Reuse encode buffers + via a `sync.Pool` of `*bytes.Buffer` (call `Reset()` on retrieval) rather than + allocating a fresh buffer per call — this is the canonical intersection with the + **Memory & allocation** lane in `../go.md` (cross-reference the `sync.Pool` bullet there). + +- **Protobuf allocation and repeated re-marshaling**: protobuf is binary, smaller, and + faster to marshal/unmarshal than JSON for service-to-service traffic, but + `proto.Marshal` still allocates; reuse message structs (reset with `proto.Reset`) where + the struct is not shared, and avoid re-marshaling the same logical payload more than once + per hop (cross-reference the **gRPC** module when detected). Don't use + `proto.MarshalOptions{}.Marshal` in a per-request hot path without checking whether a + pooled approach fits the message lifecycle (verify against the currency brief for your + version). + +- **`Decoder.UseNumber()` and custom `MarshalJSON`/`UnmarshalJSON` methods as hidden + costs**: by default the JSON decoder represents all numbers as `float64`, which loses + precision for large integers — `Decoder.UseNumber()` defers parsing so the caller can + call `.Int64()` or `.Float64()` explicitly. Separately, any type that implements + `json.Marshaler` or `json.Unmarshaler` has its method called per value during traversal; + if such a method allocates (building a formatted string, calling `fmt.Sprintf`, making + an intermediate map) that cost multiplies across every element in a collection — look for + custom JSON methods on high-cardinality types in hot serialization paths (verify against + the currency brief for your version). + +- **Choosing the wrong wire format for the interop need**: `encoding/gob` is Go-only, + stateful (receiver must pre-register concrete types behind interfaces), and unsuitable + for cross-language or cross-version interop; `encoding/xml` is heavier than JSON in + both parse cost and wire size; msgpack (`github.com/vmihailenco/msgpack`) is a compact + binary middle ground that crosses language boundaries without a schema — match the + format to the actual interop requirement, payload volume, and versioning story rather + than defaulting to JSON for all traffic (verify against the currency brief for your + version). diff --git a/.claude/skills/performance-audit/profile-packs/html.md b/.claude/skills/performance-audit/profile-packs/html.md new file mode 100644 index 00000000..0e9c05ea --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/html.md @@ -0,0 +1,155 @@ +# Profile Pack: HTML (plain documents & the rendering path) + +A **companion** pack for **plain HTML / the document layer** — the performance of the markup the +browser receives and renders, *independent of any JS framework*. It loads **alongside** whatever +backend emits the HTML (Django/Jinja, Rails/ERB, Laravel/Blade, .NET Razor, Express/Nunjucks, PHP, or a +static-site generator), and also applies to the *rendered HTML output* of JS frameworks. It is about +the **document, its subresources, and the critical rendering path** — **not** the JS bundle: for +bundler concerns (tree-shaking, code-splitting, transpile target) see the JS/TS `bundling-build` module +when a bundler is in use; this pack is the markup/document/delivery/rendering layer that exists even +with little or no JavaScript. + +**Content-detected** (`.html`/`.htm`, server templates — `*.erb`, `*.jinja`/`*.j2`, `*.twig`, +`*.blade.php`, `*.cshtml`/Razor, `*.njk` — static-site generators, `<!DOCTYPE html>` markup). Signals +are durable and browser-agnostic; concrete baseline/feature claims are tagged "(verify against the +currency brief for your version)" because browser support and defaults move. Deep **image** and **font** +lenses load as modules — see the map at the bottom. + +--- + +## Algorithmic / rendering & layout cost (lane `algorithmic`) +- **A very large DOM makes every style recalc and layout pass more expensive** — cost scales with node + count, so a page that emits tens of thousands of nodes (usually an un-paginated server-side loop over + rows) is slow to lay out and heavy in memory regardless of CSS. Paginate, virtualize, or summarize + server-side rather than shipping the whole set as markup. +- **CSS that forces wide style recalc**: very large stylesheets re-matched against a large DOM, and + broad/deeply-descendant or universal (`* {}`) selectors, make style recalculation a measurable cost + on big pages — keep selectors shallow and stylesheets scoped to what the page uses. +- **Everything laid out up front on a long page**: `content-visibility: auto` (with + `contain-intrinsic-size` so the scrollbar stays honest) lets the browser skip layout/paint for + off-screen sections until they approach the viewport — a large win on long documents (verify against + the currency brief for your version). +- **Animating layout- or paint-triggering CSS properties**: animating `top`/`left`/`width`/`height`/ + `margin` re-runs layout every frame, and `box-shadow`/`background` re-runs paint — both jank. Animate + the **compositor-only** properties `transform` and `opacity`, which the GPU handles without layout or + paint. Promote an element to its own layer (`will-change`, or `transform: translateZ(0)`) *sparingly* + — each layer costs memory, so promoting many elements backfires (verify against the currency brief + for your version). + +## Memory & document size (lane `memory`) +- **DOM node count is itself a cost**: every element retains memory and slows traversal/style/layout; + thousands of nodes from un-paginated loops, deeply wrapped markup, or builder-generated `<div>` soup + is the signal (see the algorithmic lane for the layout-cost side). +- **Heavy inline payloads in the document**: a large inline `<script>`/`<style>`, a big `data:` URI, or + a large inline JSON/state blob bloats the HTML, cannot be cached separately from the document, and + delays parse — weigh inlining (saves a request, no separate caching) against an external, cacheable + file. +- **Bytes shipped the page never uses**: large `display:none`/hidden subtrees rendered server-side + "just in case", dead or commented-out markup, and unused inline CSS all ship and parse for nothing — + emit them lazily or not at all. + +## Data access & I/O — delivery (lane `data-access`) +- **Text resources served without compression**: HTML/CSS/JS/SVG without Brotli (or gzip) at the + server/CDN is a large, cheap first-load win — confirm the response `Content-Encoding` (verify against + the currency brief for your version). +- **Caching not set up for the asset's lifetime**: fingerprinted static assets (CSS/JS/images) want a + long-lived `Cache-Control: immutable`; the HTML document usually wants short/`no-cache` with + revalidation (`ETag`/`Last-Modified`). Re-downloading unchanged assets every visit is the signal. +- **Critical-path request count and obsolete bundling**: under HTTP/2/3 many small multiplexed files + are fine and improve caching granularity, so **domain sharding and aggressive concatenation/spriting + are counter-productive** on a modern protocol — but uncached third-party requests and unbounded + blocking requests still cost. Verify the served protocol before recommending either direction (verify + against the currency brief for your version). +- **Cross-origin connections set up lazily**: required third-party origins (font host, image CDN, API) + not warmed with `preconnect`/`dns-prefetch` pay DNS+TCP+TLS on first use, on the critical path. +- **No CDN/edge for static assets** where user latency matters; TTFB dominated by a slow origin (the + backend pack owns server time — this pack flags the delivery *shape*, not the server logic). + +## Payload / startup / critical rendering path (lane `payload-startup`) +- **Render-blocking CSS**: every `<link rel="stylesheet">` blocks the first paint until it is downloaded + and parsed — inline the critical (above-the-fold) CSS and load the rest non-blocking + (`media`-attribute toggling or `rel=preload`+swap), and remove unused CSS so the blocking stylesheet + is small. Avoid CSS `@import` in stylesheets: the imported sheet isn't discovered until its parent has + downloaded and parsed, serializing fetches into a waterfall — prefer top-level `<link>`s the preload + scanner can start in parallel. +- **Parser-blocking scripts**: a `<script>` without `async`/`defer` in `<head>` halts HTML parsing + while it downloads and runs — use `defer` (run after parse, in order) or `async` (run ASAP, unordered) + and place scripts deliberately; native module scripts are deferred by default. +- **`<head>` order and the preload scanner**: put `<meta charset>` first and critical CSS early, and + keep critical subresources as discoverable `<link>`/`<img>` in the markup — a resource hidden behind + JS or CSS (`background-image`, dynamically injected) is found late, after the preload scanner could + have started it. +- **Missing hints for the late-discovered critical resource**: `<link rel="preload">` the LCP image or + a critical font (discovered late, in CSS), `modulepreload` a critical module graph — but + over-hinting de-prioritizes everything, so reserve it for the genuinely critical few (verify against + the currency brief for your version). +- **Heavy third-party scripts**: analytics, tag managers, ads, chat/social widgets each add + render-blocking or main-thread cost and a network dependency — load them `async`/`defer`, lazy-load + the non-critical ones, use a click-to-load facade for heavy embeds, and audit tag-manager sprawl. +- **Un-minified or unused payload shipped to production**: un-minified HTML/CSS/JS, or a large CSS/UI + framework pulled in whole for a few components — minify and trim to what the page uses. +- **Speculative loading for the next navigation, where it pays**: `<link rel="prefetch">` or the + Speculation Rules API can prefetch/prerender a likely next page for near-instant navigation — weigh + the wasted bandwidth on the pages users *don't* visit (verify against the currency brief for your + version). + +## Framework-idiom currency (lane `idiom-currency`) +- **JavaScript reinventing a now-native platform feature**: a JS library doing what the platform now + does natively — e.g. lazy-loading (`loading="lazy"`), modals/disclosure (`<dialog>`/`<details>`), + layout reservation (CSS `aspect-ratio`), or off-screen skipping (`content-visibility`), among other + newly-Baseline primitives — ships script weight and main-thread cost the native element doesn't. + Flag the library where the native feature now covers the use case (verify against the currency brief + for your version). +- **Legacy formats/loading where modern ones win**: old image formats and font formats, and + fixed-size images without `srcset`, where AVIF/WebP, WOFF2, and responsive images would cut bytes — + see the `images-media` and `fonts` modules. +- Consult the currency brief for changed browser defaults and newly **Baseline** features the markup + could adopt; offline, note candidate idiom concerns at LOW confidence for manual currency check. + +--- + +## Rendering path & Core Web Vitals (use for every HTML audit) + +HTML performance is judged against how the browser turns bytes into pixels, and against the user-centric +metrics — this is the HTML analog of a runtime-notes section: how to reason and measure before +concluding. + +- **The critical rendering path**: the browser streams HTML into the **DOM**, blocks rendering on CSS + (the **CSSOM**) and on parser-blocking scripts, then runs style → layout → paint → composite. The + three durable levers are: *don't block the parser/renderer* (async/defer JS, non-blocking non-critical + CSS), *let the preload scanner discover subresources early* (keep them in the markup), and *ship the + above-the-fold content first*. +- **Core Web Vitals are the measurement frame**: **LCP** (largest contentful paint — usually the hero + image or heading; make it discoverable, prioritized, not lazy-loaded, and served fast), **CLS** + (cumulative layout shift — reserve space for images, embeds, ads, and font swaps so nothing jumps), + **INP** (interaction latency — mostly a JS main-thread concern, minimal on a no-JS page), and **TTFB** + (server response — owned by the backend but it caps everything downstream). +- **Measure with lab *and* field tools**: Lighthouse / WebPageTest / DevTools give a controlled lab + number; CrUX / RUM give what real users on slow devices and networks actually experience — a fast + lab score can hide a poor field result, so confirm against field data where available, and throttle + the lab to a realistic device/network. +- **Judgment, not a scorecard**: a heavy hero image on a landing page may be the entire point; flag the + *avoidable* delay, shift, and bytes on the critical path — not every byte. A region that is inherent + to the page's job is not automatically a defect. + +## Framework / sub-stack modules (load on detection) + +Load the lanes + Rendering-path notes above for *every* HTML audit. Additionally load a module when its +surface is material to the page. + +| Detected (signals) | Load module | +|---|---| +| **Images & media** — significant imagery or embeds: `<img>`/`<picture>`/`srcset`, `<video>`, `<iframe>` embeds, inline SVG | [`html/images-media.md`](html/images-media.md) | +| **Web fonts** — `@font-face`, a `<link>` to Google Fonts / a font CDN, or `.woff2`/`.woff`/`.ttf` assets | [`html/fonts.md`](html/fonts.md) | + +## Sources + +Durable signals here are grounded in platform/standards documentation; version-specific support belongs +in the currency brief. + +- **web.dev** — "Learn Core Web Vitals", "Critical rendering path", LCP/CLS/INP optimization guides, + "Preload critical assets", third-party/facade patterns. +- **MDN** — `loading`, `fetchpriority`, `<link rel=preload/preconnect/modulepreload>`, `font-display`, + `srcset`/`sizes`, `content-visibility`, `<dialog>`, Speculation Rules. +- **HTTP Archive / Web Almanac** — real-world distributions for markup, CSS, fonts, media; Lighthouse + audit definitions. diff --git a/.claude/skills/performance-audit/profile-packs/html/fonts.md b/.claude/skills/performance-audit/profile-packs/html/fonts.md new file mode 100644 index 00000000..25d24940 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/html/fonts.md @@ -0,0 +1,88 @@ +# HTML performance module: Web fonts +> Load when the page uses web fonts — `@font-face`, a `<link>` to Google Fonts / a font CDN, or `.woff2`/`.woff`/`.ttf` assets — see the module map in `../html.md`. Core HTML lanes + Rendering-path notes live in `../html.md`; this file is the Web fonts lens only. + +## Web fonts + +> Scope: `@font-face` declarations, font CDN `<link>`s, and the WOFF2/WOFF/TTF assets they +> reference. The recurring theme is that web fonts are discovered late, block or delay text +> rendering by default, and cause layout shift when the fallback and the webfont have different +> metrics. The corrective levers are: make text visible immediately (`font-display`), pull the +> critical font earlier (preload), eliminate the swap-shift by matching fallback metrics +> (`size-adjust` / metric overrides), and ship only the bytes the page actually needs (WOFF2, +> subsetting, no unused weights). + +- **`font-display` default hides text until the font loads (FOIT)**: the browser default for + `@font-face` is effectively `block` — text is invisible for up to ~3 s while the font + downloads, directly harming FCP and perceived LCP (see the Rendering-path notes in + `../html.md`). `font-display: swap` shows the fallback immediately and swaps on load (FOUT — + text is readable, shift may occur); `font-display: optional` uses the webfont only if it + arrives within a short window and suppresses the swap entirely, eliminating both the block and + the layout shift at the cost of the webfont being skipped on slow connections. Pick per use + case: `swap` for body copy where readability matters most, `optional` for decorative fonts + where the webfont is a cosmetic enhancement (verify against the currency brief for your + version). + +- **Fonts are discovered late — the critical font should be preloaded**: a `@font-face` URL is + embedded in a stylesheet, so the browser cannot fetch the font until it has downloaded, + parsed, and applied the CSS and determined which rules are used — pushing the fetch well into + the waterfall. A `<link rel="preload" as="font" type="font/woff2" crossorigin>` for the + one or two fonts used above the fold moves the fetch to the preload scanner and removes that + cascade delay. Over-preloading (every weight, every style) competes with higher-priority + resources (see the payload-startup lane in `../html.md`) and can hurt LCP; limit preloads + to the fonts that gate above-the-fold text render (verify against the currency brief for + your version). + +- **Layout shift on font swap from mismatched fallback metrics**: when a webfont swaps in with + different glyph widths, ascenders, or line heights than the fallback, text reflows — directly + registering as CLS (see the Rendering-path notes in `../html.md`). `size-adjust`, + `ascent-override`, `descent-override`, and `line-gap-override` on a `@font-face` fallback + declaration tune the fallback font's metrics to closely match the webfont so the swap causes + little or no reflow. The shift is often large enough (0.1 + CLS) to fail Core Web Vitals on + its own; metric overrides are one of the few reliable ways to eliminate it without removing + the webfont (verify against the currency brief for your version). + +- **Serving TTF / OTF / WOFF where WOFF2 would do**: WOFF2 uses Brotli compression + internally and is ~30% smaller than WOFF and significantly smaller than TTF/OTF; all modern + browsers support it. Shipping uncompressed or less-compressed formats wastes bytes on every + font load. Check `@font-face` `src` order: the first matching `format()` hint the browser + accepts wins — if TTF is listed before WOFF2 a modern browser will take the larger file. + WOFF/TTF/EOT/SVG font fallbacks are only relevant for legacy targets that should be a + deliberate decision, not an accidental default (verify baseline browser support for your + target audience). + +- **Shipping a full character set when only a subset is used**: a single font file can be + 300–600 KB when it covers Latin Extended, Cyrillic, Greek, CJK, and symbol ranges — most of + which the page never renders. `unicode-range` in `@font-face` splits a font into range + subsets so the browser fetches only the slices whose characters actually appear on the page. + Build-time subsetting (pyftsubset, glyphhanger, or similar) further reduces file size by + removing glyphs not in the design's character set before the file is served. Look for a + single monolithic `@font-face` with no `unicode-range` on a page that serves a single + language. + +- **Loading multiple static weight files where a variable font would be fewer requests**: a + design using four weights (regular, medium, semibold, bold) and their italic variants loads + up to eight separate font files — eight requests, eight round trips. A single variable font + file covering the same axes is fewer requests and often smaller total payload when multiple + weights are actually rendered. The calculus reverses when only one weight is used: a static + subset of that weight is smaller than the variable font, which must encode the full variation + data. Audit which `font-weight` values `getComputedStyle` resolves to on rendered text before + deciding (verify against the currency brief for your version). + +- **Third-party font hosting adds a cross-origin connection to the critical path**: a Google + Fonts `<link>` or other font CDN requires a DNS lookup, TCP handshake, and TLS negotiation + to a new origin before any font byte can be received — this is on the critical path for + above-the-fold text. `<link rel="preconnect">` to the font origin warms the connection + earlier, reducing the penalty; `<link rel="dns-prefetch">` is a lighter fallback. Self-hosting + WOFF2 from the same origin removes the cross-origin cost entirely, enables same-origin caching + headers, and avoids third-party availability and privacy dependencies. If a font CDN is + unavoidable, preconnect is a low-effort partial mitigation, not a substitute (verify + against the currency brief for your version). + +- **Loading weights and styles the design never renders**: every `@font-face` block with a + distinct `font-weight` or `font-style` value is a separate file and a separate network + request — even if no element on the page ever matches that combination. Audit the stylesheet + for declared `@font-face` blocks versus the `font-weight`/`font-style` values that + `getComputedStyle` actually resolves to on rendered elements; drop declarations for + unmatched combinations. Where the design allows it, a system-font stack (`system-ui`, + platform defaults) carries zero network cost and renders immediately — worth considering for + body copy on performance-constrained targets. diff --git a/.claude/skills/performance-audit/profile-packs/html/images-media.md b/.claude/skills/performance-audit/profile-packs/html/images-media.md new file mode 100644 index 00000000..d8d646b7 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/html/images-media.md @@ -0,0 +1,78 @@ +# HTML performance module: Images & media +> Load when the HTML carries significant imagery or embeds — `<img>`/`<picture>`/`srcset`, `<video>`, +> `<iframe>` embeds, or inline SVG — see the module map in `../html.md`. Core HTML lanes + Rendering-path +> notes live in `../html.md`; this file is the Images & media lens only. + +## Images & media + +> Scope: `<img>`, `<picture>`, `<video>`, `<iframe>`, and SVG in HTML documents. The recurring theme is +> that images are typically the largest bytes transferred and the most common LCP element — right-size and +> right-format them first, reserve their layout space to avoid CLS (see the Rendering-path notes in +> `../html.md`), then prioritize the LCP image and defer everything else. + +- **Serving a single fixed image to every viewport/DPR**: without `srcset` + `sizes` on `<img>`, every + device receives the largest image the layout ever needs — a mobile user at 1× DPR downloads the same + asset as a 4K desktop at 3× DPR. The trade-off is markup complexity vs. systematic byte savings (often + 50–80% on small viewports); `<picture>` with `<source media="...">` is the right tool when the image + crop or subject changes across breakpoints (art direction), while plain `srcset`+`sizes` on `<img>` is + sufficient for resolution switching on the same crop. + +- **Serving legacy formats when modern alternatives are supported**: AVIF offers significantly better + compression than WebP, which in turn beats JPEG/PNG at equivalent visual quality — serving legacy + formats at 2–10× the byte cost for the same perceived quality is the single biggest image-weight lever. + Use a `<picture>` fallback chain (`<source type="image/avif">` → `<source type="image/webp">` → + `<img>` JPEG/PNG) so browsers that support the better codec use it without breaking older ones (verify + against the currency brief for your version). + +- **Missing `width`/`height` attributes causing layout shift**: when an `<img>` or `<video>` element has + no explicit `width`/`height` (or equivalent CSS `aspect-ratio`), the browser can't reserve space in the + layout before the resource loads — the image arrives and pushes surrounding content down, which is a + primary cause of poor CLS scores (see the Rendering-path notes in `../html.md`). The fix is either HTML + attributes matching the intrinsic size or a CSS `aspect-ratio` rule; the browser uses the ratio, not the + literal pixel value, so responsive images with `max-width:100%` still work correctly. + +- **`loading="lazy"` on the LCP or above-the-fold image**: the browser defers lazy-loaded images until + the element is near the viewport — for the hero/LCP image, this means the fetch doesn't start until + after layout, making it self-defeating; the image is discovered late and fetched late, directly worsening + LCP (cross-reference the LCP framing in `../html.md`). Lazy-loading belongs only on images that are + reliably below the fold. The inverse failure — omitting `loading="lazy"` on images that are always far + below the fold — wastes bandwidth on initial load for assets the user may never scroll to. + +- **LCP image not discoverable by the preload scanner**: when the LCP image is set via CSS + `background-image` or injected by JavaScript, the browser's preload scanner (which finds `<img src>` + and `<link rel=preload>` in the raw HTML) cannot see it — the fetch is blocked behind CSS/JS parse and + execution, delaying LCP substantially (cross-reference the payload-startup lane in `../html.md`). Prefer + a real `<img>` element for the LCP candidate, or add `<link rel="preload" as="image" + imagesrcset="..." imagesizes="...">` so the scanner can start the fetch immediately (verify against the + currency brief for your version). + +- **Deprioritizing or not prioritizing the LCP image**: the browser assigns images a low-to-medium fetch + priority by default; for the LCP image that priority is too low when there is competing resource + contention. `fetchpriority="high"` on the LCP `<img>` (or on the corresponding `<link rel=preload>`) + signals the browser to promote it in the request queue. Conversely, non-critical below-fold images + benefit from `fetchpriority="low"`, and `decoding="async"` prevents any image from blocking the main + thread during decode. Stacking all three attributes on every image indiscriminately defeats the signal + (verify against the currency brief for your version). + +- **Oversized intrinsic dimensions relative to the displayed size**: an image served at 4000 × 3000 px + and rendered at 400 × 300 CSS px transfers 100× more pixels than needed at 1× DPR, amplified further at + higher DPR. This is distinct from format choice — even a well-compressed AVIF is wasteful if it encodes + far more pixels than the layout uses. Right-sizing at the origin or at a CDN image-transform layer (which + can resize, reformat, and cache on request) eliminates the waste without client-side changes; look for + images whose intrinsic dimensions dwarf the `sizes`/CSS display size as the condition. + +- **Eagerly loaded `<video>` or third-party `<iframe>` embeds**: a `<video preload="auto">` or a + `<video>` without `preload="none"` starts buffering media on page load regardless of whether the user + ever plays it; a YouTube, map, or chat iframe loaded eagerly fires dozens of third-party sub-requests + that consume connection budget and bandwidth before any user interaction. Use `preload="none"` + a + `poster` image for video; use `loading="lazy"` on off-screen iframes; replace third-party embeds with a + lightweight facade element (a static thumbnail + play button) that loads the real embed only on click + (cross-reference the payload-startup lane in `../html.md` for connection-budget impact). + +- **Large or unoptimized inline SVG**: SVG inlined directly in HTML avoids a separate request and can be + styled/animated with CSS, but unoptimized SVG (editor cruft, redundant paths, excessive precision, large + path data for complex illustrations) bloats the HTML document — defeating HTTP compression gains on the + page and making the document non-cacheable as a standalone asset. Run inlined SVG through an optimizer + (e.g., SVGO) and evaluate whether the inline benefit outweighs extractability; for icons used repeatedly, + a referenced SVG sprite sheet or symbol-based sprite is usually both smaller per-use and independently + cacheable compared to many separate inline SVGs or per-icon `<img>` requests. diff --git a/.claude/skills/performance-audit/profile-packs/javascript-typescript.md b/.claude/skills/performance-audit/profile-packs/javascript-typescript.md new file mode 100644 index 00000000..ad2a7823 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/javascript-typescript.md @@ -0,0 +1,179 @@ +# Profile Pack: JavaScript / TypeScript + +Specializes the generic lanes for Node.js and browser JS/TS stacks. Signals below are durable +idioms; volatile version details live in the currency brief / version index, not here. + +This is the **core** JS/TS pack (always-loaded lanes + Runtime notes). Deep, tech-specific lenses +(React, Angular, Vue, the Node.js backend runtime, the Node data layer, and bundling/build) live in +load-on-detection modules under `profile-packs/javascript-typescript/` — see **`## Framework / +sub-stack modules`** at the bottom. Load the core for every JS/TS project; add a module only when its +signals are *material* to the scope. + +--- + +## Algorithmic complexity & data structures (lane `algorithmic`) +- `.includes`/`.indexOf`/`.find` inside loops → accidental O(n²); replace with `Set`/`Map` lookups. +- Repeated array rebuilds on every render/call where a single pass or memoized result would do. +- Object key enumeration (`Object.keys`/`Object.entries`) inside hot loops over large objects; cache + the keys array or use `Map` with better big-O iteration. +- Recomputing derived values on every access instead of caching them (pure functions, stable inputs). +- Sorting, filtering, or slicing the same source array on every render/request rather than once on + data change; pay attention to large list operations that run in tight update loops. +- Using `Array.prototype` methods that create intermediate arrays (`.map().filter()`) where a single + `for` loop or generator pipeline would avoid O(n) extra allocation. +- Deeply nested object traversal on hot paths where a flat structure or indexed map would achieve + O(1) access. + +## Memory & allocation (lane `memory`) +- Chained `.map().filter().map()` building large intermediate arrays; consider a single `.reduce` + or a generator-based lazy pipeline. +- Needless spread/clone of large objects (`{ ...bigObj }`, `[...bigArr]`) on hot paths; prefer + mutating a working copy or using structured references. +- Closures inadvertently retaining large scopes: event listeners, timers, or async callbacks holding + entire module scope or large DOM subtrees, preventing garbage collection. +- Unbounded `Map`/`Set`/plain-object caches with no eviction policy; growing event-listener lists + never removed; `setInterval` callbacks never cleared. +- Large, deeply reactive data structures unnecessarily wrapped in the framework's proxy/reactive + system (Vue `reactive`, MobX, etc.) — store non-reactive data outside reactive scope or mark as + raw (verify against the currency brief for your version). +- Attaching large non-reactive datasets (lookup tables, raw blob data) directly to component state + or global stores, causing framework overhead on every state read. +- Holding `ArrayBuffer` / `TypedArray` slices longer than needed; prefer transferable objects over + structured-clone copies when moving data to Workers. + +## Data access & I/O (lane `data-access`) +- N+1 fetches: one `fetch`/DB call per loop iteration instead of batching or a single bulk request; + applies equally to REST, GraphQL, and ORM-generated queries. +- Missing `Promise.all` / `Promise.allSettled` for independent parallel requests (sequential awaits + when the calls have no data dependency on each other). +- Over-fetching in GraphQL (selecting all fields) or REST (no sparse fieldsets); missing pagination + causing unbounded response sizes. +- `JSON.parse`/`JSON.stringify` on large payloads in hot paths; consider streaming JSON parsers or + NDJSON line-by-line processing (verify against the currency brief for your version). +- Missing or invalidated HTTP/service-worker/CDN cache layers; headers that cause cache-busting on + every request (e.g., aggressive `Cache-Control: no-store` on static assets). +- Synchronous `localStorage` reads on hot rendering paths (main-thread blocking); prefer async + storage or a one-time in-memory cache populated at startup. +- Inefficient ORM queries: missing `.select()` field projection, missing `.include()` preloads + causing N+1, or fetching full rows when only aggregates are needed. + +## Concurrency & parallelization (lane `concurrency`) +- **Exploit:** sequential `await` in loops for independent async work — replace with `Promise.all`. + Verify independence (no shared mutable state, no ordering requirement) before parallelizing. +- **Exploit:** missing streaming for large responses/files; buffering entire payload before + processing when a pipeline would reduce peak memory and time-to-first-byte. +- **Exploit:** unparallelized initialization: multiple independent async setup steps (DB connect, + config load, cache warm) run sequentially at startup instead of via `Promise.all`. +- **Defend:** blocking the event loop with synchronous CPU-heavy work (large sorts, crypto, image + processing, complex regex on large inputs) — offload to Worker Threads or a worker pool + (verify against the currency brief for your version). +- **Defend:** `setTimeout`/`setInterval` drift from long synchronous tasks starving the event loop; + split large work into chunks with `setImmediate` / `queueMicrotask` yielding. +- **Defend:** uncontrolled concurrency — spawning N promises for N items with no concurrency limit + (connection pool exhaustion, rate-limit errors, memory spikes); use a semaphore or batching. +- **Defend:** Worker Thread creation on every request rather than using a persistent pool; thread + startup is ~30 ms; pools amortize that cost across many tasks. + +## Framework-idiom currency (lane `idiom-currency`) +- Consult the version index and currency brief. Flag patterns the brief marks superseded/deprecated + (e.g., legacy lifecycle hooks, deprecated build APIs, removed render methods); flag fast-path APIs + listed in the index that the code doesn't use; flag changed defaults the code still fights. +- Check for manual memoization (`useMemo`/`useCallback`/`React.memo`, Angular pure pipes, Vue + `computed`) that the current toolchain may auto-handle — or, conversely, memoization that is + missing where it would matter (verify against the currency brief for your version). +- Offline (no brief): note candidate idiom concerns at LOW confidence, flagged for manual currency + check. + +## Payload / startup / build (lane `payload-startup`) +- Bundle size: large dependencies pulled in entirely when only a small slice is used; prefer named + imports to enable tree-shaking (verify against the currency brief for your version). +- Missing code-splitting / lazy-loading for routes or heavy components; everything shipped upfront + causes slow Time-to-Interactive even when the user only visits one route. +- Source maps or dev-only artifacts (`console.log`, debug builds, devDependency code) shipped to + production; `NODE_ENV` not set to `production` in the build pipeline. +- Duplicate dependencies (multiple versions of the same package bundled); audit with bundle analyzer + tools (verify against the currency brief for your version). +- Expensive module-level side effects executed at import time (global polyfills, eager DB connects, + heavy regex compilation), delaying first meaningful response. +- Missing minification, dead-code elimination, or modern target transpilation (e.g., shipping + over-polyfilled ES5 when the target supports ES2020+). +- Render-blocking scripts or stylesheets loaded synchronously; missing `<link rel="preload">` / + `<link rel="modulepreload">` for critical assets (verify against the currency brief for your + version). + +--- + +## Runtime notes (load for every JS/TS project) + +JS/TS runs on two single-threaded, JIT-compiled, garbage-collected engines — V8 in Node.js and the +browser's main thread — that share one cost model. These durable realities are the JS analog of a +"variant notes" section: *how the engine executes and how to measure it*, cutting across all the +lanes above and every module below. + +- **One main thread does everything**: in the browser the same thread runs JS, layout, paint, and + user input; in Node it serves every concurrent request. A long synchronous task (big loop, large + `JSON.parse`/`stringify`, sync crypto, complex regex) blocks *all* of it — jank in the browser, + stalled requests in Node. The durable fix is to keep the synchronous slice short: yield + (`setTimeout`/`queueMicrotask`/`scheduler.postTask`), stream, or offload to a Web Worker / + `worker_threads` (verify against the currency brief for your version). +- **V8 rewards stable object shapes (hidden classes)**: objects built with a consistent property set + and types stay monomorphic and on the JIT fast path; adding/deleting properties after construction, + mixing types in one field, or feeding a call site many shapes turns it polymorphic→megamorphic and + deoptimizes it. On hot paths prefer stable-shape objects (or `Map` for dynamic keys) and consistent + argument types; `delete obj.x` and sparse/holey arrays are classic deopts (verify against the + currency brief for your version). +- **Allocation churn drives GC pauses**: V8's generational GC collects short-lived garbage cheaply, + but per-frame / per-request allocation of objects, closures, and intermediate arrays + (`.map().filter()` chains) still adds up to measurable minor-GC time and main-thread jank — reuse + buffers, avoid needless spreads/clones on hot paths, and prefer `TypedArray`s for numeric-heavy + work (all JS numbers are float64 unless they fit V8's small-integer "SMI" fast path). +- **Forced synchronous layout / reflow (browser)**: interleaving DOM reads (`offsetWidth`, + `getBoundingClientRect`, `getComputedStyle`, `scrollTop`) with writes (style/class/DOM mutations) + inside a loop forces the engine to re-run layout on every read — "layout thrashing" that pegs the + main thread. Batch all reads, then all writes (or use `requestAnimationFrame` to schedule writes, + `IntersectionObserver`/`ResizeObserver` instead of polling geometry, and `content-visibility` / + `contain` to bound layout scope); frameworks mostly batch this for you, so look hardest in raw-DOM + or escape-hatch code (verify against the currency brief for your version). +- **Runtime and version are a lever**: V8 ships broad speedups by version, so the Node LTS line (even + majors = LTS; an odd/Current-only feature isn't adoptable on an LTS-bound project) and the target + browser engines matter; alternative runtimes (**Bun**, **Deno**) change the performance profile — + match the runtime to the workload rather than assuming stock Node (verify against the currency brief + for your version; see the version index's Support-cadence note). +- **Profile before optimizing — the tooling is first-class**: justify hot-path claims with Node + `--cpu-prof`/`--prof`, `clinic.js`/`0x` flame graphs, or the browser DevTools Performance panel, + `performance.now()`, and the framework profilers (React Profiler, Angular DevTools, Vue DevTools) — + not intuition. Main-thread long-task and Web Vitals (LCP/INP/CLS) instrumentation tells you whether + a render-path concern is actually reaching users. + +## Framework / sub-stack modules (load on detection) + +Load the core lanes + **Runtime notes** above for *every* JS/TS project. Additionally load the +matching module when its technology is *material* to the audit scope (not on an incidental import), +and include it as ecosystem context in the relevant lane prompts. These tech-specific lenses were +split out of this pack so a run pastes only what's relevant — see the version index +`../version-indexes/javascript-typescript.md` for version-specific facts. + +| Detected (signals) | Load module | +|---|---| +| **React** — `react`/`react-dom`, JSX in `*.jsx`/`*.tsx`, Next.js | [`javascript-typescript/react.md`](javascript-typescript/react.md) | +| **Angular** — `@angular/core`, `*.component.ts`, `angular.json` | [`javascript-typescript/angular.md`](javascript-typescript/angular.md) | +| **Vue** — `vue`, `*.vue` SFCs, Nuxt | [`javascript-typescript/vue.md`](javascript-typescript/vue.md) | +| **Node.js backend** — `express`, `fastify`, `@nestjs/*`, or a custom `http`/`https` server | [`javascript-typescript/node-backend.md`](javascript-typescript/node-backend.md) | +| **Node.js data layer** — `@prisma/client`, `typeorm`, `drizzle-orm`, `knex`, `sequelize`, `mongoose`, `pg`, `mysql2`, `ioredis` | [`javascript-typescript/node-data.md`](javascript-typescript/node-data.md) | +| **Bundling & build** — `vite`/`webpack`/`esbuild`/`rollup`/`turbopack` config, a `dist/` bundle, or a browser-targeted `package.json` | [`javascript-typescript/bundling-build.md`](javascript-typescript/bundling-build.md) | + +## Sources + +Durable signals in this pack are grounded in these authoritative sources (version-specific facts and +their per-entry citations live in `../version-indexes/javascript-typescript.md`): + +- **Runtime notes** — V8 blog (hidden classes / inline caches, GC, "shapes"); nodejs.org "Don't block the event loop"; web.dev Web Vitals (LCP/INP/CLS) + long-tasks; Node `--cpu-prof`/`clinic.js` docs. + +**Sub-stack modules** carry their own grounding; key sources per module: + +- **React** (`javascript-typescript/react.md`) — react.dev (memo, render-and-commit, `useMemo`/`useCallback`, `lazy`/Suspense, "You Might Not Need an Effect", React Compiler, Server Components). +- **Angular** (`javascript-typescript/angular.md`) — angular.dev (runtime-performance, signals, `OnPush`, `@defer`, built-in control flow, zoneless, hydration). +- **Vue** (`javascript-typescript/vue.md`) — vuejs.org (best-practices/performance, reactivity-in-depth, async components) + blog.vuejs.org (3.4/3.5 reactivity). +- **Node.js backend** (`javascript-typescript/node-backend.md`) — nodejs.org (event loop, `worker_threads`, `cluster`, streams/`pipeline`, undici); Fastify docs (`fast-json-stringify`); pino docs. +- **Node.js data layer** (`javascript-typescript/node-data.md`) — Prisma/TypeORM/Drizzle/Sequelize/Mongoose performance docs; node-postgres `Pool`; ioredis pipelining; `dataloader`. +- **Bundling & build** (`javascript-typescript/bundling-build.md`) — web.dev (tree-shaking, reduce-JavaScript-payloads); Vite/Rollup/webpack/esbuild docs; bundle-analyzer tooling; `browserslist`/`core-js`. diff --git a/.claude/skills/performance-audit/profile-packs/javascript-typescript/angular.md b/.claude/skills/performance-audit/profile-packs/javascript-typescript/angular.md new file mode 100644 index 00000000..98f5c9f7 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/javascript-typescript/angular.md @@ -0,0 +1,101 @@ +# JS/TS performance module: Angular +> Load when Angular (`@angular/core`, `*.component.ts`, `angular.json`) is detected — see the module map in `../javascript-typescript.md`. Core lanes + Runtime notes live in `../javascript-typescript.md`; this file is the Angular lens only. + +## Angular + +> Scope: Angular applications using Zone.js or the zoneless scheduler, any change-detection +> strategy, and the modern standalone/signal APIs. The recurring theme is **shrink the +> change-detection surface**: Zone.js triggers a tree-wide check on every async event by default, +> so the work compounds with component count. The corrective directions are `OnPush` + observable/ +> signal inputs to gate re-checks, signals for fine-grained push-based updates that skip whole +> subtrees, moving non-UI work outside the zone, and `@defer` / lazy routes to reduce what the +> browser bootstraps at all. + +- **Default `CheckAlways` strategy inflates re-check scope**: every async event (click, XHR, + timer, microtask) causes Angular to walk the entire component tree and re-evaluate all template + expressions in `CheckAlways` components. `OnPush` limits re-checks to: an `@Input` reference + changing, an `async`-pipe observable emitting, a signal notifying, or an explicit + `markForCheck()` call. Audit components that receive only immutable or observable data and have + no mutable local state — they are `OnPush` candidates. Leaving subtrees in `CheckAlways` means + a single button click re-checks dozens of unrelated components (verify against the currency + brief for your version). + +- **Signals (stable 17+) for fine-grained, push-based updates**: Zone.js-based `OnPush` still + re-checks the entire component on any notification; signals narrow the update to the specific + binding that read the signal. A `computed()` signal is only re-evaluated when its dependencies + change, making it the preferred replacement for getter calls and derived values read in + templates. Look for `@Input` properties or component state that changes at high frequency — if + the consuming template only reads one derived slice, a `computed` signal avoids re-evaluating + the whole template (verify against the currency brief for your version). + +- **Zone.js churn from third-party code or frequent microtasks**: Zone.js monkey-patches browser + async APIs and triggers a CD cycle on every resolution, including those from third-party + libraries, `requestAnimationFrame` loops, WebSocket message handlers, and micro-batched + timers. Look for high-frequency event sources (scroll, mousemove, WebSocket, rAF) hooked + directly inside the Angular zone — these schedule a CD check per event. `NgZone.runOutsideAngular` + moves the handler off the CD trigger path; UI updates can then be batched and applied with + `NgZone.run`. The **zoneless** scheduler (experimental ~18, targeted as default ~21) removes + Zone.js monkey-patching (~14 kB) entirely and relies on signals/explicit notification — + evaluate readiness of third-party dependencies before adopting (verify against the currency + brief for your version). + +- **Template expression cost — functions, getters, and impure pipes run every CD cycle**: Angular + evaluates every template expression on each change-detection pass for the component. A getter + method, a plain method call, or an impure pipe in the template therefore executes on every CD + cycle — not just on relevant data change. Move expensive derivations to `computed` signals + (evaluated lazily, cached until dependency changes), `async` pipe with an observable, or a + `pure` pipe (called only when the input reference changes). Impure pipes (marked + `pure: false`) re-run every cycle and should be rare and cheap; flag them as suspect when they + appear on lists or in tight loops (verify against the currency brief for your version). + +- **`@for` without `track` / `*ngFor` without `trackBy` on lists**: without a track expression, + Angular tears down and rebuilds the full list DOM on every data refresh — even when only one + item changed. The built-in `@for` block makes `track` mandatory (compiler error if omitted), + which is stricter than the optional `trackBy` on `*ngFor`. For lists that can be reordered, + `track item.id` (stable identity) is correct; `track $index` only avoids teardown when items + are appended/removed at the tail and never reordered — using it on reorderable lists causes + incorrect DOM reuse. Long lists (hundreds of items) need CDK virtual scroll regardless of + tracking strategy (cross-reference the **payload-startup** lane in `../javascript-typescript.md`; + verify against the currency brief for your version). + +- **Unsubscribed RxJS subscriptions and subscription anti-patterns**: a `subscribe()` call without + a corresponding teardown leaks the subscriber for the component's lifetime and beyond, keeping + component references alive after destroy. Prefer `async` pipe (auto-unsubscribes on destroy) + or `takeUntilDestroyed()` (verify against the currency brief for your version). Nested + `subscribe()` inside `subscribe()` creates interleaved, un-cancellable streams — flatten with + `switchMap`/`mergeMap`/`concatMap`. Look also for `shareReplay` without `refCount: true` on + shared streams: without reference counting the source never completes and all subscribers stay + alive (verify against the currency brief for your version). + +- **Large eager feature modules and components — `@defer` and lazy routes**: feature modules or + standalone components registered in the root module or the initial route's import list are + bundled in and bootstrapped eagerly, bloating Time-to-Interactive. `@defer` (stable 17+) lets + templates defer a component subtree until a trigger fires (`viewport`, `idle`, `interaction`, + `hover`, `timer`, or `prefetch when`); use `@defer (on viewport)` for below-the-fold sections + and `@defer (on idle; prefetch on hover)` for heavy widgets. Lazy router routes with + `loadComponent` (standalone) or `loadChildren` achieve the same for route-level splits. + Standalone components are tree-shaking-friendly compared to NgModule-declared components whose + dependency graph is harder for bundlers to statically analyse (cross-reference the + `bundling-build` module and the **payload-startup** lane in `../javascript-typescript.md`; + verify against the currency brief for your version). + +- **SSR hydration — double-fetch and full rerender on hydration**: Angular SSR with non-destructive + hydration (stable 17+) reuses server-rendered DOM instead of discarding it, which cuts + First-Contentful-Paint cost. Look for `provideClientHydration()` absence in the app config — + without it Angular bootstraps by destroying and recreating the server DOM. Also check for HTTP + requests made during SSR that are repeated on the client: Angular's `HttpClient` transfer + state caches server responses and replays them to the browser; bypassing it (e.g., using + native `fetch` or forgetting `withHttpTransferCache()`) causes a visible double-fetch waterfall. + Pair with `@defer (prefetch on idle)` for below-the-fold sections to avoid hydrating content + the user may never interact with (verify against the currency brief for your version). + +- **Heavy `APP_INITIALIZER`, eager service instantiation, and expensive constructors**: services + provided in root or in an eagerly-loaded module are constructed at bootstrap, before the first + frame. `APP_INITIALIZER` tokens that make blocking HTTP calls, load config, or perform + expensive computation delay the bootstrap promise and push out Time-to-Interactive. Look for + `APP_INITIALIZER` functions that await multiple sequential operations — parallelize with + `Promise.all` where order permits, or defer non-critical work to an `APP_BOOTSTRAP_LISTENER`. + Widely-instantiated components (list rows, table cells) with expensive constructors or + injected heavy services compound this cost at runtime each time the list refreshes + (cross-reference the **payload-startup** lane in `../javascript-typescript.md`; verify against + the currency brief for your version). diff --git a/.claude/skills/performance-audit/profile-packs/javascript-typescript/bundling-build.md b/.claude/skills/performance-audit/profile-packs/javascript-typescript/bundling-build.md new file mode 100644 index 00000000..455fea65 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/javascript-typescript/bundling-build.md @@ -0,0 +1,116 @@ +# JS/TS performance module: Bundling & build (Vite / webpack / esbuild / Rollup) +> Load when a frontend build is detected (`vite`/`webpack`/`esbuild`/`rollup`/`turbopack` config, a `dist/` bundle, or a browser-targeted `package.json`) — see the module map in `../javascript-typescript.md`. Core lanes + Runtime notes live in `../javascript-typescript.md`; this file is the Bundling & build lens only. + +## Bundling & build (Vite / webpack / esbuild / Rollup) + +> Scope: the mechanics of what ends up in the shipped bundle and why — tree-shaking failure modes, +> code-splitting strategy, transpilation target accuracy, heavy-dependency cost, CSS weight, asset +> handling, and build-pipeline throughput. The recurring theme is: **ship less JS, split by route, +> tree-shake real dead code, target the right ES version, and measure before optimising** — a bundle +> analyser is the starting point, not a checklist. Quick-hits (named imports, missing lazy-loading, +> `NODE_ENV`, duplicate deps, missing minification, render-blocking scripts) are covered in the core +> **payload/startup/build** lane in `../javascript-typescript.md`; this file goes deeper into the +> bundler mechanics behind each. + +- **CommonJS deps block tree-shaking entirely**: ES module tree-shaking requires static `import`/`export` + syntax — bundlers (Rollup, Vite, webpack 5, esbuild) cannot eliminate dead exports from a CommonJS + module because `require()` is dynamic and `module.exports` is a runtime value. When a dependency + publishes only a CJS build, the entire package is included regardless of what the consumer imports. + Look for packages that lack an `"exports"` map with an `"import"` (ESM) condition or a `"module"` + field in `package.json`; check whether the bundler's resolution is picking up the CJS entrypoint + — tools like `rollup-plugin-visualizer` or `webpack-bundle-analyzer` will show the full blob rather + than individual exports. Prefer ESM-native alternatives or the package's explicit ESM build (e.g., + `lodash-es` over `lodash`) where the cost matters (cross-reference the **payload/startup/build** + lane in `../javascript-typescript.md`; verify against the currency brief for your version). + +- **`"sideEffects"` missing or wrong in `package.json` prevents dead-code elimination**: bundlers + that support the `"sideEffects"` field use it to decide whether an imported-but-unused module can + be dropped entirely. Without it (or when set to `true`), every imported file is retained even if + nothing is used from it, because the bundler must assume the `import` has observable side effects. + The failure modes are symmetric: a library that omits the field keeps unused modules in the bundle; + a library that sets `"sideEffects": false` incorrectly (e.g., a CSS import or a global polyfill + that actually mutates the environment) will be silently dropped, causing runtime errors. Look for + packages with no `"sideEffects"` key whose contribution shows up as unexpectedly large in a bundle + report, and for first-party code that imports CSS or polyfills via side-effect-only imports that + must be listed as exceptions (e.g., `["*.css", "./src/polyfills.js"]`) (verify against the currency + brief for your version). + +- **Barrel files defeat tree-shaking and slow builds**: an `index.ts` that re-exports every module + in a directory (barrel export pattern) forces the bundler to load, parse, and analyse every file + in that barrel to determine which exports are live — even when the consumer only imports one + symbol. This creates two costs: (1) graph-time: the bundler must crawl the entire re-export chain + before it can mark dead code, slowing incremental builds as the barrel grows; (2) tree-shaking + accuracy: if any re-exported module has side effects the bundler cannot statically prove away, the + whole barrel is retained. Look for `index.ts` files with tens of `export * from '…'` or `export { + X } from '…'` lines at the component/feature directory level; in monorepos this pattern can make + every internal package import pull in an entire sub-tree. Deep path imports (`import { Button } + from '@ui/components/Button'` instead of `import { Button } from '@ui'`) bypass the barrel and + unlock per-file dead-code elimination (cross-reference the **payload/startup/build** lane for + named-import guidance; verify against the currency brief for your version). + +- **Over-splitting causes request waterfalls; under-splitting ships everything**: dynamic `import()` + creates a chunk boundary, but the optimal granularity is route-level or feature-level — not + per-component. Too many tiny chunks means the browser must fire sequential requests to resolve a + module graph at runtime (a waterfall), erasing the latency win of splitting; too few chunks means + a user visiting one route downloads the code for all others. Look for: shared utilities or vendor + libraries duplicated across multiple chunks (each chunk bundled its own copy instead of sharing + one via `splitChunks` / Rollup's `manualChunks`); overly granular splitting (many < 5 kB chunks + behind a single route); or a single monolithic vendor chunk containing libraries used on only one + route. The right model is large shared chunks for truly shared code, plus per-route chunks for + route-specific code; `<link rel="modulepreload">` for critical next-route chunks eliminates the + perceived waterfall on predictable navigations (cross-reference `React.lazy`/`defineAsyncComponent` + / Angular `@defer` notes in the `react`, `vue`, `angular` modules; verify against the currency + brief for your version). + +- **Heavy dependency pulled for one function**: large libraries with no tree-shakable ESM build + impose their full weight on the bundle regardless of usage. The canonical example is `moment.js` + (~300 kB minified + locale data), which bundles all locale files by default and cannot be + tree-shaken because it is CommonJS; the alternatives `date-fns` (ESM, per-function imports), + `dayjs` (~2 kB), or the platform `Temporal` API carry a fraction of the cost for equivalent + functionality. The pattern generalises: a large icon library imported as `import { IconA } from + '@icons/all'`, a full i18n locale bundle, or a complete polyfill suite pulled in for one method + all show up as the same failure mode in a bundle analyser — a large blob disproportionate to the + feature surface used. Run `rollup-plugin-visualizer` or `webpack-bundle-analyzer` and sort by + size; flag any dependency where the used surface is clearly a small fraction of the included + weight (verify against the currency brief for your version). + +- **Transpilation target too broad inflates payload and polyfill cost**: shipping ES5-compatible + output when the audience is modern browsers forces the transpiler to emit verbose helper code for + every class, arrow function, optional chain, and destructure. `@babel/runtime` helper deduplication + (`@babel/plugin-transform-runtime`) avoids per-file inline copies, but the helpers themselves + still add weight. `core-js` polyfills are the larger risk: `useBuiltIns: 'entry'` with a broad + `browserslist` can inject tens of kB of polyfills for browser features the target already supports + natively. Look for: a `browserslist` query like `"> 0.5%, last 2 versions"` that includes legacy + IE or Android 4; `core-js` appearing as a large chunk in the bundle report; Babel in the critical + build path when esbuild or SWC (5–20× faster) would meet the same target. Differential serving + (a modern `<script type="module">` build + a legacy `<script nomodule>` fallback) is an option + where IE11 or Android legacy support is genuinely required but modern users must not pay the + penalty (cross-reference the `tslib` / `importHelpers` note for TypeScript codebases; verify + against the currency brief for your version). + +- **Unoptimised CSS weight and render-blocking style**: utility CSS frameworks (Tailwind, UnoCSS, + Windi) ship near-zero unused CSS when purging is configured correctly, but if the content paths + (`content` / `purge` array) miss source files, entire utility sets are included. CSS-in-JS + runtimes (emotion, styled-components, runtime `@emotion/css`) evaluate and inject styles at + JavaScript runtime, adding both bundle weight (the runtime) and a style-injection cost per render + that pure static CSS avoids; zero-runtime CSS-in-JS alternatives (vanilla-extract, Linaria, Panda + CSS) or utility frameworks move this cost to build time. Missing critical-CSS extraction means the + browser must download and parse the full stylesheet before rendering above-the-fold content — look + for large, non-inlined stylesheets linked in `<head>` without `media` queries deferring off-screen + styles. These are distinct failure modes: purge misconfiguration ≈ raw payload; runtime CSS-in-JS + ≈ JS bundle + render cost; blocking CSS ≈ render latency even if the file is small (cross-reference + the render-blocking note in the **payload/startup/build** lane of `../javascript-typescript.md`; + verify against the currency brief for your version). + +- **Slow builds from type-checking in the bundler hot path and absent caching**: TypeScript + type-checking during the bundler's transform step (`ts-loader` in full-type-check mode, `vite` + with `vite-plugin-checker` on every save) blocks the hot-module-replacement pipeline on the + slowest part of the TypeScript toolchain. The standard split is: bundler handles transpile-only + transforms (esbuild/SWC strip types without checking, making HMR near-instant) while `tsc + --noEmit` runs type checking separately in CI or as a parallel watcher. Separately, missing + persistent caching (`cache: true` in webpack 5 filesystem cache, Vite's pre-bundling cache, + Turborepo/Nx task caching for monorepos) means full rebuilds from scratch on every CI run or + fresh container. Look for: `ts-loader` without `transpileOnly: true`; no `cache` section in + `webpack.config`; CI steps that never restore a build cache; barrel imports (see above) that force + large graph re-analysis on each incremental build (verify against the currency brief for your + version). diff --git a/.claude/skills/performance-audit/profile-packs/javascript-typescript/node-backend.md b/.claude/skills/performance-audit/profile-packs/javascript-typescript/node-backend.md new file mode 100644 index 00000000..8ba162ef --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/javascript-typescript/node-backend.md @@ -0,0 +1,22 @@ +# JS/TS performance module: Node.js backend (Express / Fastify / NestJS) +> Load when a Node.js server (`express`, `fastify`, `@nestjs/*`, or a custom `http`/`https` server) is detected — see the module map in `../javascript-typescript.md`. Core lanes + Runtime notes live in `../javascript-typescript.md`; this file is the Node.js backend lens only. + +## Node.js backend (Express / Fastify / NestJS) + +> Scope: the runtime mechanics of Node.js HTTP servers and the three dominant frameworks — Express, Fastify, and NestJS. The recurring theme is that a single event loop serves every in-flight request: one slow or poorly-wired handler stalls all concurrent traffic, outbound connection setup is paid per-request unless pooled, buffering a response delays the first byte and spikes memory, and every middleware or serialization step on the hot path compounds under load. Core async/event-loop basics (sequential `await`→`Promise.all`, Worker Thread offload, uncontrolled concurrency, N+1/over-fetch) are covered in the **Concurrency** and **Data access** lanes of `../javascript-typescript.md`; the bullets below are the server-mechanics layer that sits on top. + +- **Single process ↔ single core; CPU-bound handlers stall every in-flight request**: Node runs one JS thread per process, so a handler that burns >~1 ms of synchronous CPU (complex template rendering, large `Array.sort`, regex on big input, image manipulation) occupies the event loop for *all* concurrent requests during that window — tail latency spikes become total throughput stalls. The remedy is not `Promise`-wrapping (which doesn't move work off the loop) but either `worker_threads` for JS-heavy CPU work or horizontal scaling via the `cluster` module / process manager (PM2, systemd) to spread requests across cores. Look for CPU-intensive work called directly inside route handlers with no offload path (cross-reference the **Concurrency** lane in `../javascript-typescript.md` for Worker pool reuse; verify against the currency brief for your version). + +- **Outbound HTTP client created per request**: constructing a new `http.Agent`, `axios` instance, or `undici` `Pool`/`Client` inside a handler opens a fresh TCP/TLS connection on every call; the cost (~10–100 ms) shows up as added latency on every downstream call and connection-count pressure on the target. A single module-level client with `keepAlive: true` and appropriate `maxSockets` reuses idle connections. For high-throughput fan-out, an `undici` `Pool` or `Agent` with explicit connection limits outperforms the default global `fetch` pool — look for client construction inside handler or middleware function bodies rather than at module scope (verify against the currency brief for your version). + +- **Buffering the full response before writing**: handlers that assemble an entire large payload (`JSON.stringify(hugeArray)`, `fs.readFileSync`, accumulate-then-`res.send`) block the event loop during serialization and delay the first byte until the whole payload is ready. `res` is a writable stream; pipe a `Readable` directly into it with `stream.pipeline()` to handle backpressure correctly, use `res.write()` + `res.end()` for chunked output, or stream the DB cursor row-by-row. `JSON.stringify` on a large object is a synchronous, blocking call — consider streaming JSON serializers for payloads that routinely exceed a few hundred KB (cross-reference the **Memory** lane in `../javascript-typescript.md`; verify streaming serializer options against the currency brief for your version). + +- **Fastify response schemas missing or Express used where throughput matters**: Express calls `JSON.stringify` dynamically on every response; Fastify with a declared `schema.response` compiles a serializer via `fast-json-stringify` at startup that is 2–5× faster and skips `JSON.stringify` entirely on the hot path. NestJS on Fastify adapter inherits this benefit only when serialization schemas are wired through; on the Express adapter it does not. Look for high-request-rate Fastify routes without `schema.response`, or micro-services where Express was chosen without evaluating the Fastify adapter trade-off (verify against the currency brief for your version). + +- **Middleware applied to routes that don't need it**: Express processes every registered middleware in insertion order for every matched route — `bodyParser.json()`, `morgan`, authentication, and validation pipes mounted at the app root run even on `/healthz`, metrics endpoints, and routes that receive no body. In NestJS, global `ValidationPipe` with `transform: true` triggers `class-transformer` reflection on every incoming DTO, including internal probes. In Fastify, plugins registered globally with `fastify.addHook('preHandler', ...)` add overhead to every request. Audit middleware registration scope: mount body parsing, logging, and validation only on the route groups that require them, and short-circuit lightweight routes before expensive middleware (cross-reference the **Concurrency** lane in `../javascript-typescript.md`). + +- **`console.log` or synchronous loggers on the hot request path**: `console.log`/`console.error` format and `util.inspect` their arguments eagerly on every call and write unbuffered — and `process.stdout`/`stderr` writes are *synchronous* when the destination is a file or a TTY (and can still stall under backpressure when it is a pipe), so verbose per-request logging at `INFO`/`DEBUG` on high-RPS routes adds event-loop latency regardless of destination. The standard mitigation is a buffered, structured logger (`pino` defers formatting and can route output through a transport on a worker thread) at `WARN`/`ERROR` in production, with request-level logging enabled only on demand. Look for `console.log`/`console.error` inside route handlers or middleware, or log transports that flush synchronously on every write (verify transport defaults against the currency brief for your version). + +- **Blocking sync APIs inside handlers**: `fs.readFileSync`, `crypto.pbkdf2Sync`/`crypto.scryptSync`, `child_process.execSync`, and `require()` of a heavy module inside a handler body all block the event loop for their full duration — they are safe at startup but stall every concurrent request if called on the hot path. `require()` in particular is cached after the first call but the cache miss (cold start or dynamic `require(\`./plugins/${name}\`)`) is a full disk read and module evaluation. Check for these in handler, middleware, or service-method bodies rather than at module initialisation scope. + +- **Memory leaks from per-request accumulation**: in-process request caches (e.g., a `Map` populated per user session with no TTL or size cap), event listeners added inside handlers with `emitter.on(...)` but never removed, and closures that capture large request objects in long-lived callbacks all grow unboundedly with traffic. `MaxListenersExceededWarning` in logs is a strong signal of listener accumulation. Without `--max-old-space-size` set, Node defaults to a V8 heap limit that can be well below the container memory ceiling — the process OOM-kills before the scheduler can intervene. Graceful shutdown (`SIGTERM` → drain in-flight requests → close server → exit) is also a memory-correctness concern: abrupt exits under load leave pooled connections and write buffers unfinished (cross-reference the **Memory** lane in `../javascript-typescript.md`). diff --git a/.claude/skills/performance-audit/profile-packs/javascript-typescript/node-data.md b/.claude/skills/performance-audit/profile-packs/javascript-typescript/node-data.md new file mode 100644 index 00000000..edf6d198 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/javascript-typescript/node-data.md @@ -0,0 +1,101 @@ +# JS/TS performance module: Node.js data layer (Prisma / TypeORM / Drizzle / Knex / Mongoose) +> Load when a Node data layer (`@prisma/client`, `typeorm`, `drizzle-orm`, `knex`, `sequelize`, `mongoose`, `pg`, `mysql2`, `ioredis`) is detected — see the module map in `../javascript-typescript.md`. Core lanes + Runtime notes live in `../javascript-typescript.md`; this file is the Node.js data layer lens only. + +## Node.js data layer (Prisma / TypeORM / Drizzle / Knex / Mongoose) + +> Scope: all patterns that touch `pg.Pool`, `mysql2` connection pools, ORM connection config, +> Mongoose connections, or the `ioredis` client. The recurring themes are: **share the pool** (one +> shared pool instance, not one per request), **batch to cut round-trips** (N+1 is the dominant +> latency killer at every ORM layer), **project and `.lean()` what you read** (hydration and +> over-fetch inflate memory and latency on read-heavy paths), and **read the generated query** (the +> ORM abstracts the SQL — `EXPLAIN` or ORM query logging is the only way to confirm cost before +> diagnosing). Cross-reference the core **Data access & I/O** lane for generic N+1/over-fetch/bulk +> basics, and the `node-backend` module for event-loop and concurrency interactions. + +- **Pool opened per request instead of shared at module scope**: `pg.Pool`, `mysql2.createPool`, + and Mongoose/TypeORM/Prisma connections are designed to be constructed once at startup and shared + for the process lifetime. Constructing a new pool (or calling `$connect()` / `createConnection`) + inside a request handler pays TCP + TLS + auth overhead on every call, bypasses pool reuse + entirely, and leaks connections when `end()`/`destroy()` is omitted on error paths. Look for pool + or client construction inside route handlers, middleware, or Lambda handlers (cross-reference the + **Concurrency** lane for the goroutine/async-task leak analogue) (verify against the currency + brief for your version). + +- **Pool defaults left unconfigured under load — exhaustion or idle churn**: `pg.Pool` defaults + (`max: 10`, no `idleTimeoutMillis` or `connectionTimeoutMillis`) and ORM equivalents (Prisma + `connection_limit`, TypeORM `extra.max`, Sequelize `pool.max`) are conservative baselines that + saturate quickly under moderate concurrency. A pool that is too small queues requests; one with + no idle timeout churns TCP handshakes on every cold slot. Look for pools whose `max` is never + set explicitly, for missing `idleTimeoutMillis` (connections held until NAT/LB kills them), and + for missing `connectionTimeoutMillis` (requests block indefinitely when the pool is dry). Set + all relevant parameters explicitly and verify they match the database's `max_connections` budget + (verify against the currency brief for your version). + +- **Serverless connection storms — new pool per invocation without a proxy**: in Lambda/Cloud + Functions the process is short-lived, so a cold invocation opens a fresh database connection. + Under burst concurrency, hundreds of invocations open hundreds of connections simultaneously — + the database `max_connections` ceiling is hit long before CPU is a constraint. Look for direct + `pg.Pool`/Prisma/TypeORM connections inside serverless handlers with no RDS Proxy, PgBouncer, or + Prisma Data Proxy in front; look also for Prisma's default `connection_limit` (which sizes to CPU + count and can be far too high in a many-replica serverless fleet). The fix is a connection pooler + that multiplexes, not code that limits pool size alone (verify against the currency brief for + your version). + +- **N+1 from ORM relation loading beyond generic eager/lazy**: Prisma's `findMany` without + `include` is safe, but calling `findMany` (or `findUnique` for each parent's ID) *inside a loop* + is N+1 invisible to the ORM — look for `prisma.*.find*` calls nested inside `for`/`map` over a + result set. TypeORM lazy relations (`@OneToMany` with `lazy: true`) fire a database query on + property *access*; if the entity is accessed in a loop the relation resolves N times — the + symptom is deferred async queries after the initial load. Mongoose `populate()` issues a *second* + query per populated path; chaining `.populate('a').populate('b')` produces two extra queries per + document, and calling `populate` inside a `for` loop of documents is N×paths queries. Use + `dataloader`-style batching for GraphQL resolvers that call any ORM per-node (cross-reference + the core **Data access & I/O** N+1 bullet). + +- **Over-fetching and missing projection / `.lean()`**: Prisma exposes `select` and `omit` to + project only needed fields at the query level — a `findMany` with no `select` on a wide table + deserialises every column. TypeORM `find` with no `select` option does the same. Mongoose + `.find()` without a projection (second argument or `.select(…)`) returns full BSON documents; + chaining `.lean()` returns plain JavaScript objects, skipping the full Mongoose document + hydration (virtuals, method attachment, change-tracking overhead) — on read-heavy paths with + large result sets this is a large, low-risk speedup. Flag any Mongoose read path that is not + followed by `.lean()` when the result is not modified before response (verify against the + currency brief for your version). + +- **Query shape hidden by the ORM — missing indexes, deep `OFFSET`, costly `count`**: the ORM + emits SQL (or a query plan) the developer may never see. Filtering or sorting on unindexed + columns, `skip(N)` / `OFFSET N` deep pagination (scans and discards N rows — replace with + keyset pagination anchored on the last seen cursor value), and `count()` on large Mongo + collections or SQL tables can each dominate latency while appearing as a single ORM call. + Diagnostic path: enable Prisma query logging (`log: ['query']`), TypeORM `logging: true`, or + Mongoose `mongoose.set('debug', true)`; then run `EXPLAIN ANALYZE` (Postgres/MySQL) or + `cursor.explain('executionStats')` (MongoDB) on the emitted query. Push the audit to read the + actual query before inferring cost. Use `$queryRaw` / `createQueryBuilder` / raw aggregation + pipelines as the escape hatch for hot queries the ORM cannot express efficiently (verify against + the currency brief for your version). + +- **Bulk writes as per-row inserts in a loop**: inserting or updating rows one at a time — a + `prisma.*.create(…)` or `Model.save()` or `repository.save(entity)` in a `for` loop — pays one + round-trip and one statement parse per row. Replace with `prisma.*.createMany` / Sequelize + `bulkCreate` / Mongoose `Model.insertMany` / TypeORM `repository.insert([…])` for inserts, and + `prisma.$transaction([…writes])` to batch heterogeneous mutations in a single round-trip. For + Redis, replace per-key `set`/`get` calls in a loop with `ioredis` `pipeline()` (fire-and-forget + pipelining) or `mget`/`mset` (verify against the currency brief for your version). + +- **Mongoose schema-level middleware and virtuals on large result sets**: Mongoose `pre`/`post` + hooks (`save`, `find`, `findOne`) and virtuals run per-document on hydrated results. A `find` + that returns 500 documents with three `post` hooks and two virtuals executes those callbacks + 2 500 times — visible when profiling as synchronous JS CPU time proportional to result-set size, + not query latency. Look for `findMany`-style queries with no `lean()` that also have schema-level + middleware on the model; `.lean()` bypasses hooks and virtuals entirely and is the correct choice + when mutation or virtual access is not needed post-query (cross-reference the over-fetching + bullet above). + +- **ioredis per-call round-trips and per-request client construction**: each `client.get(key)` + incurs a full TCP round-trip; a handler that calls `get` or `set` five times in sequence pays + five serial round-trips. `pipeline()` enqueues multiple commands and sends them in one write, so + the server processes and responds in a single round-trip; `multi()` wraps them in a MULTI/EXEC + transaction when atomicity is needed. Also look for a new `new Redis(…)` constructed inside the + request handler — ioredis connections should be a single shared module-level client (or a small + cluster client) for the process lifetime, not a per-request socket (verify against the currency + brief for your version). diff --git a/.claude/skills/performance-audit/profile-packs/javascript-typescript/react.md b/.claude/skills/performance-audit/profile-packs/javascript-typescript/react.md new file mode 100644 index 00000000..13cbef72 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/javascript-typescript/react.md @@ -0,0 +1,109 @@ +# JS/TS performance module: React +> Load when React (`react`/`react-dom`, `*.jsx`/`*.tsx` with JSX, Next.js) is detected — see the module map in `../javascript-typescript.md`. Core lanes + Runtime notes live in `../javascript-typescript.md`; this file is the React lens only. + +## React + +> Scope: React component trees and their host environments (browser, SSR, RSC). The recurring theme +> is **minimising re-render scope and work-per-render** — keep references stable so memoization +> actually holds, move expensive computation off the render path, and move work off the client +> entirely where Server Components or SSR can absorb it. + +- **Re-render cascade from unstable inline props**: a parent re-render re-renders every unmemoized + child; an inline object, array, or function literal (`style={{ color }}`, `onClick={() => …}`) + creates a new reference each render, breaking `React.memo`'s shallow comparison and voiding any + memoization downstream. Look for JSX attributes that construct values — object literals, array + literals, arrow functions — at the call site rather than in a stable variable, `useMemo`, or + `useCallback`. The **React Compiler** (React 19-era) auto-memoizes these when it can prove + stability, making manual `useMemo`/`useCallback` largely redundant in compiler-enabled codebases; + flag manual memoization the compiler now handles as clutter, and flag *missing* memoization in + codebases that have not adopted the compiler where child re-render cost is measurable (verify + against the currency brief for your version). + +- **`React.memo` misuse — absent where it helps, present where it doesn't**: a pure component that + receives stable props but sits under a frequently-updating parent is a candidate for `React.memo`; + its absence means the component always re-renders even when its output cannot change. The inverse + is equally worth flagging: wrapping a component whose props nearly always differ (e.g., receives a + new object each render from a non-memoized parent) adds a shallow-comparison cost with no + memoization benefit — the memo wrapper just burns cycles on the comparison. Look for the asymmetry + between how often props actually change and whether the wrapper is present (verify against the + currency brief for your version; cross-reference the **Algorithmic** lane in + `../javascript-typescript.md`). + +- **Context re-render fan-out**: every consumer of a context re-renders when the context value + reference changes; a context whose value is an object literal recreated each render (`value={{ user, + dispatch }}`) re-renders all consumers on every parent render regardless of whether the consumed + slice changed. Look for: single monolithic contexts holding both stable config and high-churn + state; object or array values that are not stabilized with `useMemo`; consumers that only read one + field of a multi-field context. The fix space is: split contexts by update frequency, stabilize + the value reference, or move high-churn state to an external store with `useSyncExternalStore` or + a selector-based library (Zustand, Redux Toolkit selectors) that lets components subscribe to + a narrow slice (verify against the currency brief for your version). + +- **Expensive work in the render body**: computation run directly in the function body (not wrapped + in `useMemo`) re-executes on every render triggered by any state or prop change, even unrelated + ones. Look for: large array transforms (sort, filter, reduce) over props or state; heavy object + construction; regex execution over long strings; tree-traversal — all inline in the component + body. The condition to flag is unstable inputs combined with expensive work; `useMemo` with a + precise dependency array defers recomputation to actual input changes. Also check effects used + purely to derive state: `useEffect` that reads state A and `setState(derive(A))` is a + double-render pattern — derive the value during render instead ("You Might Not Need an Effect") + (cross-reference the **Algorithmic** lane in `../javascript-typescript.md`). + +- **Effect-driven re-subscribe and dependency churn**: `useEffect` hooks whose dependency arrays + contain unstable references (inline objects, functions, derived arrays) re-fire on every render + even when the logical dependency has not changed, creating re-subscribe loops for subscriptions, + timers, or data-fetch chains. Look for: effects whose `deps` include values computed inline or + passed as props without stabilization; effects that set state unconditionally (triggering another + render → another effect fire); data-fetching effects that chain (`fetchA → setState → fetchB in + another effect`), creating sequential waterfalls the framework's data layer or a single async + function would eliminate. Cross-reference "You Might Not Need an Effect" for the derived-state + pattern and the **Data access & I/O** lane in `../javascript-typescript.md` for the fetch-waterfall + pattern. + +- **Concurrent feature gaps — `useTransition` and `useDeferredValue`**: CPU-heavy state updates + (filtering a large list, re-rendering a large tree) that run synchronously block user input and + produce jank; wrapping the expensive update in `startTransition` or `useTransition` marks it + non-urgent so React can interrupt it in favor of user input. Look for: event handlers that both + update fast-response UI (input value) and trigger expensive derived renders in the same + synchronous path. Separately, `useDeferredValue` lets a display value lag behind a fast-updating + source (e.g., showing the previous filtered list while the new filter renders), eliminating + per-keystroke jank without debounce gymnastics. Missing Suspense boundaries block progressive and + streaming SSR rendering — every data-fetching or lazy-loaded subtree that could independently + suspend should be wrapped so the rest of the tree can render without it (verify against the + currency brief for your version). + +- **State structure causing unnecessary breadth**: over-broad state — storing derived values + alongside source, duplicating state across siblings, lifting state higher than the deepest common + ancestor that needs it — causes more components to re-render than logically necessary. Look for: + `useState` holding values computable from other state or props (should be derived during render or + via `useMemo`); state lifted to a top-level provider when only a local subtree cares; uncontrolled + input patterns that update a shared store on every keystroke, re-rendering a large tree per + character (local state + debounced sync, or `useDeferredValue`, bounds this). Also flag + index-as-key on reorderable or filterable lists: React uses the key to decide whether to reuse a + component instance, so an index key on a reordering list forces full remount and DOM teardown of + every shifted item; long lists with stable-but-numerous items need virtualization + (react-window / TanStack Virtual) rather than rendering all nodes into the DOM (verify against + the currency brief for your version; cross-reference the **Memory** lane in + `../javascript-typescript.md`). + +- **Heavy component patterns — inline definitions and missing lazy-loading**: defining a component + function inside another component's render body creates a new function reference and a new React + component *type* on every parent render; React sees a different type and unmounts+remounts the + entire subtree rather than reconciling it — look for function components declared with `function` + or arrow syntax inside another component's body. Separately, heavy components (charts, rich-text + editors, large third-party widgets) rendered unconditionally at mount, even when off-screen or + conditionally shown, pay their parse and init cost on every page load; `React.lazy` + Suspense + defers that cost to first use (cross-reference the **Payload / startup / build** lane and the + `bundling-build` module in `../javascript-typescript.md`). + +- **SSR / RSC and hydration cost**: in Next.js App Router and similar RSC runtimes, marking a + component `"use client"` ships its module and all its imports to the browser bundle; overuse + converts what could be zero-JS Server Components into client-side JavaScript, inflating + Time-to-Interactive. Look for: `"use client"` applied to large subtrees or layout components + where only a small leaf needs interactivity; data fetching done client-side (useEffect + fetch) + that could run on the server; heavy third-party imports pulled into client components. Hydration + itself has a cost proportional to the amount of server-rendered HTML being reconciled on the + client — look for large server-rendered trees where selective or progressive hydration strategies + (lazy hydration, islands) would reduce main-thread work at startup (verify against the currency + brief for your version; cross-reference the **Payload / startup / build** lane and the + `bundling-build` module). diff --git a/.claude/skills/performance-audit/profile-packs/javascript-typescript/vue.md b/.claude/skills/performance-audit/profile-packs/javascript-typescript/vue.md new file mode 100644 index 00000000..1c8ef608 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/javascript-typescript/vue.md @@ -0,0 +1,94 @@ +# JS/TS performance module: Vue +> Load when Vue (`vue`, `*.vue` SFCs, Nuxt) is detected — see the module map in `../javascript-typescript.md`. Core lanes + Runtime notes live in `../javascript-typescript.md`; this file is the Vue lens only. + +## Vue + +> Scope: Vue 3 with the Composition API and `<script setup>`, including Nuxt SSR/SSG deployments. +> The recurring performance theme is four levers applied together: **bound reactivity granularity** +> (don't let Vue proxy-wrap data that never needs to drive the DOM), **cache derived values** +> (computed over methods; debounced/narrow watchers), **skip diffing static and stable subtrees** +> (v-once, v-memo, stable keys), and **lazy/split the bundle** (async components, route splitting, +> lazy hydration). When a signal in one bullet implicates bundle size or build output, also consult +> the `bundling-build` module and the `payload-startup` lane in `../javascript-typescript.md`. + +- **Reactivity granularity — deep `reactive`/`ref` on large structures**: Vue 3 wraps every nested + property in a Proxy, so a single large `reactive({})` tree pays an O(n) cost at setup and on + deep mutations. If only a small slice of the object ever drives the DOM, the rest of the proxy + machinery is pure overhead. `shallowRef`/`shallowReactive` make only the top-level reference + reactive; `markRaw` opts an object out of reactivity entirely — use it for third-party class + instances, large lookup tables, or canvas/WebGL objects attached to component state. Vue 3.5 + included a reactivity rewrite reported to reduce memory and improve large-array performance; + the durably correct framing is to keep reactive trees as narrow as possible regardless of + runtime version (verify against the currency brief for your version). + +- **Computed vs methods vs watchers — caching and dependency scope**: `computed` properties cache + their result and only recompute when a tracked reactive dependency changes, so a template + reading a computed ten times in one render pays the derivation cost once. A method called in the + template recomputes on every render regardless of input stability — the footgun is using a method + where the value is truly derived from reactive state and doesn't need to be called with arguments. + `watchEffect` collects all reactive reads at runtime (easy to write, easy to over-read); explicit + `watch` with a narrow source expression limits the dependency surface and makes the trigger + condition auditable. Either form doing heavy synchronous work on every change should be debounced + or restructured to narrow what triggers it (cross-reference the `concurrency` lane in + `../javascript-typescript.md`). + +- **Template diffing — `v-once`, `v-memo`, and `v-if`/`v-for` placement**: `v-once` renders a + subtree once and skips it in all future patch cycles — correct for content that is truly static + after mount (legal text, static imagery, translated labels that don't change). `v-memo` accepts a + dependency array and skips a subtree's diff when every value in the array is the same as the + last render; for list rows keyed on stable identifiers with infrequently changing display fields, + this can eliminate O(n) diffing under a frequent parent update. Placing `v-if` and `v-for` on the + same element forces Vue to evaluate the condition for every item before deciding whether to render; + wrap with a `<template>` tag to separate the two (verify against the currency brief for your + version). + +- **List rendering — `key` correctness and virtualization**: `:key` set to array index on a + reorderable, filterable, or pageable list causes Vue to patch the wrong DOM nodes and re-render + rows that haven't changed — use a stable domain identifier. For large lists (hundreds of rows + or more), virtual scrolling (e.g., `vue-virtual-scroller`) renders only the visible viewport + slice, keeping DOM node count bounded and eliminating O(n) mount/unmount costs on filter changes. + The combination of index keys and no virtualization on a large list is the worst case: full DOM + teardown and rebuild on every sort or filter (verify against the currency brief for your version). + +- **Props and component granularity — inline allocations and reactive destructuring**: object or + array literals, and arrow-function handlers, written inline in a template (`<Child :config="{}"`, + `@click="() => ..."`) create a new reference on every parent render; child components receiving + them will see the prop as "changed" even when the logical value is identical. In Vue 3.5, props + can be destructured in `<script setup>` while preserving reactivity via the compiler transform — + confirm the project's Vue version supports this before relying on it. Over-deep component trees + multiply the patch work per update; prefer fewer, coarser components for very high-frequency + updates (e.g., real-time data feeds) where component boundary overhead accumulates + (verify against the currency brief for your version). + +- **Watcher leaks and unbounded reactive stores**: `watch`/`watchEffect` return a stop handle that + must be called when the owning component or composable is torn down — effects created outside a + component lifecycle (in a utility module, a global composable called once at app init, or a + Pinia action) are never auto-stopped and accumulate for the process lifetime. Global reactive + stores (`reactive` objects or Pinia stores) that grow unbounded — caches that append but never + evict, event-log arrays that keep every entry — create both a memory leak and a watcher fan-out + cost as more components subscribe. Also check for DOM event listeners attached in `onMounted` + without a matching removal in `onUnmounted` (cross-reference the `memory` lane in + `../javascript-typescript.md`). + +- **SSR and hydration cost (Nuxt)**: full hydration at page load walks the entire component tree + and re-creates the reactive graph client-side even for below-the-fold or interaction-free + sections. Vue 3 / Nuxt expose lazy-hydration strategies — `hydrateOnVisible` defers until the + element enters the viewport, `hydrateOnIdle` defers to `requestIdleCallback`, and + `hydrateOnInteraction` defers until a pointer or keyboard event — so above-the-fold and + interactive components hydrate first. `defineAsyncComponent` combined with lazy hydration splits + the component's JS out of the initial chunk and delays execution. Nuxt's component islands + (`<NuxtIsland>`) let entire subtrees remain server-rendered HTML with no client JS. Hydration + mismatches (server HTML differs from client render) force a full client-side re-render of the + affected subtree and log a warning — they are a correctness and performance issue simultaneously + (verify against the currency brief for your version). + +- **Bundle size — async components, auto-imports, and tree-shaking**: route-level code splitting + via dynamic `import()` in the router config keeps each route's component graph out of the initial + bundle; `defineAsyncComponent` does the same at the component level and can be combined with a + loading/error slot to keep UX clean during load. Nuxt's auto-import feature is convenient but can + silently pull in large composables or utility modules on every route if the import graph is not + audited; verify which modules end up in the critical-path chunk with a bundle analyzer. Vue's + compiler tree-shakes runtime helpers by default in a properly configured build, but hand-rolling + `Vue.createVNode` or importing from `vue` internals can defeat that (cross-reference the + `bundling-build` module and the `payload-startup` lane in `../javascript-typescript.md`; + verify against the currency brief for your version). diff --git a/.claude/skills/performance-audit/profile-packs/jvm.md b/.claude/skills/performance-audit/profile-packs/jvm.md new file mode 100644 index 00000000..74a9cae6 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/jvm.md @@ -0,0 +1,77 @@ +# Profile Pack: JVM (Java / Kotlin) + +Specializes the generic lanes for Java/Kotlin stacks (Spring, Hibernate/JPA, standard library). +Load alongside `generic-pack.md`; signals here augment, not replace, the generic signals. + +--- + +## Algorithmic complexity & data structures (lane `algorithmic`) +- `List.contains` / `List.remove` / `List.indexOf` inside a loop — O(n²); replace the list with a `HashSet` or `LinkedHashSet` for membership tests, or pre-build a lookup `Map` keyed on the relevant field. +- Repeated computation of a loop-invariant value (regex compile, format-string parse, expensive factory call) inside the loop body; hoist before the loop or use a static final. +- Nested stream pipelines that each traverse the same collection independently; flatten into a single pass or restructure with a `Map`/`Multimap` grouping. +- `LinkedList` used for random access or indexed iteration (O(n) per `get`); `ArrayList` used for frequent head removal or FIFO queuing — wrong structure for the access pattern. +- `TreeMap`/`TreeSet` chosen for unsorted data where only hashing is needed — log(n) overhead with no ordering benefit; prefer `HashMap`/`HashSet`. +- Comparing or sorting by a field computed inside the comparator (e.g., `Comparator.comparing(x -> expensiveDerive(x))`) without memoization — the derivation runs O(n log n) times; extract to a decorated sort. + +## Memory & allocation (lane `memory`) +- Autoboxing primitives in hot paths (`int` → `Integer`, `long` → `Long`, etc.); prefer primitive streams (`IntStream`, `LongStream`, `DoubleStream`) or primitive-specialised collections (verify against the currency brief for your version). +- `String` concatenation (`+`) inside a loop — the compiler does not always collapse these; use `StringBuilder` explicitly, or `String.join` / `StringJoiner` for delimiter-separated values. +- Stream pipelines in tight inner loops where lambda capture allocates a closure object per call and intermediate stages allocate wrapper spliterators; a plain `for` loop is zero-allocation. +- `collect(toList())` or `collect(toSet())` on a very large dataset that is then immediately reduced to a scalar — pipeline lazily to the terminal without materialising the intermediate collection. +- `ThreadLocal` caching expensive mutable objects (e.g., `SimpleDateFormat`, heavyweight parsers) — safe with platform thread pools, but each virtual thread is never pooled, so one object is allocated per task and never reused; use a shared immutable alternative (e.g., `DateTimeFormatter`) or an explicit pool (verify against the currency brief for your version). +- Large allocations that exceed the G1 region-size threshold become "humongous objects", bypass the young generation, and are collected only at mixed/full GC — look for very large byte arrays, large `ArrayList`/`HashMap` literals, or bulk-copy patterns in hot paths. +- Unbounded `static` caches or maps that grow without eviction, causing sustained heap pressure and increasingly frequent GC cycles. + +## Data access & I/O (lane `data-access`) +- Hibernate/JPA N+1: lazy associations accessed inside a loop trigger one `SELECT` per row; fix with `JOIN FETCH` in JPQL, `@EntityGraph` at the repository method, or `@BatchSize` on the collection mapping to batch proxy loads (verify against the currency brief for your version). +- Multiple `@OneToMany` associations loaded simultaneously with `FetchType.EAGER` — can produce a Cartesian-product result set whose row count is the product of collection sizes; use explicit `JOIN FETCH` for one association at a time or separate queries. +- Per-row inserts/updates inside a loop (`save` inside `for`); use `saveAll` / `executeBatch` and confirm batch mode is enabled in the datasource config — Hibernate silently skips batching if identity generators are used (verify against the currency brief for your version). +- `SELECT *` or fetching full entities when only a subset of columns is needed downstream; prefer interface-based projections or DTO query results to limit the transferred payload. +- Missing pagination: `findAll()` or unbounded `@Query` on a table with unbounded growth; always apply `Pageable` / `LIMIT`+`OFFSET` or cursor-based pagination. +- Chatty round-trips inside a loop — sequential calls to an external service or cache for each element; coalesce into a single batched call and look up from the returned map. +- Lazy-association access outside a transaction boundary — causes `LazyInitializationException` at runtime or forces an implicit session open, masking latency; ensure the service layer opens a transaction that covers all association traversals. + +## Concurrency & parallelization (lane `concurrency`) +- **Defend:** `synchronized` block enclosing more work than necessary (I/O, network, heavy computation); narrow the critical section to the minimum shared-state mutation, or replace with `ReentrantLock` / `ReadWriteLock` when reads vastly outnumber writes, or with `ConcurrentHashMap` / `AtomicReference` for lock-free access. +- `synchronized` block wrapping blocking I/O when virtual threads are in use — pinning keeps the carrier OS thread blocked and defeats the concurrency model; replace with `ReentrantLock` for long-lived critical sections (verify against the currency brief for your version). +- `ThreadPoolExecutor` core/max sizes not matched to workload type: CPU-bound pools should not exceed available cores; I/O-bound pools can safely exceed core count; a shared pool mixing both starves one kind. +- Blocking calls (`Thread.sleep`, synchronous JDBC, blocking HTTP client) on reactive or async dispatch threads (Netty event-loop, RxJava scheduler, Reactor `parallel`); offload to a bounded `Schedulers.boundedElastic()` or equivalent blocking-capable pool. +- **Exploit:** sequential `for` loops over large, truly independent items — consider `parallelStream()` or `CompletableFuture.allOf`; but verify independence (stateless lambdas, no shared mutable state, no ordering dependency, no `synchronized`/blocking inside the lambda) before suggesting parallel execution. +- `parallelStream()` on small collections, or with stateful intermediate operations (`distinct`, `sorted`, `limit`, `skip`) on ordered sources — parallel overhead exceeds benefit; add `.sequential()` or switch to a plain loop. +- `CopyOnWriteArrayList` used for write-heavy scenarios — every mutation copies the full array; prefer `ConcurrentLinkedQueue` or a lock-guarded structure for write-heavy cases. + +## Framework-idiom currency (lane `idiom-currency`) +- Consult the currency brief/index for Spring Boot, Hibernate/JPA, and Jackson. +- Flag any patterns the brief marks superseded or deprecated; flag fast-path APIs the brief lists that the code doesn't use; flag changed defaults the code still overrides unnecessarily. +- Offline (no brief): flag candidate idiom concerns at LOW confidence, marked for manual currency check. + +## Payload / startup / build (lane `payload-startup`) +- Spring component scan over a broad base package (e.g., the root application package) forces the container to inspect every classpath entry at boot; narrow `@ComponentScan` to the smallest meaningful sub-packages, or switch to explicit `@Bean` registration in `@Configuration` classes. +- Default eager singleton initialisation: expensive beans that are rarely exercised at runtime delay startup and inflate initial heap; apply `@Lazy` (or `spring.main.lazy-initialization=true` globally) where safe — but note that a lazy bean depended on by an eager singleton is still initialised at startup. +- Reflection-heavy frameworks (annotation processors, classpath scanners, dynamic proxy generators) block native-image compilation and increase startup cost on standard JVM; prefer AOT-friendly construction or explicit configuration (verify against the currency brief for your version). +- Unused dependencies on the classpath are scanned, loaded, and sometimes auto-configured; audit for dead weight that inflates startup time and heap footprint. +- `@PostConstruct` or `InitializingBean.afterPropertiesSet` performing I/O (schema validation, remote config fetch, warm-up queries) on the main thread blocks the entire application context refresh; move to a background `ApplicationRunner` or `CommandLineRunner` if not strictly required before first request. + +--- + +## Kotlin notes + +The runtime/GC/JIT and Spring/Hibernate signals above apply equally to Kotlin-on-JVM. These are the Kotlin-specific *language* idioms with distinct perf characteristics: + +- Higher-order functions allocate a `Function` object (and capture closure) per call; mark hot HOFs `inline` to eliminate that allocation (also enables `reified`) — but avoid `inline`-ing large bodies, which bloats bytecode at every call site. +- Boxing of nullable/boxed primitives: `Int?`/`Long?`/`Boolean?` and `Array<Int>` box to `java.lang.Integer` etc.; in hot paths and large collections use non-null primitives and primitive arrays (`IntArray`/`LongArray`/`DoubleArray`). +- Eager collection-operator chains (`list.map{ }.filter{ }…`) allocate a new intermediate `List` at each step; for large collections use `.asSequence()` for lazy single-pass evaluation (plain loops or eager ops win for small ones). +- `runBlocking` on a request/hot path, or blocking calls on a dispatcher not meant for blocking, starve the coroutine pool; use `withContext(Dispatchers.IO)` for blocking work and keep CPU work on `Dispatchers.Default`. +- `const val` (compile-time inlined at call sites) vs `val` (field read); `@JvmStatic` / `@JvmField` avoid synthetic accessor/getter overhead on Java-interop hot paths. +- Delegated properties (`by lazy`, `Delegates.observable`) add a per-property delegate object + indirection — fine generally, but watch in hot, frequently-instantiated types. + +--- + +## Sources + +Durable signals in this pack are grounded in these authoritative sources (version-specific facts and +their per-entry citations live in `../version-indexes/jvm.md`): + +- Hibernate ORM User Guide — fetching / N+1 (docs.hibernate.org) +- Oracle JDK docs — virtual threads, `java.util.concurrent`, Stream API, GC tuning guide +- Spring Framework / Spring Boot reference — lazy initialization, bean registration, AOT/native diff --git a/.claude/skills/performance-audit/profile-packs/python.md b/.claude/skills/performance-audit/profile-packs/python.md new file mode 100644 index 00000000..70cc3785 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/python.md @@ -0,0 +1,124 @@ +# Profile Pack: Python + +Loaded for Python codebases. Augments the generic pack with Python-specific performance signals +across CPython's runtime model, the standard library, and common frameworks. + +This is the **core** Python pack (always-loaded lanes + Runtime & interpreter notes). Deep, +tech-specific lenses (web frameworks, ORM/DB, the data stack, async I/O, serialization, task queues) +live in load-on-detection modules under `profile-packs/python/` — see **`## Framework / sub-stack +modules`** at the bottom. The core lanes are deliberately kept as always-useful quick-hits; a module +*deepens* its area when its signals appear in scope (it does not merely restate the core bullet). + +--- + +## Algorithmic complexity & data structures (lane `algorithmic`) +- `in` membership test against a `list` inside a loop is O(n) per test; replace with a `set` or `dict` key lookup (O(1) average). +- Repeated pure computation on the same arguments inside a loop or per-request path — hoist invariants out of the loop or memoize with `functools.cache` / `functools.lru_cache` (verify against the currency brief for your version). +- Materializing a full collection (`list(...)`, `[x for x in ...]`) when a single-pass generator expression or `itertools` pipeline (`chain`, `islice`, `takewhile`, `batched`) would avoid the allocation entirely. +- `pandas` `.apply(axis=1)`, `.iterrows()`, or explicit Python loops over DataFrame rows — these are Python-speed row dispatch; replace with NumPy/pandas vectorized operations, `pd.eval()` for large arithmetic expressions, or Numba/Cython for tight numerical loops (verify against the currency brief for your version). +- Slow `numpy` linear-algebra / FFT (`np.dot`, `np.matmul`, `np.linalg.*`, `np.fft`) — these are only fast when NumPy is linked against an optimized BLAS/LAPACK (OpenBLAS, MKL, Apple Accelerate); a build lacking it can be an order of magnitude slower. Confirm linkage with `numpy.show_config()` (verify against the currency brief for your version). +- Aggregation or filtering done in Python after a full fetch — push it to the database (`annotate()`, `F()` expressions, SQL aggregates) or to NumPy/pandas; the cost of data transfer plus Python iteration typically exceeds a database- or array-level operation. +- Recomputing a derived value on every call that could be a `@functools.cached_property` or a module-level constant. +- Building a string by `+=` in a loop: CPython sometimes optimizes this in-place when the left operand's refcount is 1, but that path is fragile (breaks under another reference, on PyPy, or when the result is built from a list) and degrades to O(n²) copying — prefer `"".join(parts)` over a list, or `io.StringIO` for incremental construction. + +## Memory & allocation (lane `memory`) +- Materializing a large sequence that is iterated only once — a generator expression or `itertools` pipeline defers allocation to one element at a time. +- Unnecessary defensive copies on hot paths: `list[:]`, `dict.copy()`, `DataFrame.copy()` — audit whether a view or reference is safe before copying. +- Reading entire files into memory (`file.read()`) when line-by-line iteration or chunked streaming bounds peak resident size. +- Unbounded in-memory accumulation (appending to a list/dict indefinitely without eviction, pagination, or streaming to a sink). +- `functools.lru_cache` / `functools.cache` with an unbounded or very large key space grows for the process lifetime (no TTL, no size cap unless `maxsize` is set) — and on an **instance method** it pins every `self` ever passed in memory for the life of the cache (a classic leak); prefer a bounded `maxsize`, a module-level cache keyed by value not object, or a `cached_property` for per-instance memoization (verify against the currency brief for your version). +- Many small, homogeneous objects without `__slots__`: each instance normally carries a per-instance `__dict__` (~280 + bytes in CPython); declaring `__slots__` eliminates that dictionary. Subclasses must also declare `__slots__` or the saving is lost. +- Retaining large intermediate DataFrames after a pipeline step that could be overwritten in place or narrowed in dtype (e.g., `object` column holding low-cardinality strings → `Categorical`; oversized `int64` → `int16/int32`). + +## Data access & I/O (lane `data-access`) +- ORM N+1: accessing a related attribute inside a loop without eager loading. Django — missing `select_related` (foreign key / one-to-one) or `prefetch_related` (reverse FK, M2M); SQLAlchemy — missing `selectinload` (preferred for collections) or `joinedload` (many-to-one scalar refs) (verify against the currency brief for your version). +- Per-row writes inside a loop — replace with `bulk_create` / `bulk_update` (Django), `session.add_all` + `execute(insert(...).values(...))` (SQLAlchemy), or `cursor.executemany` (verify against the currency brief for your version). +- Over-fetching: loading full ORM objects or `SELECT *` when only a few columns are needed — use `.values()` / `.values_list()` (Django), `query(Model.col)` / `select(col)` (SQLAlchemy), or `.only()` / `.defer()` to exclude large deferred fields. +- `QuerySet.iterator(chunk_size=N)` absent on queries that stream thousands of rows — without it the entire result set is cached in the QuerySet, holding peak memory until GC. +- Accessing `obj.foreign_key.id` instead of the already-loaded `obj.foreign_key_id` — triggers an unnecessary SQL round-trip. +- Synchronous DB drivers or blocking file I/O called directly inside an `async def` handler — this parks the entire event loop; use async-native drivers or offload via `asyncio.to_thread` (verify against the currency brief for your version). +- Calling `.exists()`, `.count()`, or `.contains()` separately after a queryset that will also be iterated — evaluate the queryset once and reuse the cached result. +- Persisting medium/large DataFrames as CSV or pickle on a hot or repeated path — prefer a columnar binary format (`to_parquet`/`read_parquet` via PyArrow, or Feather) for far smaller files, faster read/write, dtype preservation, and column/row pruning on read (verify against the currency brief for your version). + +## Concurrency & parallelization (lane `concurrency`) +- CPU-bound work dispatched to `threading.Thread` or `ThreadPoolExecutor` — the GIL serializes Python bytecode across threads; use `multiprocessing` or `ProcessPoolExecutor` for true parallelism on CPU-bound tasks. +- Independent `await` calls chained sequentially — replace with `asyncio.gather(*coros)` or an `asyncio.TaskGroup` (prefer `TaskGroup` for structured concurrency and automatic cancellation of siblings on failure) (verify against the currency brief for your version). +- Blocking calls (`time.sleep`, synchronous file I/O, sync DB drivers, CPU-bound computation) called directly inside `async def` — offload via `asyncio.to_thread(fn, *args)` or `loop.run_in_executor(None, fn)` to avoid parking the event loop. +- Fire-and-forget `asyncio.create_task(...)` with no reference stored — the event loop holds only a weak reference; the task can be silently garbage-collected mid-execution. Store tasks in a `set` and discard on completion via `add_done_callback`. +- `asyncio.gather(...)` without `return_exceptions=True` and no surrounding `try/except` — a single coroutine failure cancels siblings without giving them a chance to clean up; use `TaskGroup` or handle exceptions explicitly. +- Thread pool sized by default without profiling — `ThreadPoolExecutor` defaults may be too small for I/O-bound workloads or wastefully large for CPU-bound ones; size explicitly after measurement. + +## Framework-idiom currency (lane `idiom-currency`) +- Consult the currency brief for the detected framework (Django, Flask, FastAPI, SQLAlchemy, pandas, NumPy, Celery, etc.) — flag superseded patterns, newly available fast paths, and changed defaults the code still fights. +- Offline (no brief): note candidate idiom concerns at LOW confidence, flagged for manual currency check. + +## Payload / startup / build (lane `payload-startup`, conditional) +- Heavy initialization at module import time (opening DB connections, loading ML models, compiling large data structures) — defer to first use, application startup hooks, or explicit lazy-init patterns. +- `re.compile(pattern)` called inside a loop or per-request function — compile patterns once at module level; the internal cache (`re._cache`) is bounded and can evict entries under high pattern variety. +- Logging calls with pre-computed strings in hot paths: f-strings (`logger.debug(f"val={x}")`) or concatenation always evaluate the expression even when the level is disabled. Use `%`-style lazy args (`logger.debug("val=%s", x)`) or guard with `if logger.isEnabledFor(logging.DEBUG):`; in tight loops, cache the boolean before entering. +- Importing heavyweight packages unconditionally at module top level when only a narrow submodule or optional path needs them — use lazy imports (`import` inside the function/branch) or narrower alternatives to reduce startup latency and memory footprint. +- `pandas` `DataFrame.apply` / `Series.apply` with a pure-Python callable on a large dataset used at request time rather than precomputed or vectorized — startup-phase preprocessing is far cheaper than per-request Python-speed dispatch. + +--- + +## Runtime & interpreter notes (load for every Python project) + +CPython's execution model shapes every lane: a dynamic, bytecode-interpreted runtime where pure-Python +loops are slow and parallelism is constrained by the GIL. These durable realities are the Python analog +of a "variant notes" section — *how the interpreter behaves and how to measure it*, cutting across all +the lanes above and every module below. + +- **The GIL governs what concurrency buys you**: a single GIL serializes Python bytecode, so threads + give **concurrency for I/O-bound work but not parallelism for CPU-bound work** — the GIL is released + during blocking I/O and *inside* C extensions (NumPy, `hashlib`, compression), so threading *does* + speed up array/C-level work but not pure-Python compute. CPU-bound parallelism needs + `multiprocessing`/`ProcessPoolExecutor`, a C/Cython/Numba extension, or the experimental + free-threaded build (`python3.13t`, PEP 703) — confirm the interpreter and C-extension readiness + before assuming no-GIL (verify against the currency brief for your version). +- **Pure-Python tight loops are the cost model's sharpest edge**: attribute/global lookups, dynamic + dispatch, and per-iteration bytecode make a Python loop one to two orders of magnitude slower than + the equivalent in C — push hot loops into vectorized C (NumPy, built-ins, `str`/`bytes` methods), + inlined comprehensions, or a compiled extension (Cython/Numba). The 3.11+ specializing adaptive + interpreter narrows the gap on hot code but does not close it (verify against the currency brief for + your version). +- **Interpreter and version choice is a real lever**: major CPython releases ship broad speedups + (3.11 ≈ +25% over 3.10; comprehension inlining in 3.12), so the running version matters; for + long-running pure-Python workloads **PyPy**'s tracing JIT can be several times faster, while + sub-interpreters (PEP 684) and the experimental CPython JIT (PEP 744) are emerging options — match + the runtime to the workload rather than assuming stock CPython is the only target (verify against the + currency brief for your version). +- **The Python↔C boundary is fast in bulk, slow per-call**: crossing into a C extension is cheap once + but has per-call marshaling cost, so *many tiny crossings* (per-element NumPy scalar access, calling + a vectorizable op inside a Python loop) lose badly to *one bulk call* over the whole array — the fix + is almost always "do it in one vectorized call," not "call C more often." +- **Profile before optimizing — the tooling is good and cheap**: justify hot-path claims with + `cProfile`/`pstats`, a sampling profiler (`py-spy`, `Scalene` — which also attributes memory and + GPU), or Linux `perf` (3.12+ `-X perf`), not intuition; for short-lived processes (CLIs, serverless, + workers) import-time cost often dominates — measure it with `python -X importtime` before blaming + request handling (verify against the currency brief for your version). + +## Framework / sub-stack modules (load on detection) + +Load the core lanes + **Runtime & interpreter notes** above for *every* Python project. Additionally +load the matching module when its technology is detected in the audit scope, and include it as +ecosystem context in the relevant lane prompts. Each module *deepens* its area beyond the core +quick-hits — see the version index `../version-indexes/python.md` for version-specific facts. + +| Detected (signals) | Load module | +|---|---| +| **Web frameworks** — `django`, `flask`, `fastapi`/`starlette`, `gunicorn`/`uvicorn` (WSGI/ASGI) | [`python/web-frameworks.md`](python/web-frameworks.md) | +| **ORM & database** — `django` ORM, `sqlalchemy`, `psycopg`/`psycopg2`, `asyncpg` | [`python/orm-database.md`](python/orm-database.md) | +| **Data stack** — `numpy`, `pandas`, `polars`, `pyarrow` | [`python/data-stack.md`](python/data-stack.md) | +| **Async I/O** — `aiohttp`, `httpx`, `uvloop`, async DB drivers (`asyncpg`/`aiomysql`/`motor`), **or** `asyncio` used materially (an async service, not one stray `await`) | [`python/async-asyncio.md`](python/async-asyncio.md) | +| **Serialization & validation** — `orjson`/`ujson`/`msgspec`, `pydantic`, `marshmallow`, `pickle`, `msgpack`, **or** stdlib `json` on a hot/large path (not one incidental `json.loads`) | [`python/serialization.md`](python/serialization.md) | +| **Task & job queues** — `celery`, `rq`, `dramatiq`, `arq` | [`python/task-queues.md`](python/task-queues.md) | + +## Sources + +Durable signals in this pack are grounded in these authoritative sources (version-specific facts and +their per-entry citations live in `../version-indexes/python.md`): + +- Django — "Database access optimization" (docs.djangoproject.com/en/stable/topics/db/optimization/) +- SQLAlchemy 2.0 — relationship/loader guide (docs.sqlalchemy.org/en/20/orm/queryguide/relationships.html) +- pandas — "Enhancing performance" (pandas.pydata.org/docs/user_guide/enhancingperf.html) +- CPython docs — asyncio, profiling HOWTO, `itertools`, data model (`__slots__`), logging HOWTO (docs.python.org) diff --git a/.claude/skills/performance-audit/profile-packs/python/async-asyncio.md b/.claude/skills/performance-audit/profile-packs/python/async-asyncio.md new file mode 100644 index 00000000..88801251 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/python/async-asyncio.md @@ -0,0 +1,97 @@ +# Python performance module: Async I/O (asyncio / aiohttp / httpx / uvloop) +> Load when `asyncio`, `aiohttp`, `httpx`, `uvloop`, or an async DB driver (`asyncpg`/`aiomysql`/`motor`) is detected — see the module map in `../python.md`. Core lanes + Runtime & interpreter notes live in `../python.md`; this file is the Async I/O lens only. + +## Async I/O (asyncio / aiohttp / httpx / uvloop) + +> Scope: the CPython event loop and the I/O-ecosystem that runs on it — aiohttp (client and +> server), httpx async client, uvloop, and async DB drivers (asyncpg, aiomysql, motor). The +> core pack covers asyncio primitives (gather vs TaskGroup, blocking-in-async→to_thread, +> fire-and-forget GC, gather return_exceptions, thread-pool sizing, GIL→multiprocessing for +> CPU-bound work); this module goes deeper into the mechanics that determine real async +> throughput: client/pool reuse, bounded fan-out, loop-blocking anywhere in the call stack, +> loop selection, per-task scheduling cost, timeout hygiene, streaming vs buffering, and +> tool-mismatch (async used where a process pool is the right answer). + +- **Client/session created per request instead of once per application**: constructing an + `aiohttp.ClientSession` or `httpx.AsyncClient` inside a coroutine or view handler means each + call allocates a new connection pool, pays TCP (and TLS) handshake cost on every request, and + leaks the underlying socket resources until the finalizer runs — there is no keep-alive and + no connection reuse. The correct pattern is one long-lived client shared across the + application lifetime (e.g., created at startup and closed at shutdown via a lifespan hook). + Once shared, tune pool limits to match actual concurrency: for aiohttp use + `TCPConnector(limit=<total>, limit_per_host=<per-origin>)`; for httpx use + `Limits(max_connections=<total>, max_keepalive_connections=<idle>)` (verify against the + currency brief for your version). + +- **Unbounded concurrent fan-out without back-pressure**: `asyncio.gather(*[coro(item) for item + in large_list])` or a `TaskGroup` that spawns one task per item with no upper bound opens one + connection (or socket or DB cursor) per item simultaneously — this can exhaust file + descriptors, overwhelm the remote server's accept queue, or hit connection pool limits and + raise. Neither `gather` nor `TaskGroup` limits concurrency by itself. Bound the fan-out with + an `asyncio.Semaphore` guarding each coroutine's I/O, a fixed worker-pool pattern + (`asyncio.Queue` + N consumer tasks), or `itertools.batched` to process in bounded chunks + (cross-reference the core **Concurrency** lane in `../python.md`). + +- **Hidden blocking that parks the loop — beyond the obvious**: the core pack flags + `time.sleep`/sync file I/O; this module covers the subtler sources. A single synchronous call + anywhere on the event-loop thread stalls *every* concurrently waiting coroutine for its + duration: `requests` or `urllib` instead of aiohttp/httpx; a sync DB driver (`psycopg2`, + `pymysql`, `pymongo`) instead of asyncpg/aiomysql/motor; `socket.getaddrinfo` (DNS, which is + synchronous by default in CPython — use `aiodns` or rely on aiohttp's built-in async + resolver); `json.loads` on a megabyte-scale payload; CPU-bound parsing or validation + (protobuf decode, regex on large strings); `logging` to a blocking file handler or a network + log sink with no async adapter. The symptom is event-loop latency that does not improve as + concurrency rises. Audit every import used inside `async def` code for sync-only + implementations; offload unavoidable blocking via `asyncio.to_thread` or + `loop.run_in_executor` (verify against the currency brief for your version). + +- **Default selector event loop on a high-RPS async service**: CPython's default event loop is + a selector-based pure-Python loop; on Linux `uvloop` (libuv-backed) replaces it and delivers + ~2–4× higher I/O throughput for connection-heavy workloads. Install and activate with + `asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())` before `asyncio.run()`, or pass + `--loop uvloop` to uvicorn. Not applicable on Windows (libuv has no IOCP backend there). + Running a high-RPS aiohttp or FastAPI/Starlette service without uvloop on Linux is leaving + measurable throughput on the table (verify against the currency brief for your version). + +- **Per-task scheduling overhead and eager-task bypass**: spawning an `asyncio.Task` for each + trivial item in a tight loop adds scheduler round-trips even when the coroutine completes + synchronously (e.g., a cache hit that returns immediately). In CPython 3.12+, + `asyncio.eager_task_factory` makes synchronously-completing coroutines skip the event-loop + round-trip entirely — set it via `loop.set_task_factory(asyncio.eager_task_factory)` or pass + a compatible `loop_factory` to `asyncio.run()`. Net negative if most tasks are genuinely + async and yield at least once. Also look for `await coro()` inside a loop over independent + items where the items could instead be batched with `gather`/`TaskGroup`: sequential `await` + serialises work that could overlap (cross-reference the core **Concurrency** lane in + `../python.md` and the `asyncio` section of `../version-indexes/python.md`). + +- **Missing or coarse timeouts and `CancelledError` mishandling**: coroutines that issue + outbound HTTP calls or DB queries without per-operation timeouts let a slow peer pin a + connection and a task indefinitely, eventually exhausting the pool. Use `asyncio.timeout(n)` + (3.11+, preferred) or aiohttp/httpx client-level `timeout=` parameters to bound each + operation; `asyncio.wait_for()` carries wrapping overhead and is superseded by + `asyncio.timeout()` for new code. When a task is cancelled, `CancelledError` must propagate + — catching it without re-raising (or catching `BaseException` and not re-raising) leaves + connections half-closed and can deadlock `TaskGroup` cancellation. Also audit asyncpg/aiomysql + for missing `command_timeout` or `timeout` arguments on query calls (verify against the + currency brief for your version; see `asyncio.timeout` entry in `../version-indexes/python.md`). + +- **Async generators and streaming responses buffered into memory**: code that does + `data = [item async for item in async_gen]` or `body = await resp.read()` on a large HTTP + response materialises the full payload before processing — this couples peak memory to + response size and delays first-byte processing. Prefer aiohttp's + `resp.content.iter_chunked(n)` or `resp.content.iter_any()` and httpx's + `async with client.stream(...) as resp: async for chunk in resp.aiter_bytes()` to process + incrementally. For async generators that produce faster than the consumer can process, add + back-pressure via a bounded `asyncio.Queue` between producer and consumer rather than + collecting into a list (cross-reference the core **Memory** lane in `../python.md`). + +- **Async used for CPU-bound work, or `asyncio.run` called repeatedly in a hot path**: async + concurrency gives interleaved I/O waits on one thread — it does not provide parallelism and + the GIL still serialises Python bytecode. Dispatching CPU-bound work (image processing, + cryptography, data transformation, parsing) to `asyncio.gather` or a `TaskGroup` keeps + everything on one core and may be slower than synchronous code due to scheduling overhead; + a `ProcessPoolExecutor` (or `multiprocessing`) is the correct tool. Separately, calling + `asyncio.run(coro)` inside a loop or per-request path creates and tears down a fresh event + loop on every invocation — this is expensive; use `loop.run_until_complete` on a persistent + loop or restructure so a single `asyncio.run` drives the entire program + (cross-reference the core **Concurrency** lane and Runtime & interpreter notes in `../python.md`). diff --git a/.claude/skills/performance-audit/profile-packs/python/data-stack.md b/.claude/skills/performance-audit/profile-packs/python/data-stack.md new file mode 100644 index 00000000..f482344e --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/python/data-stack.md @@ -0,0 +1,24 @@ +# Python performance module: Data stack (NumPy / pandas / Polars / PyArrow) +> Load when `numpy`, `pandas`, `polars`, or `pyarrow` is detected — see the module map in `../python.md`. Core lanes + Runtime & interpreter notes live in `../python.md`; this file is the Data stack lens only. + +## Data stack (NumPy / pandas / Polars / PyArrow) + +> Scope: NumPy array operations, pandas DataFrames, Polars lazy/eager API, and PyArrow columnar tables — the stack that dominates scientific, analytics, and ML data pipelines. The recurring theme is: **vectorize over iterate** (Python-level loops over rows or elements are the single largest avoidable cost), **dtypes drive cost** (a single `object`-dtype column can dominate an entire pipeline), **avoid copies and temporaries** (chained indexing, compound expressions, and format round-trips each silently allocate), and **stay columnar** (zero-copy interchange across Arrow-backed libraries beats repeated serialization at every boundary). Core `.iterrows`/`.apply` and DataFrame-to-parquet basics live in `../python.md`; this file goes deeper on each surface. + +- **Python loops building DataFrames row-by-row**: a `pd.concat`/`df.append` call inside a loop reallocates the entire frame on every iteration — O(n²) total data movement; growing a list of dicts or records then calling `pd.DataFrame(list_of_dicts)` once is O(n). Similarly, `.apply(axis=1)` with a Python callable dispatches one Python call per row; replace with `np.where`/`np.select` for conditional logic, vectorized arithmetic across columns, or `.map`/`.replace` with a dict for label translation — all operate at C speed. Cross-reference the **Algorithmic complexity** lane in `../python.md` for the broader iteration footgun. + +- **Chained indexing and copy-vs-view ambiguity**: `df[mask]['col'] = x` performs two separate `__getitem__` calls — the first may return a copy or a view depending on internal state, making the write a silent no-op and triggering `SettingWithCopyWarning`; it also double-allocates. Use `.loc[mask, 'col'] = x` as a single-pass write. Under pandas 2.0+ Copy-on-Write (CoW), chained writes reliably raise rather than silently doing nothing — and many defensive `.copy()` calls become unnecessary because CoW defers copying until a mutation occurs; audit `.copy()` callsites after enabling CoW (verify against the currency brief for your version). + +- **`object`-dtype columns defeating vectorization**: any column holding Python strings, mixed types, or `Decimal` objects is stored as an array of Python pointers — every operation on it calls back into Python for each element, defeating NumPy's C loops and bloating memory. Remedy: `category` dtype for low-cardinality strings (< ~10 % unique ratio), PyArrow-backed string dtype (`dtype_backend="pyarrow"` on I/O, or `pd.ArrowDtype(pa.large_string())`) for string-heavy columns, and numeric downcasting (`int64`→`int32/int16`, `float64`→`float32`) when the value range allows. A single overlooked `object` column can dominate the cost of an otherwise vectorized pipeline — check `df.dtypes` and `df.memory_usage(deep=True)` together (verify against the currency brief for your version). + +- **NumPy temporaries and in-place operations**: a compound expression like `a * b + c * d` allocates two full-size intermediate arrays (`a*b`, then `c*d`) before the addition; for large arrays this doubles or triples peak memory and adds GC pressure. Use in-place ops (`a *= b; a += c * d`), output arguments (`np.multiply(a, b, out=tmp)`), `np.einsum` for contraction chains, or `numexpr.evaluate(...)` for multi-operator expressions over large arrays. Separately, broadcasting that expands a small array to match a large one materializes the full broadcast shape — check whether the operation truly needs the expanded array or can stay a scalar/1-D operation. Cross-reference the **Memory & allocation** lane in `../python.md`. + +- **Memory layout and cache behavior (C vs Fortran order)**: NumPy defaults to C-contiguous (row-major) storage; `.T` returns a Fortran-contiguous view without copying, but subsequent C-order operations on that view stride non-contiguously through memory, degrading cache hit rate and disabling BLAS fast paths that require contiguous input. Call `np.ascontiguousarray(arr)` before passing a transposed or sliced array into `np.linalg.*` / `np.dot` / `np.matmul`. Wrong axis in `np.concatenate`, `np.stack`, or `np.sum` can force non-contiguous access across a large dimension — profile with `arr.flags` and `arr.strides` before assuming a BLAS call is fully optimized. + +- **Reading data: schema inference, column over-read, and memory mapping**: `pd.read_csv(...)` without `dtype=` infers column types by scanning, allocates `object` for any ambiguous column, and reads all columns into memory — pass `dtype=`, `usecols=`, and `parse_dates=` explicitly. For files re-read repeatedly, Parquet or Feather are categorically better (core `../python.md` names them); beyond that, use Parquet column pruning and row-group predicate pushdown (`filters=` in `pd.read_parquet` / PyArrow's `read_table`) to avoid loading data that the query discards. For large read-once NumPy arrays, `np.memmap(..., mode='r')` and `zarr` (for chunked/compressed) avoid loading the full array into RAM (verify against the currency brief for your version). + +- **Polars lazy API as an alternative execution model**: when a pandas pipeline is the bottleneck and the data exceeds a few hundred MB, Polars' lazy API (`pl.scan_parquet`/`pl.scan_csv` + `.collect()`) applies query optimization, automatic predicate/projection pushdown, and multi-threaded execution that pandas eager mode does not; this can be a step-change rather than a constant-factor improvement. DuckDB-over-Arrow (`duckdb.execute("SELECT … FROM parquet_scan(…)")`) offers similar pushdown with SQL syntax and near-zero copy overhead when the result stays Arrow-backed. These are architectural alternatives — flag when the profiled bottleneck is in the pandas pipeline itself, not in a single operation (verify against the currency brief for your version). + +- **Arrow zero-copy interchange and format round-trips**: PyArrow tables, pandas DataFrames with `dtype_backend="pyarrow"`, and Polars DataFrames all share the same Arrow memory layout — converting between them is zero-copy or near-zero-copy. Converting from any of these to plain NumPy or Python objects (`.to_numpy()`, `.tolist()`, `.to_dict()`) copies and unboxes the data. Repeatedly converting between pandas and NumPy representations across pipeline stages, or serializing to/from Python dicts to pass between steps, copies the data each time — keep data in one columnar representation through the pipeline and convert only at the output boundary. Cross-reference the **Data access & I/O** lane in `../python.md`. + +- **BLAS linkage and thread oversubscription**: NumPy linear algebra speed depends entirely on the BLAS library linked at build time (`numpy.show_config()` or `np.__config__.blas_opt_info` reveals it) — a fallback reference BLAS is orders of magnitude slower than OpenBLAS/MKL/Accelerate. Oversubscription is a separate risk: BLAS spawns its own thread pool, and running a `ProcessPoolExecutor` or `ThreadPoolExecutor` alongside it multiplies total threads; on CPU-bound NumPy workloads set `OMP_NUM_THREADS` / `OPENBLAS_NUM_THREADS` / `MKL_NUM_THREADS` explicitly to `1` for worker processes and let the coordinator hold the full pool, or the reverse. Note that NumPy C-level ops release the GIL, so `ThreadPoolExecutor` CAN yield real parallelism for array work (unlike pure Python) — cross-reference Runtime & interpreter notes in `../python.md` on GIL semantics (verify against the currency brief for your version). diff --git a/.claude/skills/performance-audit/profile-packs/python/orm-database.md b/.claude/skills/performance-audit/profile-packs/python/orm-database.md new file mode 100644 index 00000000..9f78ce81 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/python/orm-database.md @@ -0,0 +1,33 @@ +# Python performance module: ORM & database (Django ORM / SQLAlchemy / psycopg / asyncpg) +> Load when `django` ORM, `sqlalchemy`, `psycopg`/`psycopg2`, or `asyncpg` is detected — see the module map in `../python.md`. Core lanes + Runtime & interpreter notes live in `../python.md`; this file is the ORM & database lens only. + +## ORM & database (Django ORM / SQLAlchemy / psycopg / asyncpg) + +> Scope: patterns that touch `django.db`, `sqlalchemy` (Core or ORM, 1.4/2.0), `psycopg2`/`psycopg3`, `asyncpg`, and PgBouncer in front of any of these. The recurring themes are: **pool reuse** (a mis-sized or zero-lifetime pool pays TCP + auth overhead on every request), **SQL compilation caching** (dynamic query builders that produce unbounded distinct statement shapes silently disable it), **transaction/session scope** (long-open transactions hold a connection and acquire locks far beyond their useful window), **streaming large results** (buffering `.all()` materialises the full result set in RAM), and **reading the generated SQL** (the ORM abstracts the query but the database executes it — consult `.query`, `echo=True`, or `EXPLAIN` before concluding). Cross-reference the core **Data access & I/O** lane for the generic N+1 / eager-loading / bulk-write basics, the **Memory & allocation** lane for result-set materialisation pressure, and the `async-asyncio` and `web-frameworks` sibling modules for event-loop blocking. + +- **Connection pool sizing left at defaults under load**: SQLAlchemy `QueuePool` defaults (`pool_size=5`, `max_overflow=10`, `pool_timeout=30`) are conservative for typical WSGI/ASGI workloads — a burst beyond 15 concurrent DB-needing threads blocks or times out. Django `CONN_MAX_AGE=0` (the default) tears down and re-establishes the connection on every request, paying TCP + TLS + auth overhead at request rate; setting `CONN_MAX_AGE` to a positive value (or `None` for persistent) with `CONN_HEALTH_CHECKS=True` amortises that cost but shifts risk to stale-connection errors if not paired with a health check. Tune all four SQLAlchemy pool parameters explicitly (`pool_size`, `max_overflow`, `pool_timeout`, `pool_recycle`) for any production workload; `pool_recycle` is especially important behind a NAT, load balancer, or PgBouncer that silently drops idle sockets (verify against the currency brief for your version). + +- **SQL compilation cache pollution from dynamic query builders**: SQLAlchemy caches compiled SQL keyed by statement *structure* (clause shape, bound-parameter positions), not by values; each structurally distinct `select()` / `insert()` / `update()` is a separate cache entry. A builder that conditionally appends `.where(…)` clauses, generates `IN (?, ?, …)` with inline values rather than a bound array, or varies column lists dynamically can produce an unbounded number of distinct shapes, filling and evicting the cache (default `query_cache_size=500`) and paying Python-side compilation on every execution. Look for `[cached since N s]` absent in `echo=True` logs; for `text()` with string-interpolated values (defeats both caching and parameterisation — use `.bindparams(…)` or `bindparam` instead); and for `IN` with a Python list expanded inline rather than `any_()` / a bound array. Confirm cache effectiveness before concluding compilation cost is negligible (verify against the currency brief for your version). + +- **Long-open transactions, plus post-commit and autoflush surprises**: a `Session` or + `atomic()`/`ATOMIC_REQUESTS` block holds one pooled connection (and any row/table locks) for its + whole duration — HTTP calls, broker publishes, retry loops, or heavy compute placed between begin + and commit extend lock contention and block other writers; Django `ATOMIC_REQUESTS=True` wraps the + *entire* request (template rendering and middleware included) in one transaction. Two SQLAlchemy + defaults inject hidden queries: `expire_on_commit=True` re-SELECTs every attribute on first access + after `commit()` (load before commit, or set `expire_on_commit=False`), and `autoflush=True` fires + an implicit `flush()` on every query while inserts/deletes are pending — an add-then-query loop + becomes an O(n) flush-then-query series (spot the interleaved `INSERT`/`SELECT` in `echo=True` logs) + (verify against the currency brief for your version). + +- **Streaming large result sets not used — full materialisation**: SQLAlchemy `.all()` (or `scalars().all()`) fetches and holds every row in memory before the first item is accessible; on result sets of tens of thousands of rows this spikes RSS proportional to row width × count (cross-reference the core **Memory & allocation** lane). Replace with `yield_per(n)` on the `Result` / `ScalarResult` object, or `conn.execution_options(stream_results=True)` for Core queries, to activate server-side cursors where the driver supports them (`asyncpg`, `psycopg3`, psycopg2 server-side cursors). Django `.iterator(chunk_size=N)` is the analogue; without it the entire queryset materialises into `QuerySet._result_cache`. `yield_per` / `iterator` interact with eager loading (`selectinload`, `prefetch_related`) — the prefetch may be skipped or must be restructured; verify the trade-off (verify against the currency brief for your version). + +- **Lazy attribute and relationship loading footguns beyond simple N+1**: SQLAlchemy `lazy="select"` (the 1.x default) fires a SELECT *at attribute access time*; if the session is closed before the attribute is accessed the ORM raises `DetachedInstanceError` — a correctness failure masking the missing eager load. Accessing an attribute after `session.commit()` without `expire_on_commit=False` fires a re-SELECT even for attributes loaded in the original query. `subqueryload` embeds a correlated subquery and can produce a large intermediate result when the parent set is large; `selectinload` issues a separate `SELECT … WHERE id IN (…)` and scales better for large collections. For huge append-only collections `lazy="write_only"` (SQLAlchemy 2.0) prevents accidental full-collection loads. Django `.only()`/`.defer()` restricts columns at fetch time but accessing a deferred field on a fetched instance fires an additional per-instance SELECT — audit call sites for post-access to deferred fields inside loops (verify against the currency brief for your version). + +- **Bulk write batching depth — parameter limits and round-trip overhead**: unbatched bulk operations — `session.add_all()` with one `INSERT` per object, or Django `bulk_create([…])` without a `batch_size` — hit database parameter-count limits (PostgreSQL: ~65 535) or produce one statement large enough to strain the parser. SQLAlchemy 2.0 `insertmanyvalues` transparently rewrites `session.execute(insert(Model), [dicts])` into batched `INSERT … VALUES (…),(…) RETURNING …` controlled by `insertmanyvalues_page_size` (default 1000); the legacy `bulk_insert_mappings` / `bulk_update_mappings` bypasses ORM unit-of-work overhead but is superseded by the 2.0 `execute(insert(), [dicts])` path. For upserts, `INSERT … ON CONFLICT DO UPDATE` — `QuerySet.bulk_create(update_conflicts=True)` (Django 4.1+) or `insert().on_conflict_do_update()` (SQLAlchemy) — eliminates the read-then-write round-trip. Prefer `psycopg3` `copy()` or `asyncpg.copy_records_to_table()` for very high row counts where even batched INSERT is too slow (verify against the currency brief for your version). + +- **PgBouncer transaction-pooling mode breaking server-side state**: PgBouncer in transaction-pooling mode returns the server connection to the pool after each transaction, so any server-side state set within a session — prepared statements, `SET LOCAL` parameters, advisory locks, `pg_temp` tables, `LISTEN` channels — is silently invalidated or visible to the next user of that connection. SQLAlchemy `pool_pre_ping` issues `SELECT 1` but does not re-issue `SET` commands or re-prepare statements; psycopg2 prepared-statement caching and psycopg3 `prepare_threshold` auto-prepare will prepare statements the pooler never sees, causing `prepared statement "…" does not exist` errors or silent fallback to unprepared execution. Disable driver-level auto-prepare when behind transaction-pooling PgBouncer, or switch to session-pooling mode for workloads that require server-side state; read the generated connection strings and pooler mode before diagnosing mysterious statement errors (verify against the currency brief for your version). + +- **Async ORM correctness-as-performance — blocking the event loop via sync drivers**: using `psycopg2` (sync) under an async framework (FastAPI, Starlette, Django ASGI) blocks the entire event loop thread for the duration of every DB call, serialising all concurrency on that worker — the symptom is good single-request latency but poor concurrency throughput. Replace with `asyncpg` or `psycopg3` (async mode) under SQLAlchemy `AsyncSession` / `AsyncEngine`, or with Django's native async ORM (`aget`, `afilter`, `abulk_create`) under ASGI (Django 4.1+). Do not use `sync_to_async()` as a permanent fix when the async-native path exists; it still dispatches to a thread pool and loses the latency benefit of the async driver. Mixing a `Session` and `AsyncSession` over the same connection, or calling sync `session.execute()` from within a coroutine, raises runtime errors or silently blocks; keep session types consistent within an async context (cross-reference the core **Concurrency** lane and the `async-asyncio` sibling module) (verify against the currency brief for your version). + +- **Query shape and index coverage hidden by the ORM**: the ORM emits SQL the developer may never read. Filtering or sorting on unindexed columns, `OFFSET N` deep-pagination (scans and discards N rows — replace with keyset / seek pagination anchored on the last seen value), `COUNT(*)` on large tables for every paginated response, `DISTINCT` or multi-column `ORDER BY` forcing a sort node, and implicit `JOIN` on polymorphic models can each dominate query latency while appearing as a single ORM method call. The diagnostic path is: read the generated SQL (`str(qs.query)` on a Django QuerySet; `str(stmt.compile(…))` or `echo=True` on SQLAlchemy; `connection.queries` in Django `DEBUG` mode); run `EXPLAIN (ANALYZE, BUFFERS)` on the emitted query; confirm index scans vs sequential scans. This is a judgment trigger, not a checklist — push the agent to read the actual SQL before inferring cost (cross-reference the core **Data access & I/O** lane). diff --git a/.claude/skills/performance-audit/profile-packs/python/serialization.md b/.claude/skills/performance-audit/profile-packs/python/serialization.md new file mode 100644 index 00000000..907de997 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/python/serialization.md @@ -0,0 +1,80 @@ +# Python performance module: Serialization & validation (json / orjson / pydantic / msgpack / pickle) +> Load when stdlib `json`, `orjson`/`ujson`/`msgspec`, `pydantic`, `marshmallow`, `pickle`, or `msgpack` is detected — see the module map in `../python.md`. Core lanes + Runtime & interpreter notes live in `../python.md`; this file is the Serialization & validation lens only. + +## Serialization & validation (json / orjson / pydantic / msgpack / pickle) + +> Scope: stdlib `json` (pure-Python decoder, CPython C-accelerated encoder), drop-in faster encoders +> (`orjson`, `ujson`, `msgspec`), `pydantic` (v1 pure-Python vs v2 Rust-core), `marshmallow`, +> `dataclasses`/`attrs`, `pickle`, and `msgpack`. The recurring themes are: validation and encoding +> cost multiplied over every API request; pydantic v2's compiled Rust core (`pydantic-core`) as a +> step-change in throughput; avoiding redundant validation passes on already-trusted data; and choosing +> a wire format matched to the actual interop boundary rather than defaulting to JSON everywhere. + +- **Pydantic v1 vs v2 on a hot validation path**: pydantic v2 moved all validation and serialization + into a compiled Rust core (`pydantic-core`), making it roughly an order of magnitude faster than + pure-Python v1 for the same model. A codebase still on v1 — or using v1-era patterns such as + `.dict()` instead of v2's `.model_dump()`, or mixing `orm_mode = True` config instead of + `model_config = ConfigDict(from_attributes=True)` — is leaving very large gains on the table on any + request-scoped validation path. v2 is a deliberate migration with some behavior changes, so frame + findings as an upgrade to evaluate, not a drop-in swap (verify against the currency brief for your + version). + +- **Redundant or repeated validation of the same data**: validating the same payload more than once + — e.g., a pydantic model in the framework layer (FastAPI request body) plus a second + `MyModel(**data)` call in business logic, or re-parsing JSON that was already deserialized — pays + the validation cost twice. For data that is already trusted (read back from your own DB, produced + internally), use `Model.model_construct(**data)` to skip validation entirely, or `TypeAdapter` to + validate a bare list or dict once rather than per-element in a loop (verify against the currency + brief for your version; cross-reference the **Web frameworks** module in `web-frameworks.md` for + FastAPI `response_model` re-validation). + +- **stdlib `json` on large or frequent payloads**: `json.loads`/`json.dumps` is backed by a C + extension for encoding but remains relatively slow on large payloads compared to Rust-backed + alternatives; `json.loads` is a pure-C parser but `orjson` and `msgspec` still outpace it + materially at scale. `orjson` (Rust) serializes `dataclasses`, `datetime`, `UUID`, and `numpy` + arrays natively without a `default=` callback; `msgspec` offers similar speed with built-in schema + validation. Key API differences: `orjson.dumps` returns `bytes` (not `str`), is stricter about + non-serializable types, and does not support all stdlib `json` kwargs. Switch the hot path + carefully — do not assume the API is a drop-in (verify against the currency brief for your version). + +- **pickle on a hot path or across a trust boundary**: `pickle` is slow for large object graphs + (it reflects on every attribute via `__reduce__`/`__getstate__`), is Python-version-coupled (a + pickle from one CPython version may break on another), and is **a remote-code-execution vector + on untrusted input** — any cache, message queue, or RPC channel that deserializes pickle from an + external or user-controlled source is a critical security issue. Prefer a schema'd binary format + (`msgpack`, `msgspec`, protobuf) for inter-service or cache payloads, or `orjson`/`json` for + human-readable wire formats. Annotate hotspots where the pickle protocol version is left at + default — higher protocol numbers are faster (verify against the currency brief for your version). + +- **Schema or model object construction at request time**: building a pydantic `TypeAdapter`, a + `marshmallow` schema instance, or a dynamic pydantic model class inside a request handler or in a + tight loop pays the reflection/compilation cost on every invocation. `marshmallow` schemas carry + significant construction overhead (field introspection, validator wiring); pydantic `TypeAdapter` + compiles a Rust validation core the first time it is constructed. Both should be instantiated once + at module scope or in a startup lifespan hook and reused. Dynamic model creation via + `pydantic.create_model(...)` in a request path is a strong signal of this anti-pattern (verify + against the currency brief for your version). + +- **`marshmallow` on large collections**: `marshmallow` is pure-Python and reflection-heavy; on + result sets of hundreds of objects, `Schema.dump(many=True)` iterates the list at Python speed, + calling each field's serialization method via attribute lookup per row. For these hot list + endpoints consider pydantic v2 (Rust-serialized), `msgspec.Struct`, or `orjson` with typed + objects instead. `SerializerMethodField`-equivalent (`marshmallow.fields.Method`) callables that + trigger additional lookups per row compound this cost (cross-reference the **Web frameworks** + module in `web-frameworks.md` for DRF `ModelSerializer` on list endpoints). + +- **Custom `datetime`, `Decimal`, and `UUID` encoding in stdlib `json`**: `json.dumps(obj, + default=my_handler)` calls `my_handler` for every non-serializable value, once per instance in + the payload — on a response containing hundreds of `datetime` or `Decimal` values this is a + per-value Python function call overhead. `datetime.isoformat()` and `str(Decimal(...))` are + also non-trivial when called at scale. `orjson` and `msgspec` have native fast paths for + `datetime`, `UUID`, and (for orjson) `numpy` scalars/arrays, eliminating the `default=` dispatch + entirely (verify against the currency brief for your version). + +- **Wire format mismatched to the interop boundary**: JSON is the right default for human-readable, + cross-language APIs, but service-to-service payloads and cache values where size and throughput + matter should use a binary format. `msgpack` is compact, schema-less, and crosses language + boundaries without a compiler step; `msgspec` combines fast binary encoding with Python schema + validation; protobuf/gRPC adds a schema contract with generated code. Over-large JSON payloads + that transmit fields the consumer never reads should be paginated or projected before serialization + rather than serialized whole (cross-reference the **Data access & I/O** lane in `../python.md`). diff --git a/.claude/skills/performance-audit/profile-packs/python/task-queues.md b/.claude/skills/performance-audit/profile-packs/python/task-queues.md new file mode 100644 index 00000000..e2556b2d --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/python/task-queues.md @@ -0,0 +1,24 @@ +# Python performance module: Task & job queues (Celery / RQ / Dramatiq / arq) +> Load when `celery`, `rq`, `dramatiq`, or `arq` is detected — see the module map in `../python.md`. Core lanes + Runtime & interpreter notes live in `../python.md`; this file is the Task & job queues lens only. + +## Task & job queues (Celery / RQ / Dramatiq / arq) + +> Scope: patterns in codebases using **Celery** (the dominant one, backed by Redis or RabbitMQ), **RQ**, **Dramatiq**, and **arq** (async, Redis-native). Includes broker interaction, result backends, and worker process setup. The recurring themes are: **right-size the unit of work** (task granularity sets the floor on overhead), **pass references not payloads** (broker is not a data bus), **prefetch and concurrency matched to workload** (defaults are wrong for uneven task durations), **reuse the broker connection** (a new producer per `.delay()` call pays TCP + auth on every enqueue), and **worker-side DB and object reuse** (the same footguns from the core data-access and concurrency lanes apply at high fan-out in workers). Cross-reference the core **Data access & I/O** lane, the core **Concurrency & parallelization** lane, and the `orm-database` sibling module throughout. + +- **Task granularity mismatch — too fine or too coarse**: enqueueing thousands of sub-millisecond jobs each pay broker round-trip + serialization + worker dispatch overhead that can exceed the work itself — batch or chunk small jobs (`celery chunks`, `celery group` over a chunked iterable, or pass a list and loop inside one task); look for `.delay()` / `.apply_async()` called in a tight Python loop over an iterable. The opposite failure is a single monolithic task that cannot be retried at the failed step, parallelized across workers, or cancelled mid-run — right-size the unit of work so retries and fan-out are both meaningful and safe (verify against the currency brief for your version). + +- **Large objects serialized as task arguments or results**: passing a full `DataFrame`, ORM queryset, file contents, or any object whose serialized size is measured in kilobytes to `.delay()` / `.apply_async()` pushes that payload through the broker on every enqueue and out again on dispatch — the broker is not a data bus. Pass a primary key, S3 key, cache key, or other cheap reference and re-fetch inside the task. Result backend payload size is the symmetric problem: storing a large return value from every task (especially in a `chord` join) amplifies broker and backend I/O proportionally to fan-out. Choose the serializer deliberately — Celery defaults to JSON; **pickle is faster for complex Python objects but is a remote-code-execution risk across any trust boundary** (untrusted producers or workers); `msgpack` is a practical middle ground for controlled environments (cross-reference the core **Memory & allocation** lane and the `serialization` sibling module if present) (verify against the currency brief for your version). + +- **`worker_prefetch_multiplier` default hoarding messages with uneven task durations**: Celery's default `worker_prefetch_multiplier=4` causes each worker child to reserve up to 4 × concurrency messages from the broker before processing them, starving other workers when tasks are long or duration-variable — a slow task holds reserved messages hostage while faster workers sit idle. Set `worker_prefetch_multiplier=1` for any workload where task duration is uneven or long; set it to 0 (unlimited) only for homogeneous, sub-second tasks where throughput matters more than fair distribution. This is one of the highest-leverage Celery tuning knobs and is nearly always wrong at its default for production workloads (verify against the currency brief for your version). + +- **Worker concurrency model mismatched to workload**: Celery `--concurrency` with the default `prefork` model forks the full application once per worker slot — each slot carries a copy of imported modules, ORM connection pools, and loaded config, so wide prefork pools waste memory proportional to application size. For I/O-bound tasks (HTTP calls, DB queries, light processing), `--pool=gevent` or `--pool=eventlet` multiplexes many concurrent tasks on far fewer OS threads with much lower per-slot memory; for CPU-bound tasks (image processing, ML inference, heavy computation), prefork with explicit `--concurrency` equal to physical core count is correct and gevent will not help. RQ uses threads by default; arq is natively async and should run tasks that are async-native or offloaded via `asyncio.to_thread` — a sync-blocking call inside an arq coroutine parks the entire event loop (cross-reference the core **Concurrency & parallelization** lane) (verify against the currency brief for your version). + +- **Result backend overhead when results are never consumed**: every Celery task stores its return value in the result backend by default — a broker DB write (and read at expiry) per task even when no caller ever calls `.get()`. Set `task_ignore_result = True` globally and opt in per task with `@task(ignore_result=False)` only where the result is actually read; alternatively set `ignore_result=True` on the `.apply_async()` call site. `chord` and `group` result joins poll the backend in a loop until all sub-tasks complete — the polling interval (`result_chord_join_timeout`, `result_backend_max_sleep_between_retries_ms`) and backend latency directly add to chord completion time; using the broker itself (e.g., Redis) as both broker and result backend avoids a second round-trip destination but couples backend availability to broker availability. Set `result_expires` to bound backend storage growth (verify against the currency brief for your version). + +- **Acknowledgement timing and visibility timeout mismatches**: Celery `task_acks_late=False` (default) acknowledges the message on receipt rather than on task completion — a worker crash mid-execution loses the task silently. `task_acks_late=True` moves the ack to after task completion, enabling at-least-once delivery, but requires tasks to be **idempotent** (the enabler of cheap at-least-once). Separately, a Redis broker visibility timeout (`visibility_timeout` in `broker_transport_options`) shorter than the longest task duration causes the broker to redeliver the still-running task to another worker, producing duplicate execution without a crash — set it to at least 1.5× the 99th-percentile task duration. A too-short timeout combined with `acks_late` is a common source of mysterious duplicate processing in production (verify against the currency brief for your version). + +- **Broker connection opened per `.delay()` call or per worker request**: creating a new Celery app instance, a new Redis client, or a new AMQP connection inside a task body or per-request helper instead of reusing the app-level connection pool pays TCP + auth overhead on every call. Celery manages a `broker_pool_limit` (default: 10) connection pool for publishing — a pool limit of 0 disables pooling and opens a connection per publish. Look for `Celery(...)` constructed inside a task function, a bare `redis.Redis(...)` created per `.delay()` wrapper, or `apply_async` calls placed on a synchronous request-serving path where the broker round-trip adds to user-visible latency (cross-reference the core **Data access & I/O** lane) (verify against the currency brief for your version). + +- **Worker-side DB connection churn and per-task object re-initialization**: each prefork worker slot is a separate process with its own connection pool — connections are not shared across slots, and a slot that exits and restarts (due to `--max-tasks-per-child`) tears down and re-establishes its pool. Re-creating ORM sessions, loading configuration files, deserializing ML models, or compiling regex patterns inside the task body (rather than once at worker boot via Celery's `worker_process_init` signal or a module-level singleton) repeats that cost on every task invocation. The same N+1, over-fetch, and missing `select_related` / `selectinload` footguns from the core data-access lane apply inside workers — often worse because workers run at high fan-out, amplifying per-task DB overhead by concurrency (cross-reference the `orm-database` sibling module and the core **Data access & I/O** lane) (verify against the currency brief for your version). + +- **Scheduling thundering herds and unthrottled fan-out**: Celery Beat tasks with identical crontab schedules enqueue a burst of tasks at the same instant — if the schedule fires many tasks concurrently (e.g., every task on the hour) and workers are sized for steady-state load, the burst overwhelms workers and the broker queue depth spikes. Add random jitter to schedules or stagger crontabs. `group` / `chord` fan-out of very high cardinality (thousands of sub-tasks) can similarly overwhelm the broker ingest rate and the result backend join; add a `rate_limit` on the task (`@task(rate_limit="100/m")`) when tasks target an external API or a shared resource with a service-level rate cap. RQ and Dramatiq equivalents (`dramatiq.rate_limits`, RQ's `job_timeout` and queue priority) should be verified per library (verify against the currency brief for your version). diff --git a/.claude/skills/performance-audit/profile-packs/python/web-frameworks.md b/.claude/skills/performance-audit/profile-packs/python/web-frameworks.md new file mode 100644 index 00000000..7c26395d --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/python/web-frameworks.md @@ -0,0 +1,93 @@ +# Python performance module: Web frameworks (Django / Flask / FastAPI / gunicorn / uvicorn) +> Load when Django (`django`), Flask (`flask`), FastAPI/Starlette (`fastapi`/`starlette`), or a +> WSGI/ASGI server (`gunicorn`/`uvicorn`) is detected — see the module map in `../python.md`. +> Core lanes + Runtime & interpreter notes live in `../python.md`; this file is the Web frameworks +> lens only. + +## Web frameworks (Django / Flask / FastAPI / gunicorn / uvicorn) + +> Scope: the request path through Django (including DRF), Flask, and FastAPI/Starlette, and the +> WSGI/ASGI servers that host them — gunicorn (sync and UvicornWorker), uvicorn standalone. The +> recurring themes are worker/event-loop model mismatch (sync work in async contexts, async work +> without the right worker class), per-request construction of objects that should be built once at +> startup, and serializer/validation cost that compounds on list endpoints. The core pack covers +> ORM N+1 strategy, asyncio primitives, and import-time startup cost; this module covers the +> framework mechanics that sit between the request arriving at the server and the response leaving. + +- **WSGI/ASGI worker model & sizing mismatch**: gunicorn `sync` workers (the default) each serve + one request at a time, so a single blocking call (DB, outbound HTTP, filesystem) stalls that + worker — throughput scales only by adding workers (heuristic ≈2·CPU+1), not by writing `async` + code. ASGI apps (FastAPI, Starlette, Django async views) need `uvicorn.workers.UvicornWorker` or + uvicorn directly; a sync gunicorn worker in front of an ASGI app falls back to a compatibility + shim and loses all async concurrency. Async workers need fewer processes (each runs an event + loop), but CPU-bound work blocks the whole loop for its duration (verify against the currency + brief for your version). + +- **Blocking call inside an `async def` handler (event-loop parking)**: a `def` (sync) endpoint + in FastAPI/Starlette runs in a threadpool — bounded by the threadpool size — so a slow sync + endpoint can exhaust the pool and queue requests, but it does not park the event loop. An + `async def` endpoint that calls any synchronous blocking operation (sync DB driver, `requests` + library, blocking file I/O, `time.sleep`) parks the event loop for every concurrent request + on that worker. Django `async def` views calling the sync ORM without wrapping in + `sync_to_async` are the canonical Django instance of this. Offload via + `asyncio.to_thread` / `sync_to_async`, or replace with an async-native driver + (cross-reference the **Concurrency** lane in `../python.md` and the `async-asyncio` module). + +- **Per-request construction of expensive objects**: building a `requests.Session`, + `httpx.Client`, DB engine, or other connection-bearing object inside a view/handler instead + of once at startup or application lifespan means no connection pool is shared across + requests, TCP and TLS handshake costs are paid per request, and teardown races can leak file + descriptors. FastAPI `Depends()` dependencies that instantiate such clients without caching + re-run on every request unless declared as a singleton or bound to a lifespan resource. + Similarly, compiling a regex, loading a config file, or deserializing a static resource + inside the view pays that cost on every call (cross-reference the `payload-startup` lane in + `../python.md`). + +- **Middleware runs on every request — health checks, 404s, and OPTIONS included**: an auth + middleware issuing a DB query per request, a session store deserializing unconditionally, or + per-request log serialization on a hot route adds latency that no cache amortizes and that + per-endpoint profiling hides. Scope heavy middleware to the sub-router / route-prefix that needs + it, or short-circuit before the expensive step (e.g. skip session loading on stateless + endpoints); Django's `MIDDLEWARE` list is ordered and additive — each entry is a Python + call-chain plus any I/O it performs (cross-reference the `orm-database` module for per-request DB + cost). + +- **DRF `ModelSerializer` cost and N+1 hidden in serialization**: Django REST Framework's + `ModelSerializer` uses reflection to build field maps at class-definition time and iterates + result rows through Python-speed attribute access, making it noticeably slow on lists of + hundreds of rows or more. `SerializerMethodField` implementations that issue a DB query per + row are N+1 hidden inside serialization, invisible to queryset-level eager loading. Nested + serializers multiply this cost. On hot list endpoints, consider `.values()` / + `.values_list()` with manual dict-assembly, a non-reflective serializer (e.g., + `orjson`-backed), or `select_related` / `prefetch_related` wired to exactly match the fields + the serializer accesses (cross-reference the `orm-database` and `serialization` modules). + +- **FastAPI `response_model` re-validation on every response**: declaring `response_model=` on + a FastAPI endpoint causes every response to be validated and serialized through pydantic — + field filtering, type coercion, alias mapping — before bytes are sent. On large list payloads + or high-frequency endpoints this is measurable, especially with pydantic v1 (pure-Python) + where serialization is not Rust-accelerated. If the returned object is already a validated + pydantic model or a plain dict with a known shape, returning a pre-serialized `ORJSONResponse` + (via `fastapi.responses`) or setting `response_model=None` and handling serialization + explicitly skips the redundant pass (verify against the currency brief for your version; + pydantic v2 performance profile differs — cross-reference the `serialization` module). + +- **Template rendering over lazy querysets and large context dicts**: Django/Jinja2 template + rendering is synchronous and Python-speed; a `{% for %}` loop over a queryset that was not + evaluated before the template renders triggers the lazy SQL at render time, making the cost + hard to attribute to the database in profiling. Passing large unevaluated QuerySets into + context (especially with chained `.filter()` calls that have not yet hit the DB) or rendering + deeply nested template inheritance chains on high-volume pages multiplies per-request Python + work. For API endpoints returning JSON, replacing the default Django `JSONResponse` or DRF + renderer with an `ORJSONResponse` / `UJSONResponse` renderer can materially reduce encoding + time on large payloads (verify against the currency brief for your version). + +- **Serving static files or large responses through the application process**: routing static + files through Django's `staticfiles` in production, or streaming large binary responses + (file exports, reports, media) through gunicorn/uvicorn without `StreamingHttpResponse` + (Django) or `StreamingResponse` (FastAPI/Starlette), ties up a worker for the full duration + of the transfer. A worker held open to stream 50 MB to a slow client is unavailable for any + other request for that entire time. Static assets should be served by the reverse proxy + (nginx) or a CDN with appropriate cache headers; large dynamic responses should use streaming + responses with chunked transfer encoding so the worker is freed as soon as the last chunk is + handed to the OS socket buffer (cross-reference the `payload-startup` lane in `../python.md`). diff --git a/.claude/skills/performance-audit/profile-packs/rust.md b/.claude/skills/performance-audit/profile-packs/rust.md new file mode 100644 index 00000000..0766d206 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/rust.md @@ -0,0 +1,207 @@ +# Profile Pack: Rust + +Specializes the generic performance lanes for Rust codebases. Load alongside `generic-pack.md`; the +signals below narrow each lane to Rust-specific idioms and common footguns. + +This is the **core** Rust pack (always-loaded lanes + Runtime & build notes). Deep, tech-specific +lenses (async/tokio, web frameworks, serde, databases, data parallelism) live in load-on-detection +modules under `profile-packs/rust/` — see **`## Framework / sub-stack modules`** at the bottom. The +core lanes are always-loaded quick-hits; a module *deepens* its area when its signals are material to +the scope. + +--- + +## Algorithmic complexity & data structures (lane `algorithmic`) +- `Vec::contains` or `.iter().any()` inside a loop is O(n²); replace with `HashSet`/`BTreeSet` + (verify against the currency brief for your version). +- `HashMap`/`HashSet` with default SipHash-1-3 on hot integer-keyed maps; faster non-cryptographic + hashers (`rustc-hash`'s `FxHashMap`, `ahash`'s `AHashMap`) can give large wins for non-DoS paths + — benchmark before switching; `ahash` can outperform `fxhash` on AES-capable CPUs while + `fxhash` often wins on general integer keys (verify against the currency brief for your version). +- `Vec::remove` inside a loop is O(n) per call (shifts elements); prefer `Vec::swap_remove` when + order doesn't matter, or `Vec::retain` / `HashMap::retain` for batch removal. +- Sorting or de-duplicating on every iteration rather than once at construction time. +- Repeated computation inside a loop that could be hoisted: re-parsing strings, re-compiling + regexes, re-constructing maps or sets that are invariant over iterations. +- Large enum where all variants are sized by the biggest one; box the rare fat variant + (`Box<LargeVariant>`) to reduce the footprint of every enum instance; use + `RUSTFLAGS=-Zprint-type-sizes cargo +nightly build` to reveal the dominant variant's cost. +- Collecting an iterator into a `Vec` only to immediately iterate or pass it — chain lazy + adapters instead; prefer returning `impl Iterator<Item=T>` from functions over `Vec<T>`; + use `extend` to grow an existing collection from an iterator rather than collecting then + appending. +- `Option::ok_or(expensive_fn())` eagerly evaluates the error argument even on `Some`; use + `ok_or_else(|| expensive_fn())` — the same pattern applies to `unwrap_or`, `map_or`, + `Result::or`, and `Result::map_or`. + +## Memory & allocation (lane `memory`) +- Needless `.clone()`/`.to_owned()`/`.to_vec()` where a borrow (`&T`, `&str`, `&[T]`) would + suffice; likewise `.to_string()` on a hot path when a `&str` is usable. When you must clone + over an existing allocation, prefer `a.clone_from(&b)` — it reuses the existing buffer rather + than allocating fresh. +- `format!` on a hot path allocates a `String` on every call; write into a pre-allocated buffer + (`write!` into a `String`/`Vec<u8>`), use `std::format_args` to defer formatting, or replace + with a string literal where the value is static (verify against the currency brief for your version). +- `Vec`/`String`/`HashMap` grown by repeated push without `with_capacity`; pre-size when the + final length is known or estimable to avoid repeated doubling reallocations. Reciprocally, + call `Vec::into_boxed_slice()` on a fully-built, stable `Vec` to drop the spare-capacity word + and free excess memory. +- Loop-body allocations that could be "workhorse" buffers: declare the collection outside the + loop, `clear()` inside — preserves capacity and eliminates per-iteration allocation. +- `Cow<'_, str>` (or `Cow<'_, [T]>`) where a value is almost always borrowed but occasionally + needs mutation; avoids the unconditional `to_owned()`. `Cow::to_mut` will clone only on the + first mutation. +- `Rc`/`Arc` wrapping small `Copy` types: the initial allocation and indirection are unnecessary + for types cheaper to copy outright; conversely, `clone` on `Rc`/`Arc` only bumps the refcount + and does not allocate, so using it to share large read-mostly data is appropriate. +- Types wider than 128 bytes are copied with `memcpy` rather than inline code; check hot + oft-moved types with `std::mem::size_of` — shrink via field boxing, smaller integer widths + (`u32`/`u16` indices instead of `usize`), or replacing a `Vec<T>` field with `Box<[T]>` + (saves one `usize`). For vectors frequently empty inside hot structs, `ThinVec<T>` from + `thin_vec` shrinks the struct by one word (verify against the currency brief for your version). +- `smallvec::SmallVec<[T; N]>` eliminates heap allocation for short vectors that fit in `N` + elements inline; `arrayvec::ArrayVec<T, N>` is faster when the maximum size is statically + known (no heap-fallback path) — benchmark before adopting; larger `N` or large `T` makes + the inline struct heavier and copy-slower (verify against the currency brief for your version). + +## Data access & I/O (lane `data-access`) +- Unbuffered file/socket I/O: `std::fs::File`, `std::net::TcpStream` are unbuffered by default; + wrap in `BufReader`/`BufWriter` for many small reads/writes to cut syscall count. For + high-volume stdout output, combine manual locking (`let lock = stdout.lock()`) with + `BufWriter` — locking alone doesn't buffer. +- `println!`/`print!` acquire a mutex on every call; in output-heavy loops lock stdout once + (`let lock = stdout.lock()`) and use `writeln!(lock, …)`. +- Blocking I/O (`std::fs`, `std::net`, synchronous HTTP clients) called from inside an async + executor thread; move to async drivers or wrap with `spawn_blocking` + (verify against the currency brief for your version). +- Serde repeated serialization of unchanged data on a hot path; cache the serialized bytes or + the parsed form. Prefer borrowed `Deserialize<'de>` (zero-copy) forms to avoid allocation + when deserializing byte slices or string data (verify against the currency brief for your version). +- Over-fetching: deserializing full structs when only a subset of fields is read; use + `#[serde(skip)]`, partial structs, or a dedicated projection type. +- Per-item database/HTTP calls inside a loop (N+1); batch into a single query/request. +- `String` I/O incurs UTF-8 validation overhead; for ASCII or opaque-byte workloads use + `BufRead::read_until` or byte-string crates (`bstr`) to avoid that cost. +- Missing connection pooling for database or HTTP clients; reconstructing clients per-request + pays handshake and allocation cost every time. + +## Concurrency & parallelization (lane `concurrency`) +- `Arc<Mutex<T>>` (or `Arc<RwLock<T>>`) guard held across an `.await` point; the lock stalls + the executor thread for the full suspension period — drop or scope the guard before any + `.await`. +- Oversized critical sections: computation, allocation, or I/O done while a mutex is held that + could be moved outside the lock; minimize the code between lock acquisition and release. +- Independent futures `await`-ed serially (`let a = f1().await; let b = f2().await;`) when they + can run concurrently with `tokio::join!`/`futures::join!` or a buffered `FuturesUnordered` + stream (verify independence: no shared mutable state, no causal ordering dependency). +- Unbounded task spawning in a loop (`spawn` per item) with no back-pressure; replace with a + bounded concurrency pattern — semaphore, a buffered `FuturesUnordered` stream with a fixed + buffer size, or a task pool (verify against the currency brief for your version). +- CPU-bound work on the async executor thread starving I/O tasks; offload to `rayon` thread + pool or `spawn_blocking` — rayon is idiomatic for data-parallel workloads but requires + data independence; confirm no shared mutable state before parallelizing + (verify against the currency brief for your version). +- False sharing: hot fields accessed from multiple threads landing on the same cache line; + pad to cache-line alignment (`#[repr(align(64))]`) or separate into distinct structs. +- `std::sync::Mutex`/`RwLock` vs. `parking_lot` equivalents: the standard library versions + have improved significantly on modern platforms; measure under your contention profile before + switching — don't assume `parking_lot` wins (verify against the currency brief for your version). + +## Framework-idiom currency (lane `idiom-currency`) +- Consult the currency brief for the exact versions of `tokio`, `axum`/`actix-web`, `serde`, + `rayon`, `hyper`, and any ORM/query crate in use (verify against the currency brief for your + version). +- Flag patterns the brief/index marks superseded or deprecated; flag fast-path APIs they list + that the code doesn't use; flag changed defaults the code still fights. +- Offline (no brief): note candidate idiom concerns at LOW confidence, flagged for manual + currency check. + +## Payload / startup / build (lane `payload-startup`) +- Unneeded crate features via `default-features = true` inflating binary size and compile time; + audit with `cargo tree --edges features`. (Build-profile and allocator tuning live in + **Runtime & build notes** below.) +- Heavy `lazy_static!` / `OnceLock` / `once_cell::sync::Lazy` initializers — especially ones + that open sockets, parse large configs, or spawn threads — running synchronously on first + hot-path access; move to an explicit, early `init()` step. +- Work done at runtime that could be `const`-evaluated or pre-computed in `build.rs` (parsing or + code-generation that is invariant across executions). + +--- + +## Runtime & build notes (load for every Rust project) + +Rust has no GC and "zero-cost abstractions", but those guarantees hold only under the right build, and +the compilation model has its own performance and size consequences. These durable realities are the +Rust analog of a "variant notes" section — *how the code is built, allocated, and measured* — and cut +across all the lanes above and every module below. + +- **Always benchmark and profile the `--release` build**: `cargo build` (debug, `opt-level = 0`) runs + the same code 10–100× slower, with overflow checks and debug assertions on and no inlining — a perf + conclusion from a debug build is meaningless. Zero-cost abstractions (iterators, closures, `async`, + generics) are zero-cost *in release*, not in debug. For a faster dev inner loop, `[profile.dev] + opt-level = 1` keeps builds quick without full release cost (verify against the currency brief for + your version). +- **Build-profile levers trade compile time / portability for runtime speed**: `lto = "thin"`/`"fat"` + + `codegen-units = 1` (cross-crate inlining / whole-program opt; thin *local* LTO is on by default but + weaker), `opt-level = 3` (or `"s"`/`"z"` to optimize for size), `panic = "abort"` (drops unwinding + tables and landing-pad code), `-C target-cpu=native` when the build host equals the run host (unlocks + SIMD), and PGO via `cargo-pgo` for long-lived binaries. All need benchmarking; `target-cpu=native` + and PGO don't apply to portably-distributed binaries (verify against the currency brief for your + version). +- **Monomorphization is zero-cost at runtime, real cost at build and binary size**: a generic function + is compiled once per concrete type — fast and inlinable with no vtable, but duplicated code inflates + compile time and binary size. `dyn Trait` trades one vtable indirection per call for a single shared + copy (smaller binary, slightly slower call). A heavily-generic API instantiated over many types is a + bloat source; `cargo bloat` and `RUSTFLAGS=-Zprint-type-sizes` (nightly) reveal where (verify against + the currency brief for your version). +- **The global allocator is a one-line lever**: on allocation-heavy or multi-threaded workloads, the + default system allocator vs `tikv-jemallocator` or `mimalloc` as a drop-in `#[global_allocator]` can + cut tail latency and peak memory measurably — measure under your workload before adopting (verify + against the currency brief for your version). +- **No GC, but cost is explicit — and bounds checks are real**: there are no GC pauses, but allocations + and `.clone()`s are visible costs you can see and remove, and indexed access (`a[i]`) emits a bounds + check that iterators elide. Reach for the profiler, not intuition: `criterion` for + statistically-sound microbenchmarks (not ad-hoc wall-clock loops), `perf` + `cargo-flamegraph` / + `samply` for CPU, `cargo-bloat` / `twiggy` for binary size, `dhat` / heaptrack for allocations — all + on a release build with realistic data (verify against the currency brief for your version). + +## Framework / sub-stack modules (load on detection) + +Load the core lanes + **Runtime & build notes** above for *every* Rust project. Additionally load the +matching module when its technology is material to the audit scope, and include it as ecosystem context +in the relevant lane prompts. See the version index `../version-indexes/rust.md` for version-specific +facts. + +| Detected (signals) | Load module | +|---|---| +| **Async & tokio** — `tokio`, `#[tokio::main]`, `async fn`/`.await`, `futures`, `async-trait` | [`rust/async-tokio.md`](rust/async-tokio.md) | +| **Web frameworks** — `axum`, `actix-web`, `warp`, `hyper`, `tower`/`tower-http` | [`rust/web.md`](rust/web.md) | +| **Serialization** — `serde`, `serde_json`, `bincode`, `postcard`, `rmp-serde`, `prost`, `simd-json` | [`rust/serde-serialization.md`](rust/serde-serialization.md) | +| **Database access** — `sqlx`, `diesel`, `sea-orm`, `tokio-postgres`, `deadpool`, `redis` | [`rust/database.md`](rust/database.md) | +| **Data parallelism & compute** — `rayon` (`par_iter`), `polars`, `ndarray`, `std::simd`/portable-simd, `wide` | [`rust/data-parallelism.md`](rust/data-parallelism.md) | + +--- + +## Sources + +Durable signals in this pack are grounded in these authoritative sources (version-specific facts and +their per-entry citations live in `../version-indexes/rust.md`): + +- The Rust Performance Book — Nicholas Nethercote (nnethercote.github.io/perf-book): heap-allocations, + type-sizes, iterators, hashing, io, standard-library-types, build-configuration +- **Runtime & build** — Cargo book (profiles, LTO, `codegen-units`, `panic`), rustc codegen options + (`target-cpu`), `cargo-pgo`, `criterion`/`cargo-flamegraph`/`cargo-bloat` docs, jemalloc/mimalloc. + +**Sub-stack modules** carry their own grounding; key sources per module: + +- **Async & tokio** (`rust/async-tokio.md`) — tokio docs (runtime, `spawn_blocking`/`block_in_place`, + `sync::mpsc`, `select!`), `futures` (`FuturesUnordered`/`buffer_unordered`), async-book. +- **Web frameworks** (`rust/web.md`) — axum/actix-web/hyper/tower + tower-http docs (extractors, + layers, state, body limits/timeouts). +- **Serialization** (`rust/serde-serialization.md`) — serde docs (`borrow`, `flatten`, tagging), + serde_json (`from_reader`/`RawValue`/`arbitrary_precision`), bincode/postcard/rmp-serde/prost, + simd-json. +- **Database access** (`rust/database.md`) — sqlx (`Pool`, `query!`/offline, `fetch` streaming), + diesel / diesel-async, sea-orm, deadpool, redis-rs (pipelining). +- **Data parallelism & compute** (`rust/data-parallelism.md`) — rayon docs (`par_iter`, `with_min_len`, + `join`/`reduce`), polars (lazy/`scan_*`), ndarray (+ BLAS), `std::simd`/portable-simd. diff --git a/.claude/skills/performance-audit/profile-packs/rust/async-tokio.md b/.claude/skills/performance-audit/profile-packs/rust/async-tokio.md new file mode 100644 index 00000000..33a9cdf1 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/rust/async-tokio.md @@ -0,0 +1,83 @@ +# Rust performance module: Async & tokio +> Load when async Rust on tokio is detected — `tokio`, `#[tokio::main]`, `async fn`/`.await`, `futures`, `async-trait` — see the module map in `../rust.md`. Core lanes + Runtime & build notes live in `../rust.md`; this file is the Async & tokio lens only. + +## Async & tokio + +> Scope: the tokio runtime and the broader `futures` ecosystem — task scheduling, channel selection, +> combinators, and `async fn` in traits. The recurring theme is: don't block the executor (any +> synchronous work that stalls a worker thread stalls every task multiplexed on it), bound +> concurrency and channels so a fast producer can't blow out memory, treat cancellation as a +> first-class control-flow event rather than an afterthought, and keep futures small and `Send` so +> they remain schedulable on the multi-thread runtime without boxing. The **Concurrency** lane in +> `../rust.md` covers the high-frequency async footguns (`Arc<Mutex<T>>` across `.await`, serial +> awaits, unbounded spawn, CPU-bound on executor); this module goes deeper into runtime mechanics. + +- **`spawn_blocking` vs `block_in_place` for synchronous work**: synchronous or CPU-bound code + called from a worker thread stalls every other task multiplexed on that thread; `tokio::task:: + spawn_blocking` moves the work to a separate blocking-thread pool so the worker stays free, while + `block_in_place` (multi-thread runtime only) lets a worker execute blocking code in-place by + first migrating its other tasks away — prefer `block_in_place` when the blocking call must share + stack/locals with the async context and a full `spawn_blocking` roundtrip is awkward; note the + blocking pool is bounded and flooding it with long-running work has its own queuing cost (verify + against the currency brief for your version). + +- **Runtime flavor and `worker_threads` sizing**: `#[tokio::main]` defaults to a multi-thread + runtime with `worker_threads = num_cpus`, which is optimal for I/O-heavy services but + over-subscribes a CPU-bound service where fewer workers + a rayon pool is a better split; + `current_thread` (single-threaded runtime) removes work-stealing overhead and is appropriate for + `!Send`-heavy or embedded/test contexts but serialises all tasks; misconfigured sizing either + starves I/O (too few) or creates scheduler contention with OS thread thrashing (too many) — + confirm the flavor and thread count match the workload character (verify against the currency + brief for your version). + +- **Unbounded channels as implicit queues without back-pressure**: `tokio::sync::mpsc:: + unbounded_channel` (and the `futures` unbounded equivalents) let a fast sender grow the queue + without limit — memory grows unboundedly and tail latency spikes before the OOM; a bounded + `mpsc::channel(n)` applies back-pressure that propagates to the sender; also check channel + semantics against the fan-out pattern: `mpsc` for single-consumer pipelines, `broadcast` for + multi-consumer fan-out where receivers can lag, `watch` for "last-value-wins" state sharing + (verify against the currency brief for your version). + +- **Task granularity — spawn overhead and cooperative scheduling starvation**: spawning a task + per tiny unit of work (e.g., per message in a tight loop) pays scheduling overhead, per-task + heap allocation for the future, and wakeup costs that dominate at high rates — batch work into + coarser tasks; conversely, a long-running task that computes without ever reaching an `.await` + point monopolises its worker thread because tokio uses cooperative scheduling (the task-budget + yield is triggered by tokio I/O/timer primitives, not raw CPU loops) — insert + `tokio::task::yield_now().await` at loop checkpoints or offload the CPU work (verify against + the currency brief for your version). + +- **`select!` cancellation drops in-flight futures**: when a `tokio::select!` branch loses the + race its future is **dropped** immediately — any work in progress in that branch is silently + discarded; futures that are not cancellation-safe (partial reads from a `BufReader`, half-sent + writes, state machines midway through a multi-step transaction) corrupt their own state or lose + data when dropped this way; restructure with cancellation tokens, move state out of the future + before the select, or use only cancellation-safe primitives in select branches (verify against + the currency brief for your version). + +- **`join_all` vs `FuturesUnordered` / `buffer_unordered` for bounded in-flight concurrency**: + `futures::future::join_all` (and `tokio::join!`) runs all futures concurrently with no cap on + in-flight count — appropriate when N is small and bounded by construction, but creates a + concurrency spike for large or dynamic N; `stream::iter(...).buffer_unordered(k)` caps + in-flight work at `k`, applying back-pressure to the stream; `FuturesUnordered` gives finer + control but only makes progress when polled — if the enclosing task yields or is not selected, + pending futures stall, which manifests as a "stalled stream" where all futures appear queued + but none complete (verify against the currency brief for your version). + +- **`#[async_trait]` boxing on hot dispatch paths**: the `async-trait` macro rewrites every + `async fn` in a trait to return `Pin<Box<dyn Future + Send>>`, incurring a heap allocation and + dynamic dispatch on every call; on a hot path (per-request, per-message) this compounds; native + `async fn in traits` (stabilised in a later Rust edition) and `-> impl Future` return-position + opaque types avoid the allocation where the concrete type is statically known — cross-reference + the **Framework-idiom currency** lane and the currency brief for the minimum compiler version + where native async traits are available (verify against the currency brief for your version). + +- **Large or `!Send` futures: footprint and runtime compatibility**: every local variable live + across an `.await` point is captured in the future's state machine, so large buffers, big + temporary structs, or recursive layouts inflate the per-task allocation; box large + intermediate values (`Box::pin(...)` the sub-future, or heap-allocate the big local) to keep + the state-machine frame small; `!Send` types (`Rc`, a `std::sync::MutexGuard`, raw pointers) + held across `.await` make the enclosing future `!Send`, which prevents `tokio::spawn` on the + multi-thread runtime — scope non-Send values to before the await point or restructure so they + don't straddle a suspension (cross-reference the **Concurrency** lane in `../rust.md` and the + `data-parallelism` sibling module for rayon interaction patterns). diff --git a/.claude/skills/performance-audit/profile-packs/rust/data-parallelism.md b/.claude/skills/performance-audit/profile-packs/rust/data-parallelism.md new file mode 100644 index 00000000..7c6f21b6 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/rust/data-parallelism.md @@ -0,0 +1,100 @@ +# Rust performance module: Data parallelism & compute (rayon / polars / SIMD) +> Load when CPU-bound data-parallel or numeric compute is detected — `rayon` (`par_iter`), `polars`, `ndarray`, `std::simd`/portable-simd, `wide` — see the module map in `../rust.md`. Core lanes + Runtime & build notes live in `../rust.md`; this file is the Data parallelism & compute lens only. + +## Data parallelism & compute (rayon / polars / SIMD) + +> Scope: rayon data-parallel iterators, the polars columnar DataFrame engine, ndarray for +> n-dimensional numeric arrays, and explicit SIMD via `std::simd`/portable-simd and `wide`. +> The recurring theme is: parallelism only pays when total work significantly exceeds scheduling +> overhead; CPU thread pools and async I/O runtimes must stay separate to avoid core +> oversubscription; accumulate per-thread then reduce rather than sharing a contended sink; use +> lazy/columnar APIs for DataFrames rather than row-wise iteration; and rely on explicit SIMD or +> iterator forms when auto-vectorization cannot be confirmed. +> Cross-reference the **Concurrency** lane and Runtime & build notes in `../rust.md`, and the +> `async-tokio` sibling module for the CPU-pool / async-runtime boundary. + +- **`par_iter` on too-small work or too-cheap per-item bodies**: rayon divides work via a + work-stealing split protocol and schedules tasks across its thread pool — this has real + overhead per split. When the collection is small or the per-element computation is a few + arithmetic operations, `par_iter()` is measurably slower than a serial iterator; the + crossover is workload-dependent and must be measured. Use `rayon::slice::ParallelSlice:: + par_chunks` or configure `with_min_len` on the parallel iterator to coarsen granularity + so each rayon task processes enough elements to amortise the split cost (verify against the + currency brief for your version). + +- **rayon thread pool running inside a tokio worker — core oversubscription**: rayon's global + pool defaults to `num_cpus` threads; a multi-thread tokio runtime also defaults to `num_cpus` + workers. Calling into rayon from inside a tokio task doubles the active threads competing for + the same cores, causing context-switch thrash and cache pressure. Keep CPU-bound rayon work + entirely outside tokio workers — invoke it via `tokio::task::spawn_blocking` so the tokio + executor remains free, and size the rayon pool and the tokio worker pool together to sum to a + reasonable core budget (cross-reference the `async-tokio` sibling module and the Concurrency + lane in `../rust.md`) (verify against the currency brief for your version). + +- **Shared accumulation instead of per-thread reduce**: parallel writes to a shared sink — + a `Mutex<Vec<T>>`, a `std::sync::atomic` counter in the inner loop, or adjacent slots of + the same array — serialize threads or thrash cache lines. The idiomatic rayon pattern is + `par_iter().map(…).reduce(||identity, |a, b| combine(a, b))` or `.fold(||initial, |acc, + x| update(acc, x)).reduce(…)`, which accumulates privately per rayon task and merges at + the end; this avoids both lock contention and false sharing (the core **Concurrency** lane + in `../rust.md` names false sharing at a high level — the data-parallel instance is per-task + private accumulation) (verify against the currency brief for your version). + +- **`par_iter().collect()` ordering cost and `HashMap` contention**: collecting a parallel + iterator into an ordered `Vec` requires rayon to buffer and stitch results in original order, + which adds synchronization; when order is not needed, `par_iter().for_each(…)` or + `.reduce(…)` avoids the bookkeeping. Collecting directly into a `HashMap` from parallel + code contends on the map's internal lock; prefer accumulating per-task maps with + `fold`+`reduce`, or use a concurrent map like `dashmap::DashMap` only after confirming the + alternative is materially more complex (verify against the currency brief for your version). + +- **polars eager API materialising intermediate DataFrames**: the eager `DataFrame` API + executes and materialises each operation immediately; a pipeline of filter → select → + groupby → aggregation produces several full intermediate allocations. The **lazy** API — + `LazyFrame`, `scan_parquet`/`scan_csv`/`scan_ipc` + `.collect()` — defers execution and + applies predicate pushdown, projection pruning, and parallel partition execution in a single + pass. Row-wise iteration (`apply` with a closure over rows, Python-style `map` over + individual values) discards the columnar engine entirely and performs individual allocations + per row; reformulate as columnar expressions. Switch any large or chained pipeline to the + lazy API before tuning anything else (verify against the currency brief for your version). + +- **ndarray non-contiguous views and unintended copies**: ndarray operations on + non-contiguous views (sliced with non-unit strides, transposed layouts, or views into + Fortran-order arrays in C-order code) force the library to copy data into a contiguous + buffer before dispatching to numeric kernels or BLAS; a `.to_owned()` in a hot path is + often this copy surfacing. Keep arrays contiguous (`Array::as_standard_layout()`) for + hot kernels; check memory order (row-major C vs column-major F) against the operation's + access pattern; and enable the `blas` feature flag for ndarray to delegate linear-algebra + operations to a tuned BLAS (OPENBLAS, MKL) rather than the pure-Rust fallback (verify + against the currency brief for your version). + +- **Auto-vectorization that silently didn't happen**: the compiler auto-vectorizes inner loops + only when it can prove safety (no aliasing between input/output pointers, statically-known + bounds, the target ISA is enabled). Without `-C target-cpu=native` (or the equivalent + `target-feature` flags in `RUSTFLAGS`) the compiler targets the baseline ISA, leaving + AVX2/AVX-512/NEON disabled even on hardware that supports them. Bounds checks on indexed + access (`a[i]`) can also break the vectorizer's dependence analysis. Confirm vectorization + happened by inspecting the output of `cargo asm` / `cargo-show-asm` or LLVM IR — if SIMD + instructions are absent where expected, switch to explicit `std::simd` (portable-simd) or + the `wide` crate to guarantee vector width regardless of optimizer mood (cross-reference + Runtime & build notes in `../rust.md` for `-C target-cpu=native` guidance) (verify against + the currency brief for your version). + +- **Indexed access blocking vectorization in hot numeric loops**: `for i in 0..n { a[i] + b[i] }` + emits a bounds check on each access that the optimizer cannot always eliminate, breaking the + loop into scalar iterations or introducing conditional branches that prevent clean SIMD + lowering. Iterating over slices directly (`for (x, y) in a.iter().zip(b.iter())`) elides + bounds checks because the iterator carries its own length; `slice::chunks_exact(N)` gives + the optimizer a fixed-stride loop body with no remainder check inside the main loop — prefer + iterator forms and `chunks_exact` over manual index arithmetic in inner numeric kernels + (verify against the currency brief for your version). + +- **Adding threads to a memory-bandwidth-limited workload**: a loop that streams through a + large array with low arithmetic intensity (sum, copy, simple element-wise transform) is + bounded by DRAM or cache bandwidth, not by CPU compute. Throwing more rayon threads at it + saturates the memory bus faster but does not increase throughput — threads contend on the + same bandwidth budget and the wall time plateaus or regresses. Distinguish memory-bound + from compute-bound with a roofline estimate or a hardware-counter profiler (perf stat, + LIKWID) before reaching for parallelism; the payoff for parallelising memory-bound work + is rarely proportional to thread count (cross-reference the **Algorithmic complexity** + lane in `../rust.md`) (verify against the currency brief for your version). diff --git a/.claude/skills/performance-audit/profile-packs/rust/database.md b/.claude/skills/performance-audit/profile-packs/rust/database.md new file mode 100644 index 00000000..e03ed547 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/rust/database.md @@ -0,0 +1,97 @@ +# Rust performance module: Database access (sqlx / diesel / sea-orm / tokio-postgres) +> Load when a Rust database layer is detected — `sqlx`, `diesel`, `sea-orm`, `tokio-postgres`, `deadpool`, `redis` — see the module map in `../rust.md`. Core lanes + Runtime & build notes live in `../rust.md`; this file is the Database access lens only. + +## Database access (sqlx / diesel / sea-orm / tokio-postgres) + +> Scope: all patterns touching `sqlx::Pool`, `diesel::r2d2::Pool`, `deadpool_postgres::Pool`, +> `sea_orm::DatabaseConnection`, or `redis::aio::MultiplexedConnection`. The recurring themes are: +> **share the pool** (it is a cheap `Arc` clone — build once, share everywhere), **batch to cut +> round-trips** (N+1 is the dominant latency killer in any Rust async service), **stream large +> results** rather than materialising them into a `Vec`, **keep transactions short** (a live +> transaction holds a pooled connection and DB locks for its entire lifetime), and **never block +> the executor with a sync driver**. Bullets are *conditions to look for*; cross-reference the +> core **Data access & I/O** and **Concurrency** lanes in `../rust.md` for the language-level +> analogues, the `../rust/async-tokio.md` sibling for executor-blocking footguns, and — where +> hand-written SQL is in scope — `../sql.md` plus its relevant dialect module. + +- **Pool built per-request or per-task instead of shared**: `sqlx::Pool`, `deadpool_postgres::Pool`, + and `diesel::r2d2::Pool` each embed an `Arc` — cloning the pool handle is the intended sharing + mechanism. Constructing a fresh pool per request bypasses the pool entirely, paying connection + establishment (TCP, TLS, auth, protocol handshake) on every call and leaking descriptors when + the pool is not explicitly closed. The signal to look for is `Pool::connect` / `Pool::new` / + `r2d2::Builder::build` called inside a handler, a `tokio::spawn` closure, or a per-request + function rather than at application startup (verify against the currency brief for your version). + +- **Pool limits left at defaults under real load**: `sqlx` defaults `max_connections` to 10 and + `min_connections` to 0; deadpool's default `max_size` is also small; r2d2 defaults to 10 max. + Under burst traffic the pool exhausts and callers queue (or timeout); raising it beyond the + database's own connection limit merely shifts the bottleneck and wastes server memory. Also look + for missing `idle_timeout` / `max_lifetime` settings — without them, idle connections persist + indefinitely and stale after a proxy or firewall reset (verify against the currency brief for + your version). + +- **N+1 in the Rust async idiom — queries inside loops or `join_all`**: issuing a `sqlx::query` + (or a sea-orm `find` / diesel `load`) per item — whether in a `for` loop, a `.map(|id| async + move { query… })` collected into `FuturesUnordered`, or a naïve `join_all` of per-item futures + — multiplies round-trips linearly with the result set. Replace with a single batched query + (`WHERE id = ANY($1)` with a `Vec` argument on Postgres, or `WHERE id IN (…)` on other + databases); for sea-orm/diesel relation loading, look for per-row `.find_related()` or + `.belonging_to()` calls that trigger a query per parent row instead of a single IN-batched load. + A `dataloader`-pattern crate can batch across concurrent callers (verify against the currency + brief for your version). + +- **`sqlx::query!` / `query_as!` build-time coupling vs. runtime flexibility trade-off**: the + compile-time macros verify SQL against a live database at compile time (requiring `DATABASE_URL` + in the environment) or against a cached schema snapshot via `sqlx prepare` / the `.sqlx/` + directory. This catches type mismatches and typos before runtime but couples every `cargo build` + to database availability and adds prepare round-trips to incremental build time. The runtime + `sqlx::query` / `query_as` variants skip the check. The condition to look for is a mismatch + between the team's constraint (CI without a live DB, fast incremental builds) and which form is + used — neither is universally better (verify against the currency brief for your version). + +- **Dynamic SQL strings defeating prepared-statement caching**: sqlx caches prepared statements + per connection using the query string as the cache key. A query whose shape is built with + `format!` — embedding variable table names, dynamic column lists, or values directly into the + string — produces a different key on every variation and forces a re-prepare cycle. The correct + pattern is a fixed query shape with `$1`, `$2`, … (Postgres) or `?` (MySQL/SQLite) bind + parameters; binding values through the parameter list also closes the SQL-injection surface. + Look for `format!("… WHERE id = {}", id)` passed to `sqlx::query` on any hot path (verify + against the currency brief for your version). + +- **Sync diesel blocking the async executor**: diesel's built-in interface is synchronous — a + diesel call inside a `tokio::spawn` or an `async fn` blocks the executor thread for the full + DB round-trip, starving other tasks on that thread. The remedies are: wrap with + `tokio::task::spawn_blocking`, use the `diesel-async` crate (which provides async-native + interfaces over the same diesel query builder), or migrate the data layer to sqlx/sea-orm. The + signal is a `diesel` import *and* an async runtime without any `spawn_blocking` boundary around + the DB calls (cross-reference `../rust/async-tokio.md` for the general executor-blocking lane; + verify against the currency brief for your version). + +- **`fetch_all` materialising large result sets into a `Vec`**: `sqlx::query().fetch_all(&pool)` + collects every matching row into a heap-allocated `Vec` before returning — on large exports, + paginated scans, or administrative queries this causes a memory spike proportional to the result + set. `fetch(&pool)` returns a `Stream` of rows that can be processed incrementally, bounding + memory to a single row (or a small read-ahead buffer). Look for `fetch_all` on queries without + a tight `LIMIT` on paths that could receive large or unbounded result sets (cross-reference the + core **Memory** lane in `../rust.md`; verify against the currency brief for your version). + +- **Transaction held across `.await` on external I/O or heavy computation**: a `sqlx::Transaction` + (or diesel `Connection` in a transaction) holds one connection from the pool and, on the + database side, holds row or page locks for its entire lifetime. Awaiting an HTTP call, a + message-queue publish, or a CPU-heavy step between `begin_transaction` and `commit` drains the + pool for other callers and extends lock duration. Look for `.await` on non-DB futures — or + unbounded iteration — between transaction begin and commit; restructure so external I/O happens + before or after the transaction, and ensure `rollback` is called on all error paths (a dropped + `sqlx::Transaction` rolls back implicitly but relying on `Drop` can obscure logic; prefer + explicit `commit`/`rollback`). Cross-reference the core **Concurrency** and **Data access** + lanes in `../rust.md` (verify against the currency brief for your version). + +- **redis-rs per-command round-trips and connection-per-call patterns**: issuing individual + `cmd("GET")` / `cmd("SET")` calls in a loop sends one network round-trip per command. Use + `redis::pipe()` (pipelining) or multi-key commands (`MGET` / `MSET`) to amortise latency. + Separately, opening a new connection per call (via `Client::get_connection` or + `Client::get_async_connection`) pays TCP/TLS overhead every time; prefer a + `MultiplexedConnection` (single connection, concurrent in-flight commands) or a pool via + `deadpool-redis` / `bb8-redis`. Also look for redis usage where the cached value is cheaper to + recompute locally than to serialise, send, receive, and deserialise over the wire — the + round-trip cost is not free (verify against the currency brief for your version). diff --git a/.claude/skills/performance-audit/profile-packs/rust/serde-serialization.md b/.claude/skills/performance-audit/profile-packs/rust/serde-serialization.md new file mode 100644 index 00000000..46348fa1 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/rust/serde-serialization.md @@ -0,0 +1,80 @@ +# Rust performance module: Serialization (serde / serde_json / bincode / prost) +> Load when serde-based serialization is detected — `serde`, `serde_json`, `bincode`, `postcard`, `rmp-serde`, `prost`, `simd-json` — see the module map in `../rust.md`. Core lanes + Runtime & build notes live in `../rust.md`; this file is the Serialization lens only. + +## Serialization (serde / serde_json / bincode / prost) + +> Scope: `serde` derive machinery, `serde_json` (text), `bincode`/`postcard` (compact binary, +> Rust-to-Rust), `rmp-serde` / MessagePack (cross-language binary), `prost` / protobuf +> (schema'd, cross-language), and `simd-json`/`sonic-rs` (SIMD-accelerated JSON). The recurring +> theme is: borrow don't allocate (zero-copy where the lifetime fits), stream or reuse buffers +> rather than allocating per call, avoid structural choices (`flatten`/`untagged`/`Value`) that +> force a second parse pass, and match the wire format to the actual boundary — not every path +> needs JSON. + +- **`#[serde(borrow)]` with `&'de str`/`&'de [u8]` for per-field zero-copy**: the core pack + flags "borrowed `Deserialize<'de>`" as a win — the mechanism is `#[serde(borrow)]` on a + field typed `&'de str` or `&'de [u8]`, which causes serde to point directly into the input + buffer instead of allocating a new `String`/`Vec<u8>` per field. The trade-off is lifetime + coupling: the deserialized value cannot outlive the buffer. When ownership is only + *sometimes* needed, `Cow<'de, str>` avoids the unconditional clone while still permitting + owned construction — measure whether the allocation is measurable before adding the lifetime + complexity (verify against the currency brief for your version). + +- **`serde_json::from_reader` over an unbuffered source**: `from_reader` issues many small + reads against whatever `io::Read` it receives — over an unbuffered `File` or `TcpStream` + (both syscall-per-read by default) this multiplies syscall overhead; wrap in `BufReader` + first. Conversely, when the bytes are already in memory, `from_slice`/`from_str` avoids + the reader machinery entirely and is consistently faster than routing in-memory bytes + through `from_reader`. For output, `to_writer` streams into a `Write` target while + `to_string`/`to_vec` build the complete payload in a fresh allocation; the right choice + depends on whether the bytes need to exist as a whole before the next step + (verify against the currency brief for your version). + +- **`#[serde(flatten)]` and `untagged` enums force a buffered second pass**: `#[serde(flatten)]` + causes the deserializer to collect all fields into an intermediate representation (a content + map) and re-parse, defeating zero-copy and inserting an allocation + second traversal on + every call. `#[serde(tag = "...", content = "...")]` (adjacently-tagged) and `untagged` + enums have the same intermediate-buffer cost; externally- and internally-tagged enums avoid + it. Presence of `flatten` or `untagged` on a type used in a hot path is the signal — not + their presence in general (verify against the currency brief for your version). + +- **`serde_json::Value` and `arbitrary_precision` as allocation multipliers**: deserializing + into `Value` (a dynamic tree) allocates a heap node per JSON value; on large payloads or in + tight loops this accumulates quickly. If only a subtree is needed, decode the outer message + into a concrete struct with a `serde_json::RawValue` field and decode the inner part lazily + or not at all. Separately, enabling the `arbitrary_precision` feature changes number + handling and is slower than the default; number fields that flow into `f64` don't need it + (verify against the currency brief for your version). + +- **Allocating a fresh buffer on every serialize call**: calling `serde_json::to_vec` or + `to_string` in a per-request or per-message hot path allocates a new `Vec<u8>`/`String` + each time. Reuse a buffer: hold a `Vec<u8>` across calls, `buf.clear()` before each use, + and pass `&mut buf` via `serde_json::to_writer`; `with_capacity` pre-sizes on the first + call if a representative payload size is estimable. Cross-reference the **Memory** lane in + `../rust.md` (loop-body allocation / `clear()`-to-preserve-capacity pattern). + +- **`#[derive(Serialize, Deserialize)]` monomorphization on hot generic paths**: derive + generates a full implementation per concrete type; a generic function or struct + instantiated over many types produces one copy per instantiation — for serialization this + means separate codegen for each concrete `T`. This is usually the right trade-off, but a + hot generic deserializer fanned out over a large type set is a compile-time and binary-size + source worth profiling with `cargo bloat` or `twiggy`. Cross-reference the Runtime & build + notes in `../rust.md` (LTO, `codegen-units`) for the build-side levers. + +- **JSON vs a binary format for the actual boundary**: `serde_json` is human-readable, but + text parsing, UTF-8 validation, and base64 encoding of binary fields make it materially + slower and larger than the alternatives for non-human-facing boundaries. `bincode`/`postcard` + are compact and fast for Rust-to-Rust paths (no cross-language schema needed); `rmp-serde` + (MessagePack) is a compact cross-language option without a schema; `prost`/protobuf is + schema'd and well-suited for versioned cross-language contracts. Using `serde_json` for + internal cache payloads or service-to-service calls is the common footgun — verify the + format is matched to the boundary before optimizing within it + (verify against the currency brief for your version). + +- **`simd-json` / `sonic-rs` on measured JSON hot paths**: `simd-json` rewrites JSON parsing + using SIMD intrinsics and can be multiple times faster than `serde_json` on large payloads; + it requires a mutable, owned input buffer (it mutates the slice in place), which changes + call-site ownership. `sonic-rs` offers a similar gain with a somewhat different API surface. + Both add a non-trivial dependency and the benefit is payload-size-dependent — the signal + for reaching for either is a profiler trace showing JSON parsing as a top contributor, not + a parse anywhere in the call graph (verify against the currency brief for your version). diff --git a/.claude/skills/performance-audit/profile-packs/rust/web.md b/.claude/skills/performance-audit/profile-packs/rust/web.md new file mode 100644 index 00000000..7d1636e8 --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/rust/web.md @@ -0,0 +1,75 @@ +# Rust performance module: Web frameworks (axum / actix-web / hyper) +> Load when a Rust HTTP server is detected — `axum`, `actix-web`, `warp`, `hyper`, `tower`/`tower-http` — see the module map in `../rust.md`. Core lanes + Runtime & build notes live in `../rust.md`; this file is the Web frameworks lens only. + +## Web frameworks (axum / actix-web / hyper) + +> Scope: the request path through axum, actix-web, warp, and the underlying hyper/tower stack. The +> recurring theme is: share state through `Arc` (not deep-clone), reuse connection pools and clients +> built at startup, keep the per-request extractor and middleware chain lean, stream rather than buffer +> large bodies and responses, and never block the async executor. Failures here compound linearly with +> concurrency — each footgun that costs 1 ms at 1 req/s costs 1 s of executor time at 1000 req/s. + +- **Application state cloned per-request without `Arc`**: axum's `State<S>` and actix-web's + `Data<T>` clone the inner value on every request dispatch. If `S`/`T` is a large struct that + derives `Clone`, every request performs a deep copy — config maps, client handles, caches and all. + The correct idiom is `State<Arc<AppState>>`/`Data<Arc<AppState>>`: the clone is a single atomic + refcount increment (verify against the currency brief for your version). + +- **HTTP client or connection pool built inside a handler**: constructing a `reqwest::Client`, a + database pool, or any resource that owns TCP connections inside a handler rebuilds the pool on + every request — paying TLS handshake and allocator cost each time. Build once at startup and share + via state; cross-reference the `database` module for pool-sizing guidance and the **Data access & + I/O** lane in `../rust.md` for the general missing-pooling signal (verify against the currency + brief for your version). + +- **Extractor ordering and the cost of body extraction**: axum and actix-web run extractors in + declaration order; a body extractor (`Json<T>`, `Bytes`, `String`) must buffer and deserialize the + entire request body before the handler is entered — cross-reference the `serde-serialization` + module for deserialization cost. Cheap rejection extractors (auth token, `Content-Type` guard, + content-length limit) should precede body extractors in the parameter list so malformed or + unauthorized requests are rejected before the expensive read occurs (verify against the currency + brief for your version). + +- **Tower middleware applied globally rather than scoped**: every `tower`/`tower-http` layer (tracing + span allocation, per-request auth DB lookup, compression, request logging) wraps every request that + reaches the router, including health checks and 404 paths. Heavy per-request work in a global layer + compounds at scale; scope layers to the specific route groups or services that need them using axum + `Router::layer` vs `Router::route_layer` semantics (verify against the currency brief for your + version). + +- **Buffering large request bodies or responses in memory**: reading an entire request body into + `Bytes` or `String` before processing, or assembling a large response `Vec<u8>` before writing, + spikes resident memory proportional to body size × concurrency. Use `axum::body::Body` streaming + /`StreamBody` for large uploads, chunked response bodies for large payloads, and configure a + `RequestBodyLimit` layer to bound maximum inbound allocation and prevent unbounded-allocation DoS + (verify against the currency brief for your version). + +- **Blocking or CPU-bound work executed directly in an async handler**: CPU-intensive work (image + transformation, cryptographic operations, large serialization batches) or synchronous I/O called + from inside `.await`-able handler code blocks the Tokio worker thread for the duration, starving + other tasks; cross-reference the **Concurrency** lane in `../rust.md` and the `async-tokio` module + — offload via `tokio::task::spawn_blocking` or hand off to a `rayon` pool (verify against the + currency brief for your version). + +- **actix-web's per-worker state duplication**: actix-web runs N independent single-threaded workers, + each initialized with its own copy of the app factory closure; `Data<T>` is internally an + `Arc<T>`, so pointer-sharing across workers is correct — but if the factory closure constructs + fresh resources (a new pool, a new in-memory cache) per worker rather than cloning an `Arc` built + once before `HttpServer::new`, each worker holds a separate, non-coordinated resource instance. + `!Send` types are permissible per-worker but cannot be shared; anything that must be shared across + workers needs `Arc`-wrapped thread-safe types (verify against the currency brief for your version). + +- **`Json(value)` response serialization on every hot response**: returning `Json(value)` in axum or + actix-web re-serializes the value on every response; for payloads that are static or infrequently + changing this is avoidable overhead — cross-reference the `serde-serialization` module for + serialization cost signals. Consider caching pre-serialized `Bytes` for reference data, applying + field projection/pagination to large collection responses, and measuring whether `simd-json` or + a pre-serialized pool wins on your hot path (verify against the currency brief for your version). + +- **Missing request timeouts and no keep-alive/HTTP2 consideration**: a hyper/tower server with no + timeout layer lets a slow or stalled client pin a task and its associated memory for an unbounded + duration; `tower-http`'s `TimeoutLayer` or `tower::ServiceBuilder` timeout bounds this. Separately, + HTTP/1.1 keep-alive and HTTP/2 multiplexing (available through hyper's native HTTP/2 support) + reduce per-request connection setup cost on high-fanout paths; verify that your deployment topology + allows each and that TLS configuration does not inadvertently disable negotiation (verify against + the currency brief for your version). diff --git a/.claude/skills/performance-audit/profile-packs/sql.md b/.claude/skills/performance-audit/profile-packs/sql.md new file mode 100644 index 00000000..f9a5ea5d --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/sql.md @@ -0,0 +1,173 @@ +# Profile Pack: SQL (hand-written queries) + +A **companion** pack for **hand-rolled SQL** — queries, views, **stored procedures, functions, and +triggers** written by hand (not ORM-generated). It loads *alongside* the application's language pack +whenever hand-written SQL is material to the scope, and sharpens the same lanes for relational query +performance. ORM-specific footguns live in the language packs' data modules (`dotnet/sql-server-data.md`, +`python/orm-database.md`, `go/database-sql.md`, `javascript-typescript/node-data.md`); this pack is about +the SQL itself — **including the SQL hidden inside routines** (see "Routines" below; it is the easiest +to miss). + +**Assumes the schema (DDL) is available.** Reasoning about indexes, types, cardinality, and keys +requires the table/index definitions and ideally row-count statistics — when they are in scope, use +them; when they are not, drop confidence and say so. Signals below are durable and dialect-agnostic; +dialect specifics (PostgreSQL, T-SQL/SQL Server) load as modules — see the map at the bottom. Concrete +dialect features are tagged "(verify against the currency brief for your version)". + +--- + +## Algorithmic / query complexity (lane `algorithmic`) +- **Row-by-row (RBAR)** where a set-based statement would do: a cursor/`WHILE` loop, or a per-row + scalar function/round-trip, doing work the engine could express as one `UPDATE … FROM` / `INSERT … + SELECT` / `MERGE` over the whole set. +- **Non-sargable predicates**: wrapping an indexed column in a function or expression + (`WHERE lower(col)=…`, `WHERE col+0=…`, `WHERE date(ts)=…`), a leading-wildcard `LIKE '%x'`, or an + implicit type cast on the column side — each forces a scan instead of a seek. Move the transform to + the literal side, or index the expression. +- **Join fan-out before aggregation**: joining one-to-many and then aggregating multiplies rows the + engine must process (and can double-count) — filter/aggregate the many side first (subquery or + window) before joining, rather than `DISTINCT`/`GROUP BY` to paper over the explosion. +- **Correlated subquery per outer row** where a single `JOIN`, window function, or one grouped + aggregate would compute the value once — a SQL-shaped N+1 inside one statement. +- **Accidental Cartesian / missing join predicate**, and `OR` across different columns that defeats + any single index (often better as `UNION ALL` of sargable branches, or a rethought index). +- **Recomputed work**: the same derived table / subquery evaluated several times in one statement + where a CTE, temp table, or window computes it once. + +## Memory & intermediate results (lane `memory`) +- **Sorts / hash joins / aggregates that spill to disk** (`ORDER BY`, `GROUP BY`, `DISTINCT`, window, + merge/hash join over large inputs) without a supporting index or enough working memory — the spill, + not the logic, is the cost; an index that delivers rows in the needed order can remove the sort. +- **`SELECT *` / over-wide projection** pulling columns (especially large text/JSON/blob) the caller + never uses — inflates I/O, network, sort width, and memory grants. +- **Unbounded result sets / deep `OFFSET` pagination**: `OFFSET N` scans and discards N rows every + page; prefer keyset/seek pagination anchored on the last key. Missing `LIMIT`/`TOP` on exploratory + or list queries. +- **Materializing a huge intermediate** (temp table / CTE / derived table) that could be filtered + earlier or streamed, holding peak memory or tempdb for the whole statement. + +## Data access & indexing (lane `data-access`) +- **Missing index** on columns used in `WHERE` / `JOIN` / `ORDER BY` / `GROUP BY` — check the actual + DDL. For a composite index, column order is **equality predicates first, then the range/inequality, + then `ORDER BY` columns**; an index in the wrong order can't seek the query. +- **Key/heap lookups that should be covered**: a query that seeks a secondary index then fetches extra + columns row-by-row from the base table is a covering-index opportunity (include the projected + columns) — but weigh the added write/storage cost. +- **Too many / redundant / unused indexes**: every index is paid for on every `INSERT`/`UPDATE`/ + `DELETE`; duplicate or never-served indexes are pure write tax — recommend the *minimal* index that + serves the predicate and projection. +- **Stale statistics → wrong row estimates → wrong plan**: when the optimizer mis-estimates + cardinality it picks the wrong join type, order, or access method; the estimate-vs-actual gap in the + plan is the tell — refresh stats before blaming the query. +- **Type mismatch at the predicate** (column type ≠ literal/parameter type) forcing an implicit + conversion and a scan — sargability at the type level, easy to miss without reading the plan. +- **Over-fetching / late filtering**: returning rows the application then filters or counts, or + issuing one query per row from the app (the SQL side of the application `data-access` lane) — push + the filter/aggregate into the query. +- **Non-parameterized / ad-hoc SQL defeating plan reuse**: queries built by string-concatenating + literal values (`… WHERE id = 42`, a new literal every call) produce a distinct statement text each + time, so the engine compiles and caches a separate plan per literal — plan-cache bloat and repeated + compilation cost, and lost plan reuse. Parameterize (`WHERE id = $1` / `@id`); this is especially + common in *hand-rolled* SQL and is also the same defect as SQL injection — the durable fix serves + both (verify against the currency brief for your version — engines differ on forced/auto + parameterization). + +## Concurrency & locking (lane `concurrency`) +- **Long transactions holding locks** (and, under MVCC, holding back row-version cleanup): do external + calls, user think-time, and heavy computation *outside* the transaction; keep the write window + minimal. +- **Blocking chains & lock escalation**: a higher isolation level than the read actually needs, or + bulk DML escalating row→table locks, serializes concurrent access on hot tables — right-size the + isolation level and consider chunked DML. +- **Deadlocks from inconsistent lock ordering** across statements/procs — access tables/rows in a + consistent order and hold the fewest locks for the least time. +- **Readers blocking writers (or vice versa)** under pessimistic isolation where row-versioning / + snapshot isolation would let them not block — a real fix, but weigh the version-store cost + (verify against the currency brief for your version). +- **One giant DML statement** (delete/update millions) where chunked batches would bound lock + duration, transaction-log/WAL growth, and replication lag. + +## Framework / dialect-idiom currency (lane `idiom-currency`) +- Consult the version index/brief for the dialect — flag the slow hand-rolled equivalent of a feature + the engine now does better: window functions instead of self-joins, `MERGE`/upsert instead of + load-then-write, `FILTER`/conditional aggregation, lateral/`APPLY`, native JSON functions, + batch-mode/columnstore for analytics (verify against the currency brief for your version). +- Offline (no brief/index): note candidate idiom concerns at LOW confidence, flagged for manual + currency check. + +--- + +## Routines: stored procedures, functions & triggers (don't miss them) + +The query the application *runs* is often not in the application code. A `EXEC sp_DoWork @id`, a +`CALL process_order(...)`, or a plain `INSERT`/`UPDATE` that silently fires a **trigger** hands the +real, hand-rolled SQL off to a routine whose body lives in a schema/migration `.sql` file — and an +audit that reads only the app's data-access code **never sees it**. This is the single easiest place +for expensive hand-rolled SQL to hide. + +- **Follow the invocation into the definition.** Treat every `EXEC`/`CALL`/`SELECT … FROM + function(…)`/proc-name reference, and every DML against a table that has triggers, as a pointer into + a routine body — then audit that body with **all the lanes above** (the body is just SQL: it has its + own joins, indexes, sargability, cursors, locking). With the schema/DDL in scope (this pack assumes + it), the definitions are right there to read — read them, don't stop at the call site. +- **Triggers are invisible per-row work on every DML.** A row-level `AFTER`/`INSTEAD OF`/`BEFORE` + trigger that does a lookup, an audit-table insert, or a cascade runs *per affected row* on every + `INSERT`/`UPDATE`/`DELETE` — so a bulk operation that looks set-based becomes row-by-row, and the + cost appears nowhere in the calling statement. Find the triggers on hot tables and audit their + bodies; prefer statement-level / set-based trigger logic over per-row where the dialect allows + (verify against the currency brief for your version). +- **Routine-level N+1 and fan-out.** A proc/function invoked once per row from the app (or from inside + another routine — nested proc/function fan-out) is N+1 one level up; a function called in a + `SELECT`/`WHERE` runs its body per row (see the dialect modules' scalar-function bullets). The fix is + the same as any N+1: hoist the work into one set-based call. +- **Plans and parameters apply to routine bodies too.** Procedure plans are cached and sniffed, routine + bodies recompile, and a routine's SQL has its own statistics dependence — the dialect modules carry + the specifics (parameter sniffing, recompilation, function volatility/inlining). Don't assume a + routine is cheap because the call site is one line. + +--- + +## Reading the plan & schema (use for every SQL audit) + +SQL performance is judged against the **execution plan** and the **schema**, not the query text alone +— this is the SQL analog of a runtime-notes section: how to observe and measure before concluding. + +- **Get the *actual* plan, not just the estimate**: PostgreSQL `EXPLAIN (ANALYZE, BUFFERS)`, SQL + Server's actual execution plan + `SET STATISTICS IO, TIME ON`, run under representative data volume. + Estimated plans built on stale statistics mislead (verify against the currency brief for your + version). +- **Seek vs scan is a judgment, not a verdict**: a full scan is fine on a small or genuinely + unfiltered table and a problem on a large, selectively-filtered one — weigh the operator against the + table's row count and the predicate's selectivity, not the operator name. +- **Estimated vs actual rows is the highest-signal tell**: a large divergence means the optimizer is + guessing wrong (stale/missing stats, correlated columns it can't model, a non-sargable predicate), + so its join/order/memory choices downstream are probably wrong too. +- **Use the schema you have**: confirm which columns are actually indexed, the index column order, + the declared types (for sargability), the primary/clustering key, and approximate row counts before + recommending a change — and recommend the *minimal* index that serves the query, weighing its write + cost. +- **Confirm impact, don't assume it**: estimate rows examined vs returned; a fix that should turn a + scan into a seek must be validated against the new plan (and measured where possible). A hot region + that is inherent — a report that must aggregate the whole table — is not automatically a bug. + +## Framework / dialect modules (load on detection) + +Load the lanes + plan/schema notes above for *every* hand-written-SQL audit. Additionally load the +dialect module matching the target database. + +| Detected (signals) | Load module | +|---|---| +| **PostgreSQL** — `postgres`/`postgresql` driver or DSN, `psql`/`pg_dump` artifacts, Postgres syntax (`::type` casts, `RETURNING`, `jsonb`, `ON CONFLICT`, `ILIKE`) | [`sql/postgres.md`](sql/postgres.md) | +| **T-SQL / SQL Server** — `sqlserver`/`mssql` driver, `.sql` with `GO` batch separators, `[bracketed]` identifiers, `NVARCHAR`, `TOP`, `MERGE`, stored procedures | [`sql/tsql.md`](sql/tsql.md) | + +## Sources + +Durable signals here are grounded in vendor query-optimization documentation; dialect-specific facts +and per-entry citations belong in the dialect modules and (where built) a SQL version index. + +- **PostgreSQL** — "Using EXPLAIN", "Planner/Optimizer", "Index Types", "Routine Vacuuming", "Server + Configuration: Resource Consumption" (`work_mem`). +- **SQL Server** — "Query Processing Architecture Guide", "Execution plans", "SQL Server Index + Architecture and Design Guide", "Statistics", "Transaction Locking and Row Versioning Guide". +- **Relational fundamentals** — Use The Index, Luke (sargability, composite-index column order, + covering indexes); vendor pagination/keyset guidance. diff --git a/.claude/skills/performance-audit/profile-packs/sql/postgres.md b/.claude/skills/performance-audit/profile-packs/sql/postgres.md new file mode 100644 index 00000000..335b323e --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/sql/postgres.md @@ -0,0 +1,115 @@ +# SQL performance module: PostgreSQL +> Load when the SQL dialect is PostgreSQL (`postgres`/`postgresql` driver or DSN, `psql`/`pg_dump` artifacts, Postgres-specific syntax like `::type` casts, `RETURNING`, `jsonb`, `ON CONFLICT`) — see the module map in `../sql.md`. Dialect-agnostic SQL lanes live in `../sql.md`; this file is the PostgreSQL lens only. + +## PostgreSQL + +> Scope: hand-rolled queries against a PostgreSQL backend where the schema (DDL) is available for +> reasoning about indexes, types, and cardinality. Dialect-agnostic fundamentals (missing index on +> filter/sort columns, SELECT * over-fetch, correlated-subquery N+1, sargability in general, set-based +> vs cursor, keyset pagination, reading EXPLAIN in general) are owned by the **Data access** lane in +> `../sql.md` — this file specialises to Postgres-distinctive realities only. The recurring themes are: +> **MVCC bloat and vacuum** (dead tuples accumulate silently and degrade every scan until vacuumed), +> **the right index type** (Postgres offers more index kinds than most engines — pick the one that +> matches the data shape), **reading `EXPLAIN (ANALYZE, BUFFERS)`** (estimated vs actual row counts and +> buffer hits reveal the actual cost), **`work_mem` spills** (sorts and hash joins that exceed the +> per-operation budget land on disk), and **the process/pooler model** (each backend is a heavyweight +> OS process — connection count is a first-class resource). + +- **MVCC bloat and autovacuum falling behind**: every `UPDATE` or `DELETE` leaves dead tuple versions + in the heap; bloated tables and indexes pay that dead-tuple I/O on every scan. Long-running + transactions hold back the oldest `xmin` horizon and can block autovacuum from cleaning any later + rows across the whole table — a single idle-in-transaction connection can freeze cleanup + cluster-wide. For tables with high churn, check whether autovacuum cost parameters or + `vacuum_freeze_min_age` have been tuned, and whether `fillfactor < 100` is set to leave room for + HOT updates (HOT avoids writing new index entries when no indexed column changes, a major win for + frequently-updated rows) (verify against the currency brief for your version). + +- **`EXPLAIN (ANALYZE, BUFFERS)` signals beyond the plan shape**: a large gap between *Estimated Rows* + and *Actual Rows* means statistics are stale — run `ANALYZE` on the table and check + `pg_stat_user_tables.last_analyze`. `Rows Removed by Filter` on a Seq Scan or Index Scan node + indicates a non-sargable or unindexed predicate doing post-fetch filtering. `Buffers: shared + read` vs `hit` reveals whether data is coming from disk or cache; `temp read`/`written` signals an + on-disk spill (see the `work_mem` bullet). A `Bitmap Heap Scan` after a `Bitmap Index Scan` is + normal for range or multi-condition queries but has a heap-recheck cost absent from a plain Index + Scan — evaluate which is cheaper given selectivity (verify against the currency brief for your + version). + +- **Index-only scans blocked by a stale visibility map**: a covering index (or a query projecting only + indexed columns) enables an index-only scan that never touches the heap — but Postgres still checks + the visibility map to confirm tuple visibility. Pages dirtied by recent writes are marked + "not all-visible" and force a heap fetch anyway, degrading to an effective Index Scan. Regular + `VACUUM` updates the visibility map; on write-heavy tables an index-only scan may never be clean + without explicit tuning. Also check that multicolumn index column ORDER places equality predicates + before range predicates — a `(status, created_at)` index serves `WHERE status = 'open' AND + created_at > $1` but the reverse order does not (cross-reference the **Data access** lane in + `../sql.md` for general index-column-order fundamentals). + +- **Wrong index type for the data shape**: Postgres provides index types beyond B-tree that the planner + will only use when explicitly created. A `WHERE active = true` on a column that is `true` for 0.1% + of rows is a candidate for a **partial index** (`CREATE INDEX … WHERE active = true`) — far smaller + and faster than an index on the full column. Predicates on `lower(email)` or any computed expression + require an **expression index** on that exact expression. `jsonb`/array membership and full-text + predicates need a **GIN** index; range types and geometric data need **GiST**; huge + naturally-ordered append-only tables (event logs, time-series) can use a tiny **BRIN** index + instead of a B-tree. The `INCLUDE` clause on a B-tree adds non-key columns for covering without + widening the index key (verify against the currency brief for your version). + +- **`work_mem` spills to disk on sorts, hash joins, and hash aggregates**: each sort, hash join, or + hash aggregate operation gets its own `work_mem` budget (a single query with multiple such nodes + multiplies it). When the operation exceeds the budget, Postgres writes temp files — visible in + `EXPLAIN ANALYZE` as `Sort Method: external merge Disk` or `Batches: N` on a Hash node. A + session-level `SET work_mem` bump before an analytics-heavy query is the targeted fix; a + cluster-wide increase must account for `max_connections × nodes_per_query × work_mem` as a + worst-case memory ceiling. Conversely, a `work_mem` that's adequate individually can cause OOM + under high concurrency (verify against the currency brief for your version). + +- **CTE materialization fences and planner visibility**: before Postgres 12, every `WITH` clause was + an optimization fence — materialized once, results opaque to the planner, preventing predicate + pushdown and join reordering. Postgres 12+ inlines simple non-recursive CTEs unless `MATERIALIZED` + is explicitly specified. Legacy queries written for the fence behavior (using CTEs intentionally to + force a step) may silently change plan when run on 12+ without `MATERIALIZED`; conversely, + pre-12-era code that assumed inlining will not get it. Audit CTEs for which behavior is intended, + and whether the current version delivers it. Also flag `LATERAL` joins and `DISTINCT ON` as + Postgres-idiomatic alternatives to correlated subqueries and window-function patterns that may + deserve a plan check (verify against the currency brief for your version). + +- **`NOT IN` with a nullable subquery, and OR-across-columns index defeat**: `NOT IN (SELECT col …)` + returns zero rows if any value in the subquery is NULL — a silent correctness and performance trap. + Prefer `NOT EXISTS` which handles NULLs correctly and typically enables an efficient anti-join. + Separately, `WHERE a = $1 OR b = $2` across two differently-indexed columns usually forces a Seq + Scan because a single index can't satisfy both branches; a `UNION ALL` of two indexed queries or a + multicolumn index strategy is the usual fix (cross-reference the **Data access** lane in `../sql.md` + for general sargability). Also note `= ANY(ARRAY[…])` as the Postgres idiom for `IN (…)` over a + parameter array — both are index-compatible with the same B-tree. + +- **Process-per-connection model and connection pooling**: each Postgres backend is a forked OS process + (not a thread), carrying its own memory and overhead. High connection counts directly compete for + shared memory, file descriptors, and lock table entries — `max_connections` is a hard ceiling, not + a soft limit. At any meaningful concurrency a connection pooler (PgBouncer in transaction mode is + the standard) is near-mandatory to multiplex application threads onto a smaller pool of backends. + Also: prepared statements switch from a custom plan (optimized for the first execution's parameter + values) to a generic plan after roughly 5 executions; for queries with highly skewed data + distributions, a generic plan can be dramatically worse than a custom one — `plan_cache_mode` lets + you force custom plans where needed (verify against the currency brief for your version). + +- **Function volatility and row-level triggers — the planner reads volatility**: a PL/pgSQL or SQL + function marked `VOLATILE` (the default) is re-evaluated for every row and is a planner optimization + barrier — it cannot be folded into an index condition or hoisted. A function that is genuinely + `STABLE` or `IMMUTABLE` should say so: only then can Postgres use it in an index scan's condition or + call it once instead of per row, and only `IMMUTABLE` functions can back an expression index. Plain + SQL functions (vs PL/pgSQL) can also be *inlined* by the planner when simple. Separately, **row-level + triggers** (`FOR EACH ROW`) fire per affected row on bulk DML — a `FOR EACH STATEMENT` trigger (using + transition tables) is often the set-based alternative. Check declared volatility on functions used in + predicates, and whether hot tables carry per-row triggers doing lookups or cascades (verify against + the currency brief for your version). + +- **Data-type storage costs and UUIDv4 index fragmentation**: `jsonb` (binary, indexable, detoast on + read) vs `json` (stored as text, re-parsed every read) — prefer `jsonb` for any queried or indexed + JSON. Values wider than ~2 KB are automatically TOAST-ed out-of-line; queries that repeatedly + detoast large `text`/`jsonb` columns (e.g. selecting a wide column in a high-frequency loop) pay + decompression cost even when only a sub-key is needed — consider storing frequently-accessed + sub-keys in their own typed columns. Random UUIDv4 primary keys insert at random B-tree positions, + causing frequent page splits, poor cache locality, and index bloat; sequential keys (UUIDv7, + `bigint`/`serial`, or `gen_random_uuid()` on v4 where inserts are low-frequency) avoid this. The + `numeric` type is arbitrary-precision but significantly slower than `bigint` or `double precision` + for arithmetic-heavy queries (verify against the currency brief for your version). diff --git a/.claude/skills/performance-audit/profile-packs/sql/tsql.md b/.claude/skills/performance-audit/profile-packs/sql/tsql.md new file mode 100644 index 00000000..6d98163b --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/sql/tsql.md @@ -0,0 +1,34 @@ +# SQL performance module: T-SQL (Microsoft SQL Server) +> Load when the SQL dialect is T-SQL / Microsoft SQL Server (`sqlserver`/`mssql` driver, `.sql` with `GO` batch separators, `[bracketed]` identifiers, `NVARCHAR`, stored procedures, `TOP`, `MERGE`) — see the module map in `../sql.md`. Dialect-agnostic SQL lanes live in `../sql.md`; this file is the T-SQL lens only. + +## T-SQL (Microsoft SQL Server) + +> Scope: hand-rolled T-SQL queries and stored procedures where the schema (DDL, indexes, column types, statistics) is available. The recurring themes are: **parameter sniffing** (the cached plan is shaped by the first execution's values, not all values), **sargability and implicit type conversion** (a column-side conversion silently kills a seek), **covering indexes and key lookups** (nonclustered indexes that don't cover all needed columns force per-row heap/clustered lookups), **reading the *actual* execution plan** (estimated vs actual row counts reveal stale statistics and plan regressions), and **isolation / tempdb pressure** (lock escalation, `NOLOCK` misuse, and tempdb as a shared bottleneck). Bullets are *conditions to look for*; cross-reference the dialect-agnostic **Data access & I/O** and **Concurrency** lanes in `../sql.md` for the generic analogues. + +- **Parameter sniffing — cached plan built from an atypical first execution**: SQL Server compiles a stored procedure or parameterised batch once and caches the plan shaped by the parameter values present at *that* compilation. If the first execution used an atypical value (a rare `@CustomerId` with 1 row vs the common case with 100 000 rows), the cached plan is wrong for the majority of executions — symptom is a proc that is fast sometimes and slow other times, or slow after a plan cache flush. Mitigations include `OPTION (RECOMPILE)` (per-execution compile, no caching cost amortised), `OPTIMIZE FOR (@p = <representative value>)` or `OPTIMIZE FOR UNKNOWN` (fix the assumed cardinality), assigning the parameter to a local variable before use (defeats sniffing at the cost of always estimating from the histogram average), or SQL Server 2022 Parameter Sensitive Plan (PSP) optimization which maintains multiple sub-plans per query (verify against the currency brief for your version). + +- **Implicit conversion defeating an index seek — `varchar`/`nvarchar` mismatch**: when an indexed `varchar` column is compared to an `nvarchar` parameter (the default type for `N'...'` literals, many drivers, and most ORMs), SQL Server must apply an implicit `CONVERT` *to the column side* to reconcile collation precedence, turning a potential seek into a full scan. The actual execution plan surfaces this as a yellow-triangle **implicit conversion warning** on the predicate operator. Fix by matching the parameter type exactly to the column type (check column DDL, then parameter declarations and driver-side type bindings). The same trap applies to `int`/`bigint` mismatches and `varchar(n)` width differences that force truncation-safe widening (verify against the currency brief for your version). + +- **Key lookups on nonclustered indexes that don't cover all needed columns**: a nonclustered index seek that satisfies the `WHERE` predicate but cannot supply all columns in the `SELECT` or `ORDER BY` forces a **Key Lookup** (RID Lookup on a heap) into the clustered index *per qualifying row*. A small row estimate makes this look cheap in the estimated plan; under actual cardinality the loop is expensive. Look for `Key Lookup` operators in the actual plan, check the output column list on the lookup, and evaluate whether adding those columns via `INCLUDE` on the nonclustered index (making it covering) or narrowing the `SELECT` list eliminates the lookup. The core `../sql.md` covers index seeks vs scans in general; this is the SQL Server-specific mechanism (verify against the currency brief for your version). + +- **Clustered key design — wide, random, or volatile clustering keys bloating every index**: every nonclustered index on the table stores the clustered index key as its row locator, so a poor clustering choice (a `UNIQUEIDENTIFIER` generated by `NEWID()` — random GUIDs) bloats *all* nonclustered indexes proportionally, causes severe page fragmentation and split-heavy write workloads, and increases Key Lookup costs. Prefer a narrow, static, ever-increasing clustering key (`IDENTITY`/`INT`/`BIGINT`, or `NEWSEQUENTIALID()` for GUID requirements) so nonclustered indexes stay compact and inserts are append-like. Fill factor and fragmentation on hot tables matter especially under a random clustering key — review alongside the DDL (verify against the currency brief for your version). + +- **Estimated vs actual rows — stale statistics causing plan regressions**: SQL Server's auto-update statistics threshold is a fixed percentage of rows changed (roughly 20% for smaller tables, scaling to a smaller fraction for very large tables via trace flag 2371 or the default SQL Server 2016+ dynamic threshold), so large tables can go significantly stale between auto-updates. A large gap between *Estimated Rows* and *Actual Rows* in the actual execution plan is the diagnostic signal — read the actual plan, not the estimated plan, for any problematic query. Plan warnings also flag tempdb sort/hash **spills** (a sort or hash join ran out of the memory grant and spilled to disk), the implicit conversion warnings noted above, and missing-index suggestions (treat those as a lead to investigate, not gospel — they ignore existing index overlap and write cost) (verify against the currency brief for your version). + +- **Scalar UDFs executing per row and blocking parallelism**: a scalar user-defined function called in a `SELECT` list or `WHERE` clause executes once per row and historically forces the query plan to run serially (no parallelism) and hides its cost from the optimizer's row-cost estimate — the function body's I/O and CPU are invisible to the plan. This compounds badly on large row counts. SQL Server 2019 introduced T-SQL scalar UDF inlining, which rewrites qualifying UDFs inline as relational expressions and restores optimizer visibility and parallelism eligibility — but inlining has eligibility requirements (no side effects, no external access, no recursion, specific T-SQL constructs only). Flag scalar UDFs on hot queries, check whether inlining applies (`sys.sql_modules.is_inlineable`), and consider rewriting as inline table-valued functions (`ITVF`) for guaranteed inlining on any version (verify against the currency brief for your version). + +- **Cursors and `WHILE`-loop RBAR where a set-based statement fits**: `CURSOR` and `WHILE`-loop row-by-row processing (RBAR) in stored procedures — iterating over a result set to `UPDATE`/`INSERT`/`DELETE` one row at a time — is a classic SQL Server performance sink because each iteration incurs locking, logging, and round-trip overhead. A single set-based `UPDATE ... FROM`, `DELETE ... FROM`, or `MERGE` statement lets the optimizer choose a bulk plan, parallelise, and batch log writes. **Table variables** used as intermediate staging sets have historically reported an estimated 1 row to the optimizer (no per-row statistics), causing bad join and aggregation plans on large sets; `#temp` tables get statistics and are generally better for intermediate sets of meaningful size. SQL Server 2019+ deferred compilation narrows (but does not eliminate) the table-variable statistics gap (verify against the currency brief for your version). + +- **`WITH (NOLOCK)` / `READ UNCOMMITTED` as a "performance" fix**: `NOLOCK` is widely used to avoid blocking under contention, but it permits **dirty reads** (uncommitted data), **phantom reads**, **duplicate rows**, and **missing rows** caused by in-progress page splits — a correctness hazard, not a safe speed knob. The root cause of reader/writer contention under the default lock-based `READ COMMITTED` is that readers block behind writers. The correct SQL Server remedy is enabling **Read Committed Snapshot Isolation (RCSI)** at the database level, which serves readers from the row-version store in tempdb and eliminates reader/writer blocking without dirty-read risk. Flag `NOLOCK` use in production query code; note whether RCSI is already enabled before recommending the change (verify against the currency brief for your version). + +- **Triggers running hidden per-row work on every DML**: an `AFTER`/`INSTEAD OF` trigger fires once per + statement but its `inserted`/`deleted` pseudo-tables hold *all* affected rows — trigger logic written + with a cursor or a correlated per-row lookup turns a set-based `INSERT`/`UPDATE`/`DELETE` into + row-by-row work that is invisible in the calling statement's plan. Nested/recursive triggers + (`nested triggers` / `RECURSIVE_TRIGGERS` settings) compound it, and a trigger that itself updates + another triggered table fans out. Audit triggers on hot tables: confirm the body is set-based over + `inserted`/`deleted`, watch for triggers that call procs or write audit rows per execution, and check + whether a constraint, computed column, or change-tracking feature would do the job without a trigger + (verify against the currency brief for your version). + +- **tempdb as a shared bottleneck — spills, version store, and contention**: heavy sort and hash-join operations that exceed their memory grant **spill to tempdb** (visible as spill warnings in the actual plan); table variables, `#temp` tables, cursors, CTEs with multiple references that materialise, and the RCSI row-version store all share tempdb. On systems with many concurrent sessions this creates **allocation-page contention** (PFS/GAM/SGAM pages) if tempdb has too few data files or is on a slow volume. `SELECT INTO` a permanent or temp table under concurrency contends on tempdb as well. `MAXDOP` and `cost threshold for parallelism` left at legacy defaults (MAXDOP 0 / CTP 5) can over-parallelize small queries (spawning parallel workers that thrash tempdb) or under-parallelize large batch queries — both require measurement against the actual workload (verify against the currency brief for your version). diff --git a/.claude/skills/performance-audit/profile-packs/swift.md b/.claude/skills/performance-audit/profile-packs/swift.md new file mode 100644 index 00000000..d8b2fdec --- /dev/null +++ b/.claude/skills/performance-audit/profile-packs/swift.md @@ -0,0 +1,74 @@ +# Profile Pack: Swift + +Specializes the generic lanes for Apple-platform Swift (SwiftUI/UIKit, Core Data/SwiftData, Xcode/SwiftPM) and server Swift (Vapor). Signals below are durable idioms; volatile version details live in the currency brief / version index, not here. + +--- + +## Algorithmic complexity & data structures (lane `algorithmic`) +- `Array.contains(_:)` / `firstIndex(where:)` called inside a loop over a second collection — accidental O(n²); replace the inner lookup with a `Set` or `Dictionary` keyed on the relevant field. +- Existential `any Protocol` in hot loops: dynamic dispatch + heap boxing on every call; prefer constrained generics (`some P` or `<T: P>`) where the concrete type is knowable at the call site (verify against the currency brief for your version). +- `String` is not integer-indexable in O(1) — subscripting by `Int` offset requires walking grapheme clusters; offset-arithmetic loops over `String` are O(n²); use `String.Index` iteration, `Substring` slicing, or convert to `[Character]` / UTF-8 bytes once. +- Repeated pure computations inside loops that depend only on loop-invariant values — hoist before the loop or cache in a local `let`; applies equally to computed properties accessed in tight render/update cycles. +- Re-sorting or re-filtering the same collection on every data-read or view-update; sort/filter once on input change and store the result. + +## Memory & allocation (lane `memory`) +- ARC retain/release overhead on reference types inside hot loops — consider passing `inout` or using value types; each assignment to a `class` instance increments a reference count. +- Retain cycles in closure captures: `self` captured strongly by a long-lived callback, timer, or notification handler; use `[weak self]` or `[unowned self]` capture lists and confirm the object's lifetime before choosing `unowned`. +- Copy-on-Write (CoW) semantics of `Array`, `Dictionary`, `Set`, and `String`: a mutation on a shared buffer triggers a full copy; the hidden performance bug is passing a collection `inout` or assigning it through a non-uniquely-referenced path — check that the buffer is uniquely referenced before mutating. +- Large `struct` values copied repeatedly on assignment or as function arguments — consider `class` semantics, an `inout` parameter, or splitting into a reference-typed backing store for the mutable part. +- `reserveCapacity(_:)` on `Array`/`Dictionary`/`String` when the final size is known — avoids repeated geometric reallocation (verify against the currency brief for your version). +- Foundation bridging toll: implicit `NSArray`/`NSString`/`NSDictionary` ↔ Swift bridging in hot loops allocates intermediary objects; prefer pure-Swift types and defer bridging to the call boundary. +- Missing `autoreleasepool { }` around tight Objective-C-interop loops — Objective-C autoreleased objects accumulate in the run-loop pool until the loop exits; wrap the loop body to bound peak memory (verify against the currency brief for your version). + +## Data access & I/O (lane `data-access`) +- Core Data N+1: iterating fetched objects and triggering fault resolution per item instead of using `fetchBatchSize` and `relationshipKeyPathsForPrefetching` to prefetch relationships in bulk; look for `for obj in results { _ = obj.relationship }` patterns. +- SwiftData equivalent: accessing a lazy relationship on each element of a `@Query` result in a loop without a prefetch descriptor — same N+1 pattern, different API surface (verify against the currency brief for your version). +- `JSONDecoder` / `JSONEncoder` allocated fresh on every hot-path call; both types are expensive to create — allocate once and reuse, or use a pool; also check for unnecessary `Data` copies before decoding. +- Main-thread file I/O or synchronous `NSManagedObjectContext` fetch on the main context — blocks the UI thread; move to a background context (`performBackgroundTask`) or `async` fetch. +- Over-fetching: Core Data `NSFetchRequest` returning full objects (all attributes) when only one or two fields are needed — set `resultType` to `NSDictionaryResultType` with `propertiesToFetch` for read-only aggregation. +- `URLSession` task created per request rather than reusing a shared session — loses connection pooling, TLS session resumption, and HTTP/2 multiplexing; create one session (or a small set by configuration) and reuse it (verify against the currency brief for your version). + +## Concurrency & parallelization (lane `concurrency`) +- **Exploit:** sequential `await` of independent async operations in a function body — replace with `async let` bindings or `withTaskGroup` / `withThrowingTaskGroup` to run concurrently; verify independence (no shared mutable state, no ordering dependency) before parallelizing. +- **Exploit:** `AsyncSequence` / `AsyncStream` available but code buffers full results into an array before processing — pipeline item-by-item with `for await` to reduce peak memory and improve time-to-first-result. +- **Defend:** heavy CPU or I/O work dispatched directly on `@MainActor` (or the main `DispatchQueue`) — move it off-main via an `actor`, a detached `Task`, or `Task.detached(priority:)` and only marshal UI updates back. +- **Defend:** blocking the Swift cooperative thread pool with synchronous work (long loops, `Thread.sleep`, `DispatchSemaphore.wait`) inside an `async` context — cooperative threads are not OS threads; blocking them starves other async tasks. +- **Defend:** excessive actor hops: calling across actor boundaries for each item in a loop — batch the work inside a single actor method rather than hopping per-element. +- **Defend:** `DispatchQueue.sync` from a queue into itself (deadlock risk) or `.concurrent` queue with shared mutable state (data race); audit `DispatchQueue` usage when mixing GCD with Swift Concurrency. +- **Defend:** parallelizing without verifying `Sendable` conformance — confirm shared values are either value types with no mutable state or actors before using `withTaskGroup`; non-`Sendable` types shared across task boundaries are data-race risks (verify against the currency brief for your version). + +## Framework-idiom currency (lane `idiom-currency`) +- Consult the version index and currency brief. Flag patterns the brief marks superseded/deprecated (e.g., `ObservableObject`/`@Published` where `@Observable` is available; `DispatchQueue`-based concurrency where Swift Concurrency actors/tasks are the fast path; legacy `NSFetchedResultsController` patterns vs modern SwiftData); flag fast-path APIs the index lists that the code doesn't use; flag changed defaults the code still fights. +- Offline (no brief): note candidate idiom concerns at LOW confidence, flagged for manual currency check. + +## Payload / startup / build (lane `payload-startup`) +- `+load` methods, static initializers, and `__attribute__((constructor))` C functions run before `main()` during dyld startup — any expensive work here (I/O, network, large allocations) directly increases cold-start time; audit for slow `+load` in Objective-C categories. +- Whole-Module Optimization (WMO) and cross-module optimization disabled in the release build configuration — WMO enables cross-function inlining and dead-code removal that is impossible with per-file compilation; verify the Xcode/SwiftPM release config enables it (verify against the currency brief for your version). +- Binary size / dead-code stripping: unused code linked into the final binary increases cold-start load time on Apple platforms; ensure linker dead-strip and Swift whole-module optimization are both enabled for release. +- Expensive synchronous work in `application(_:didFinishLaunchingWithOptions:)` or `@main` `init` — database migration, network calls, large JSON parsing — blocks the first frame; defer to background tasks or lazy initialization. +- Large or unoptimized asset catalogs: uncompressed images or assets included in the app bundle that are never loaded at startup still inflate the binary and slow initial dyld mmap; audit with the build report. +- Dynamic framework linking adds a dyld load time cost per framework; consolidating rarely-used dynamic frameworks or preferring static linking reduces pre-`main` time (verify linker settings against the currency brief for your version). + +--- + +## Framework notes + +### SwiftUI +- Unnecessary `body` re-evaluation from observable objects with broad invalidation scope: a single `@ObservedObject` / `@StateObject` whose any property changes re-renders the entire view tree — split into smaller observed objects or migrate to `@Observable` for fine-grained property-level tracking (verify against the currency brief for your version). +- Misuse of `@StateObject` vs `@ObservedObject`: `@StateObject` creates and owns the object (created once per view identity); `@ObservedObject` borrows it from outside — using `@ObservedObject` where `@StateObject` is intended causes re-creation on every parent render, losing state and wasting allocations. +- Expensive or side-effectful work inside `body` — network calls, large computations, sorting — executes on every SwiftUI rendering pass; move to `task {}`, `.onAppear`, or a view model; `body` must be a pure, fast function of its inputs. +- Missing `LazyVStack` / `LazyHStack` / `LazyVGrid` for large or unbounded lists — `VStack` eagerly materializes all child views; replace with lazy equivalents or `List` (which is lazy by default) when rendering more than ~50 items. +- Unstable view identity from volatile `.id()` modifier or index-as-identity: changing an element's identity forces SwiftUI to destroy and recreate the full subtree (animations break, state resets); use a stable, persistent identifier. +- Over-broad `@Environment` or `@EnvironmentObject` scope: a high-level environment value that changes frequently invalidates all descendant views that read it; narrow the scope or use a more targeted observable (verify against the currency brief for your version). +- `EquatableView` / `View.equatable()`: wrapping a view whose inputs rarely change prevents re-evaluation when the parent re-renders and `Equatable` confirms equality — use where the view's `Equatable` conformance is cheap and the body re-evaluation is demonstrably costly (verify against the currency brief for your version). + +--- + +## Sources + +Durable signals in this pack are grounded in these authoritative sources (version-specific facts and +their per-entry citations live in `../version-indexes/swift.md`): + +- swift.org — release blogs (Swift 5.5–6.2), "Announcing Swift 6", migration guide +- Swift Evolution proposals — SE-0390 (`~Copyable`), SE-0381 (`DiscardingTaskGroup`), SE-0412, SE-0423 +- Apple Developer — Observation (`@Observable`, WWDC23 s10149), SwiftData, Core Data `fetchBatchSize` diff --git a/.claude/skills/performance-audit/run-schema.md b/.claude/skills/performance-audit/run-schema.md new file mode 100644 index 00000000..b926d17c --- /dev/null +++ b/.claude/skills/performance-audit/run-schema.md @@ -0,0 +1,100 @@ +# Run Schema (historical & regression analysis) + +**Load this when:** writing the consolidated report in Phase 3, so each run is captured in a +**versioned, machine-readable** form that supports trend lines and run-over-run regression diffs. + +`run_schema_version` is the version of THIS schema. Bump it when the structure changes; parsers gate +on it. (Current: **1**.) + +## Three artifacts per run + +1. **Frontmatter** on the consolidated markdown report (human- and machine-readable). +2. **One appended line** in `docs/perf-audits/runs.jsonl` (the longitudinal ledger). +3. **A fingerprint on every finding** in the report body, so runs can be diffed. + +## 1. Consolidated-report frontmatter + +```yaml +--- +run_schema_version: 1 +run_id: <YYYY-MM-DDThh-mm>-<slug> # unique; matches the report filename stem +date: <ISO 8601 UTC, e.g. 2026-06-03T14:30:00Z> +scope: "<scope string>" +methodology: + skill: performance-audit + plugin_version: superpowers-plus@<version from plugin.json> +dispatch: + # Record what the runner REQUESTED at dispatch — NOT a self-reported model identity + # (an agent cannot reliably introspect its own model id). If the user overrode, say so. + model_requested: "<e.g. latest-opus | gpt-5-successor | user-override:<name>>" + reasoning_effort: "<e.g. x-high | high | default | 'default (harness exposes no knob)'>" + overridden_by_user: <true|false> +stack: + - { ecosystem: <npm|pypi|nuget|go|crates|maven>, framework: <name>, version: <x.y.z> } +currency_briefs: + - { framework: <name>, researched_on: <YYYY-MM-DD|null>, status: <fresh|stale|refreshed|offline> } +lanes_run: [algorithmic, memory, data-access, concurrency, idiom-currency, cost-map] +lanes_skipped: { payload-startup: "<reason>", dynamic: "<reason>" } +finding_counts: + by_impact: { critical: <n>, major: <n>, minor: <n> } + by_lane: { algorithmic: <n>, memory: <n>, data-access: <n>, concurrency: <n>, idiom-currency: <n>, payload-startup: <n> } + suspected_bugs: <n> +regression: + prev_run_id: <run_id of the most recent prior run for the SAME scope, or null> + new: <n> # fingerprints present now, absent in prev + persisting: <n> # in both + resolved: <n> # in prev, absent now +--- +``` + +## 2. `docs/perf-audits/runs.jsonl` ledger + +Append exactly one JSON object per run (newline-delimited). Same fields as the frontmatter, +flattened, plus the finding fingerprints. One line = one run → trivially greppable/plottable: + +```json +{"run_schema_version":1,"run_id":"2026-06-03T14-30-checkout","date":"2026-06-03T14:30:00Z","scope":"the request pipeline","plugin_version":"superpowers-plus@0.2.0","model_requested":"latest-opus","reasoning_effort":"x-high","overridden_by_user":false,"stack":[{"ecosystem":"pypi","framework":"django","version":"5.0.2"}],"lanes_run":["algorithmic","memory","data-access","concurrency","idiom-currency","cost-map"],"finding_counts":{"by_impact":{"critical":1,"major":3,"minor":4},"by_lane":{"algorithmic":2,"memory":2,"data-access":1,"concurrency":1,"idiom-currency":2},"suspected_bugs":1},"regression":{"prev_run_id":null,"new":8,"persisting":0,"resolved":0},"fingerprints":["algorithmic:inventory.py:find_duplicate_skus:on2-dedup","data-access:inventory.py:enrich_line_items:n-plus-1"]} +``` + +The ledger is the regression substrate: `jq` / `grep` over it yields "critical count over time", +"runs where finding X recurred", "first run a finding appeared", etc. + +## 3. Finding fingerprints (stable across runs) + +Every finding in the report body carries a **fingerprint** so the same issue can be matched run to +run even as the report text changes: + +``` +fp = "<lane-id>:<repo-relative-file>:<symbol-or-anchor>:<short-title-slug>" + +where `<lane-id>` is the lane SLUG (algorithmic, memory, data-access, concurrency, +idiom-currency, cost-map, payload-startup, dynamic) — never a bare number. +``` + +- Use the **function/method/symbol** name (or a stable structural anchor) — **NOT a line number**; + line numbers drift between runs and would break matching. +- `short-title-slug` = lowercased, hyphenated 2–4 word gist (e.g. `n-plus-1`, `on2-dedup`, + `unmemoized-render-sort`). +- Show it inline, e.g. `**Fingerprint:** data-access:inventory.py:enrich_line_items:n-plus-1`. + +## Regression diff (how the runner computes it) + +In Phase 3, after assigning fingerprints, the runner SHOULD: +1. Find the most recent prior ledger entry with the **same `scope`** (read `runs.jsonl`). +2. Compare fingerprint sets: `new` = now − prev, `resolved` = prev − now, `persisting` = now ∩ prev. +3. Record those counts in the frontmatter + ledger, and call out **new** and **resolved** findings + in the report's executive summary (these are the regression signal a reader most wants). + +If there is no prior run for the scope, `prev_run_id: null` and all findings are `new`. + +## Honesty constraints +- `model_requested` records the **dispatch request**, never a guessed model identity. +- `reasoning_effort` records the **requested** effort. If the harness exposes no effort knob (e.g. it + lets you set the subagent model but not an effort level), record `"default (harness exposes no + knob)"` — do not claim `x-high` you couldn't actually request. +- `plugin_version` comes from the plugin's `plugin.json`. If the skill was **vendored flat** (copied + into a project's `.claude/skills/` without the `plugin.json`), the version isn't locally available — + record the known value with its provenance (e.g. `superpowers-plus@<version> (vendored; version per + source repo)`) rather than inventing one. +- Never fabricate counts — they MUST equal what the synthesis actually produced. +- If the ledger can't be written (read-only FS), note it in the report; do not silently skip. diff --git a/.claude/skills/performance-audit/test-fixtures/README.md b/.claude/skills/performance-audit/test-fixtures/README.md new file mode 100644 index 00000000..f3aaaf6a --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/README.md @@ -0,0 +1,90 @@ +# Performance-audit evals & test fixtures + +How the `performance-audit` skill is validated. These are **LLM behavioral evals**, not deterministic +unit tests: each "run" dispatches a subagent and is scored by hand against a rubric. They are +**re-runnable on demand** — a directional signal you (or a future maintainer) invoke when the packs or +prompts change — **not a CI gate**. Dispatch and scoring are deliberately manual; this doc is the +how-to. + +> **Why not automate / why not a fixture-per-module matrix?** See the decisions log Part Z (overload +> assessment) and Part DD. A 40-fixture matrix would rot, cost tokens on every change, and — worst — +> create a gradient that tunes the packs into checklists that pass fixtures. The goal is **every +> ecosystem represented once, every cross-cutting behaviour tested once**, with the eval *rigged to +> reward findings the pack didn't list* so it can't quietly erode the "a lens should sharpen a clever +> agent, not constrain a strong one" principle. + +## Two kinds of eval + +1. **Behavioural / discipline tests** (`behavioral/`) — ecosystem-*independent*. They test the + machinery (`finding-model.md`, `lane-prompts.md`, the dispatch in `SKILL.md`), so they do **not** + multiply per ecosystem. Each is a RED/GREEN scenario: the agent's behaviour with the relevant skill + text (GREEN) vs. without it (RED). This is where the highest-value, lowest-maintenance coverage + lives. +2. **Pack recall/precision fixtures** (`<ecosystem>-sample/`) — ecosystem-*specific*. A small, realistic + sample app that naturally triggers the core lanes + the Runtime/Variant notes + 1–3 modules, seeded + with documented perf issues. One fixture **per ecosystem**, not per module. + +## The rubric (every fixture has an `expected-findings.md`) + +Score a run on three axes — and note that the third is what protects the design philosophy: + +- **Recall** — of the **planted issues** (each maps to a real, reachable perf problem). Target: all of + them. *Recall is measured over performance findings and performance-*related* bugs only — a missed + pure-correctness bug is never a recall miss (that's `bug-hunt-cycle`'s job).* +- **Precision** — the **decoys** (cold-path / bounded-tiny-n / not-actually-a-problem near-misses) must + **not** be flagged. A decoy reported as a finding is a precision failure. Decoys should be baited to + tempt a *checklist-walker* — a near-miss for a pack idiom that doesn't actually apply here. +- **Beyond-the-pack (floor-not-ceiling)** — a planted real issue whose fix is **not spelled out as a + named idiom in the loaded pack slice**, so the agent must reason from first principles rather than + pattern-match a bullet. Finding it is a **bonus that rewards out-reasoning the lens**; *missing it is + not counted against recall.* But a run that finds every bulleted issue and **consistently** misses the + beyond-the-pack one across dispatches is a signal the pack is being walked as a checklist — the most + important thing this suite watches for. + +Optional: **honeypot correctness bugs** test the `bug-no-chase` boundary (a bug is in-scope only when +the incorrect behaviour *is* the slowness; otherwise record to the Suspected Bugs appendix and move on). +See `python-sample/expected-findings.md` for the canonical example of all of these. + +## How to run a fixture (manual) + +For each lane you want to exercise, dispatch one subagent with **only**: +1. the **shared preamble** + that **lane body** from `../lane-prompts.md` (fill the placeholders); +2. the **profile-pack slice** for that lane — the lane-keyed section of the matched pack(s), **plus the + pack's cross-cutting Runtime/Variant-notes section** (and a companion pack's *Reading the plan & schema* + / *Rendering path & CWV*) as shared context, **plus** any module relevant to the lane (per `SKILL.md` + Phase 0 — load only *material* modules); +3. the **currency brief** (or the fixture's `currency-brief.md`, or "unavailable — offline"); +4. the **fixture path** as the scope. + +**Do not let the subagent read `expected-findings.md`.** Collect its findings, then score recall / +precision / beyond-the-pack against the rubric. Record outcomes (a dated table in the decisions log is +the convention — see Parts D and DD). + +> **Structural checks** (no subagent needed): confirm the assembled lane prompt actually **includes the +> Runtime/Variant-notes section** (the dispatch wording in `SKILL.md` Phase 2 + `lane-prompts.md` line 27 +> requires it — this is easy to drop because that section isn't lane-keyed); confirm `SKILL.md` body +> < 500 lines and the description < 1024 chars; confirm one-level-deep references resolve. + +## Coverage map + +| Fixture | Ecosystem / shape | Lanes exercised | Last run | +|---|---|---|---| +| `python-sample/` | Python stdlib | 1–4 + honeypots + beyond-the-pack | GREEN (Part D) | +| `django-sample/` | Python + Django | 5 (idiom-currency) | with-brief + offline-degrade | +| `react-sample/` | JS/TS + React | 1,2,7 (cost-map) | component-render footguns | +| `behavioral/reference-not-checklist/` | ecosystem-independent | machinery | **GREEN** 2026-06-04 | +| `behavioral/materiality.md` | ecosystem-independent | Phase 0 | **GREEN** 2026-06-04 | +| `go-sample/` | Go + net-http-servers + database-sql | algo/mem/data/conc + Runtime notes | **GREEN** 2026-06-04 | +| `rust-sample/` | Rust + web + async-tokio + database | mem/data/conc + Runtime notes | **GREEN** 2026-06-04 | +| `sql-sample/` | SQL companion + Postgres + **Routines** | algo/mem/data | **GREEN** 2026-06-04 | +| `html-sample/` | HTML companion + images-media + fonts | payload/CWV | **GREEN** 2026-06-04 | +| `dotnet-sample/` | .NET + aspnet-core + sql-server-data | data/mem/conc + Variant notes | **GREEN** 2026-06-04 | + +## Honest limitations + +- Non-deterministic and token-costly; treat as directional signal, not pass/fail truth. Run a + *representative subset* per change, not the whole suite every time. +- Tests are typically dispatched on **Sonnet** (a stricter "typical executor" bar than the Opus the + skill recommends) — real dispatch should do at least as well. +- Live currency-brief research isn't network-tested here; the offline-degrade path is exercised + (`django-sample` offline run), the live-fetch path is reasoned-about only. diff --git a/.claude/skills/performance-audit/test-fixtures/behavioral/materiality.md b/.claude/skills/performance-audit/test-fixtures/behavioral/materiality.md new file mode 100644 index 00000000..ecf74971 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/behavioral/materiality.md @@ -0,0 +1,48 @@ +# Behavioural eval: "materiality decides the load, not mere presence" + +**Property under test:** the Phase-0 rule in `SKILL.md` — *"Detection selects candidates; materiality +decides the load … a lone `import json` / `import asyncio` that is peripheral to the scoped code does +not by itself warrant the serialization or async module."* This guards against over-loading a lane +agent's prompt with modules irrelevant to the actual scope. + +**No code fixture needed — it's a Phase-0 detection scenario.** (Optionally point it at a real repo.) + +## How to run + +Dispatch a subagent with **only** the `SKILL.md` **Phase 0** section (the detection table + the +sub-stack-modules rule + the materiality sentence) and this scenario, and ask: *"Which profile pack(s) +and sub-stack module(s) do you load, and why?"* Do not show it the expected loadout below. + +### Scenario + +> **Audit scope:** `pricing/calc.py` — a CPU-bound pricing-calculation module (nested rate tables, +> tier math). Profile/optimize this file. +> +> **Repo facts:** `requirements.txt` lists `fastapi`, `sqlalchemy`, `pydantic`, `orjson`. `calc.py` +> itself imports only `math` and `json` (the latter used **once at import time** to load a static +> rate-table config file). The web handlers and DB models live in *other* packages not in this scope. + +## Expected loadout (GREEN) + +| Pack / module | Load? | Why | +|---|---|---| +| `python.md` core + Runtime & interpreter notes | **Yes** | the scoped code is Python | +| `python/serialization.md` | **No** | the only `json` use is a one-time startup config read — *incidental*, not the hot path under audit; `orjson` in `requirements.txt` is used elsewhere, not in scope | +| `python/web-frameworks.md` | **No** | `fastapi` is a repo dep but the scoped file has no web surface; web is not material to `calc.py` | +| `python/orm-database.md` | **No** | `sqlalchemy` is a repo dep but the scoped file does no DB access | +| `python/async-asyncio.md` | **No** | no async in scope | + +**Pass = loads the Python core (+ Runtime notes) and NONE of the four modules**, with the reasoning +that materiality (not the presence of a dep in `requirements.txt` or an incidental `import json`) +decides the load. **Fail (RED, without the materiality rule)** = loads `serialization` on the `json` +import and/or `web-frameworks`/`orm-database` because the deps are in the manifest. + +> Variant: change the scope to "audit the FastAPI request handlers in `api/routes.py` that serialize +> large responses" — now `web-frameworks` and `serialization` **are** material and SHOULD load. The +> rule is scope-relative, not a fixed per-repo answer. + +## Result log + +| Date | Model | Loaded core only? | Spuriously loaded a module? | Verdict | +|---|---|---|---|---| +| 2026-06-04 | Sonnet | ✅ (python core only) | No — skipped all 6 with correct materiality reasoning (`json` flagged as the closest call, correctly rejected as a one-time import-time config read) | **GREEN** | diff --git a/.claude/skills/performance-audit/test-fixtures/behavioral/reference-not-checklist/orders.py b/.claude/skills/performance-audit/test-fixtures/behavioral/reference-not-checklist/orders.py new file mode 100644 index 00000000..3909d496 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/behavioral/reference-not-checklist/orders.py @@ -0,0 +1,63 @@ +"""Order utilities. A behavioural-eval fixture for the 'reference, not a checklist' +property. Mostly-fine code with tempting near-misses for pack idioms, ONE genuine +perf issue, and ONE beyond-the-pack issue. See spec.md (do not read it as the agent).""" + +from collections import Counter + +# A small, fixed set of valid statuses — module-level constant, not request data. +VALID_STATUSES = ["new", "paid", "shipped", "closed"] + + +class Money: + """Few instances ever created (one per currency, at startup).""" + def __init__(self, amount, currency): + self.amount = amount + self.currency = currency + + +def is_valid_status(status): + """CHECKLIST BAIT (decoy): `in` membership against a LIST inside a function the + pack's algorithmic bullet warns about — BUT VALID_STATUSES is a constant of 4 + items and this is not in a loop. O(4) is not a finding. A checklist-walker + 'recommends a set'; calibration says ignore.""" + return status in VALID_STATUSES + + +def status_breakdown(orders): + """CHECKLIST BAIT (decoy): builds a list comprehension then passes it on. A + walker flags 'use a generator to avoid the intermediate list' — but Counter + consumes it once and the list is small (one pass, bounded). Not a finding.""" + statuses = [o["status"] for o in orders] + return Counter(statuses) + + +def dedupe_order_ids(order_ids): + """GENUINE PLANTED ISSUE (recall item, Lane 1 — algorithmic): membership test + `in seen` against a LIST inside the loop is O(n) per check → O(n^2) overall over + request-sized `order_ids`. `seen` should be a set. This one MUST be found.""" + seen = [] + out = [] + for oid in order_ids: + if oid in seen: + continue + seen.append(oid) + out.append(oid) + return out + + +def process_in_arrival_order(tasks): + """BEYOND-THE-PACK ISSUE (floor-not-ceiling bonus): treats a `list` as a FIFO + queue via `pop(0)`, which is O(n) per pop (shifts every remaining element) → + O(n^2) to drain. The fix is `collections.deque` + `popleft()`. NO Python-pack + bullet names this; the agent must reason from first principles that list.pop(0) + is O(n). Finding it rewards out-reasoning the lens; missing it is NOT a recall + miss, but consistently missing it across runs flags checklist-walking.""" + results = [] + while tasks: + task = tasks.pop(0) # O(n) shift on every iteration + results.append(_handle(task)) + return results + + +def _handle(task): + return {"id": task.get("id"), "ok": True} diff --git a/.claude/skills/performance-audit/test-fixtures/behavioral/reference-not-checklist/spec.md b/.claude/skills/performance-audit/test-fixtures/behavioral/reference-not-checklist/spec.md new file mode 100644 index 00000000..6b758e06 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/behavioral/reference-not-checklist/spec.md @@ -0,0 +1,40 @@ +# Behavioural eval: "a reference, not a checklist — a floor, not a ceiling" + +**Property under test:** the consumer-side framing in `lane-prompts.md` (shared preamble: *"THE +PROFILE-PACK LENS IS A REFERENCE, NOT A CHECKLIST … a PRIOR not a worklist, a FLOOR not a ceiling … +do NOT report an item merely because the pack lists it … never limit your investigation to what the +pack names … out-reason it"*). This is the highest-value behavioural guarantee in the skill; this test +checks both halves: **(a)** don't fabricate findings for pack idioms that don't apply, and **(b)** find +a real issue the pack didn't name. + +**Scope:** `orders.py`. **Lane:** `algorithmic` (the `memory` lane works too). + +## How to run + +- **GREEN run (primary):** dispatch an `algorithmic` lane subagent with the shared preamble (which + contains the reference-not-checklist framing) + the `algorithmic` lane body + the **Python pack** + `algorithmic` slice + the path to `orders.py`. Do not let it read this spec. +- **RED run (control, optional):** same, but **strip the reference-not-checklist paragraph** from the + preamble. Expect more fabricated "consider using a set / a generator / `__slots__`" findings on the + decoys, and/or no engagement with `process_in_arrival_order`. + +## Scoring + +| Function | Category | GREEN expectation | +|---|---|---| +| `dedupe_order_ids` | **Recall** (genuine O(n²)) | **Found** — flagged as accidental quadratic; `set` fix. Missing it = recall failure. | +| `is_valid_status` | **Decoy** (constant n=4, not looped) | **Not flagged** (or explicitly considered + rejected on bounded-n grounds). Flagging "use a set" = precision/checklist failure. | +| `status_breakdown` | **Decoy** (bounded one-pass list) | **Not flagged.** "Use a generator" here is a checklist-walk; the intermediate is small and consumed once. | +| `Money` / `__slots__` | **Decoy** (few instances) | **Not flagged.** "Add `__slots__`" with a handful of instances is a checklist-walk with no aggregate impact. | +| `process_in_arrival_order` | **Beyond-the-pack** (`list.pop(0)` → `deque`) | **Bonus if found** (reasoned that `pop(0)` is O(n); `collections.deque`). NOT a recall miss if absent — but consistent misses across runs ⇒ checklist-walking signal. | + +**Pass = GREEN run flags `dedupe_order_ids`, fabricates ZERO decoy findings (ideally states it +considered and rejected them), and ideally surfaces `process_in_arrival_order` by reasoning.** +The discriminating signal vs. a checklist-walker is the *decoys staying silent* and the +*beyond-the-pack issue being engaged*. + +## Result log + +| Date | Model | Recall (dedupe) | Decoys fabricated | Beyond-the-pack found | Verdict | +|---|---|---|---|---|---| +| 2026-06-04 | Sonnet | ✅ | 0 (explicitly considered + rejected all 3, naming "checklist-walking") | ✅ (`pop(0)`→`deque`, reasoned from CPython list internals) | **GREEN** | diff --git a/.claude/skills/performance-audit/test-fixtures/django-sample/currency-brief.md b/.claude/skills/performance-audit/test-fixtures/django-sample/currency-brief.md new file mode 100644 index 00000000..56ad0b48 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/django-sample/currency-brief.md @@ -0,0 +1,34 @@ +--- +schema_version: 1 +framework: django +ecosystem: pypi +researched_against_version: 5.0.x +latest_known_at_research: 5.0.x +researched_on: 2026-06-03 +fallback_ttl_days: 180 +sources: + - https://docs.djangoproject.com/en/5.0/ref/models/querysets/ + - https://docs.djangoproject.com/en/5.0/releases/ +--- + +> HAND-AUTHORED for the Lane 5 fixture test. In real use this file is produced by the +> currency-protocol research step; here it is the brief the workhorse would pass to Lane 5. + +## Superseded patterns (old → new) +- `len(queryset)` / `bool(queryset)` / `if queryset:` to test existence → `queryset.exists()`. + `len()` executes the query and instantiates every row; `.exists()` issues a cheap `SELECT 1 ... LIMIT 1`. +- `QuerySet.extra(select=..., where=...)` raw SQL fragments → `annotate()` with ORM expressions + (`F`, `Value`, `ExpressionWrapper`, database functions). `.extra()` is long-deprecated, a + SQL-injection/maintenance hazard, and excluded from query-planner optimizations. +- Per-object `.save()` in a loop over the same field set → `QuerySet.bulk_update(objs, ["field"])` + (one statement instead of N). + +## New fast-path APIs (and the version that introduced them) +- `QuerySet.bulk_create(..., update_conflicts=True, unique_fields=..., update_fields=...)` — native upsert. +- Async ORM: `aget()`, `acount()`, `async for` over querysets for async views. + +## Changed defaults +- (none relevant to this fixture) + +## Known perf regressions / fixes by version +- (none relevant to this fixture) diff --git a/.claude/skills/performance-audit/test-fixtures/django-sample/expected-findings.md b/.claude/skills/performance-audit/test-fixtures/django-sample/expected-findings.md new file mode 100644 index 00000000..2411a660 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/django-sample/expected-findings.md @@ -0,0 +1,40 @@ +# Expected Findings — Django (Lane 5) fixture + +**Purpose:** exercise **Lane 5 (framework-idiom currency)** — the lane the stdlib Python fixture +can't reach because it has no framework/brief. The planted issues are *correct* code that a newer +framework version supersedes; they are identifiable as problems ONLY by consulting the currency +brief (`currency-brief.md`), not by generic algorithmic/IO reasoning. + +`views.py` is illustrative Django (not executed). + +## How to run + +**With-brief run (primary):** dispatch a Lane 5 agent with the shared preamble + Lane 5 body from +`../../lane-prompts.md`, the `javascript-typescript.md`/`python.md` pack Lane 5 slice (here: +`python.md`), the **contents of `currency-brief.md`** as the `[currency brief]` placeholder, and +`views.py` as the scope (do NOT let it read this rubric). + +**Offline run (degrade test):** same, but pass `[currency brief]` = "unavailable — offline". Expect +the lane to report candidate idiom concerns at **LOW confidence**, flagged for manual currency +check, and to **NOT fabricate** version-specific claims. + +## Planted issues (with-brief run should find) + +| # | File:func | Brief entry it maps to | Expected | +|---|-----------|------------------------|----------| +| 1 | `views.has_recent_orders` | `len(queryset)` → `.exists()` | flag the `len(qs) > 0` existence check; recommend `.exists()` | +| 2 | `views.order_net_amounts` | `.extra()` deprecated → `annotate()` | flag `.extra(select=...)`; recommend `annotate()` | +| 3 | `views.mark_all_shipped` | per-object `.save()` in loop → `bulk_update()` | flag the loop of `.save()`; recommend `bulk_update()` | + +## Decoy (should NOT be flagged) + +| File:func | Why ignored | +|-----------|-------------| +| `views.active_admin_emails` | plain comprehension over a tiny fixed list; no ORM, nothing in the brief applies. Flagging a "currency" issue here is a precision failure. | + +## Scoring + +- **With-brief recall** = (# of {1,2,3} found) / 3, each citing the brief entry. +- **Precision** = decoy not flagged; no fabricated version claims. +- **Offline run** = issues (if mentioned) carry LOW confidence + "manual currency check"; no + confident version-specific assertions invented without the brief. diff --git a/.claude/skills/performance-audit/test-fixtures/django-sample/views.py b/.claude/skills/performance-audit/test-fixtures/django-sample/views.py new file mode 100644 index 00000000..efbc95e5 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/django-sample/views.py @@ -0,0 +1,57 @@ +"""Representative Django ORM patterns (illustrative — NOT executed; no Django install needed). + +This fixture exercises Lane 5 (framework-idiom currency): the planted issues are +correct code that a newer framework version supersedes — identifiable as problems +ONLY by consulting the currency brief (see currency-brief.md), not by generic +algorithmic/IO reasoning. + +Assume `Order` and `User` are standard Django models with a default manager. +""" + + +def has_recent_orders(user_id): + """Does the user have any recent orders? + + PLANTED LANE 5 ISSUE #1 (superseded idiom): uses `len(queryset)` to test + existence, which executes the query AND instantiates every matching row just + to check for >0. The currency brief flags `.exists()` as the fast path. The + code is *correct* — only the idiom is stale. + """ + qs = Order.objects.filter(user_id=user_id, status="recent") + return len(qs) > 0 + + +def order_net_amounts(user_id): + """Net amount (amount - discount) per order. + + PLANTED LANE 5 ISSUE #2 (deprecated API): uses `QuerySet.extra()` with a raw + SQL fragment. The brief flags `.extra()` as deprecated in favor of + `annotate()` with ORM expressions. Works today; deprecated path. + """ + return Order.objects.filter(user_id=user_id).extra(select={"net": "amount - discount"}) + + +def mark_all_shipped(order_ids): + """Mark a batch of orders shipped. + + PLANTED LANE 5 ISSUE #3 (new fast-path not used): saves each object in a loop. + The brief notes `QuerySet.bulk_update()` as the framework fast path for exactly + this. (Overlaps Lane 3, but the *currency* angle is "the framework now offers + bulk_update for this pattern".) + """ + orders = Order.objects.filter(id__in=order_ids) + for o in orders: + o.status = "shipped" + o.save() + + +def active_admin_emails(): + """Normalized admin emails. + + DECOY (the brief does NOT cover this): a plain comprehension over a tiny fixed + in-process list — no ORM, nothing version-specific. Lane 5 must NOT invent a + currency issue here; nothing in the brief applies. Flagging it is a precision + failure. + """ + config_admins = ["Admin@Example.com", "Ops@Example.com"] + return [e.lower() for e in config_admins] diff --git a/.claude/skills/performance-audit/test-fixtures/dotnet-sample/OrdersController.cs b/.claude/skills/performance-audit/test-fixtures/dotnet-sample/OrdersController.cs new file mode 100644 index 00000000..21459b5b --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/dotnet-sample/OrdersController.cs @@ -0,0 +1,77 @@ +// .NET fixture for the performance-audit evals: an ASP.NET Core + EF Core controller +// exercising the core .NET lanes + the aspnet-core and sql-server-data modules + +// Variant notes. Illustrative (not built). See expected-findings.md (do NOT read it +// as the agent under test). + +using Microsoft.AspNetCore.Mvc; +using Microsoft.EntityFrameworkCore; + +[ApiController] +[Route("orders")] +public class OrdersController : ControllerBase +{ + private readonly ShopContext _db; + public OrdersController(ShopContext db) => _db = db; + + [HttpGet("summary")] + public async Task<IActionResult> Summary() + { + // PLANTED #1 (data-access / sql-server-data): EF N+1 — the related Customer is + // accessed per row inside the loop without an Include/projection, firing one + // query per order. + var orders = await _db.Orders.Where(o => o.Status == "paid").ToListAsync(); + var lines = new List<string>(); + foreach (var o in orders) + { + var name = o.Customer.Name; // lazy nav → one SELECT per order (N+1) + // PLANTED #2 (memory/algorithmic): string built with += in a loop → O(n^2) + // allocation; use a StringBuilder. + string line = ""; + line += o.Id + ","; + line += name + ","; + line += o.TotalCents; + lines.Add(line); + } + return Ok(lines); + } + + [HttpGet("report")] + public IActionResult Report() + { + // PLANTED #3 (data-access / sql-server-data): client-side evaluation — the whole + // table is materialized with ToList() and THEN filtered/projected in memory, + // instead of pushing the Where/Select to SQL. Also fetches all columns. + var all = _db.Orders.ToList(); + var paid = all.Where(o => o.TotalCents > 0) + .Select(o => new { o.Id, o.TotalCents }) + .ToList(); + + // PLANTED #4 (concurrency / Variant notes): sync-over-async blocks a thread-pool + // thread and can deadlock under the legacy sync context; await it instead. + var count = _db.Orders.CountAsync().Result; + + return Ok(new { paid, count }); + } + + // BEYOND-THE-PACK (floor-not-ceiling): exceptions used for control flow INSIDE a + // per-item loop. Throwing/catching is expensive in .NET (stack capture); on a hot + // path this dominates. Validate with TryParse / a guard instead. NO .NET-pack + // bullet names exception-as-control-flow cost — the agent must reason it. + public int SumValidQuantities(IEnumerable<string> raw) + { + int sum = 0; + foreach (var s in raw) + { + try { sum += int.Parse(s); } // throws on every non-numeric item + catch (FormatException) { /* skip */ } + } + return sum; + } + + // DECOY (should NOT be flagged): a LINQ query over a fixed 3-element in-memory list, + // built once. Mirrors the "materialize then filter" shape but n=3 and it's not on a + // hot path. Flagging it ("push to SQL", "avoid ToList") is a precision/checklist + // failure — there is no database and n is trivially bounded. + private static readonly string[] Regions = { "us", "eu", "apac" }; + public bool RegionAllowed(string r) => Regions.Where(x => x == r).Any(); +} diff --git a/.claude/skills/performance-audit/test-fixtures/dotnet-sample/expected-findings.md b/.claude/skills/performance-audit/test-fixtures/dotnet-sample/expected-findings.md new file mode 100644 index 00000000..c75048c5 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/dotnet-sample/expected-findings.md @@ -0,0 +1,47 @@ +# Expected Findings — .NET fixture (core + aspnet-core + sql-server-data) + +**Purpose:** exercise the .NET core lanes + the `aspnet-core` and `sql-server-data` modules + the +**Variant notes** (Modern vs Framework). Illustrative (not built). + +**Pack slice to provide:** `dotnet.md` lane slices + the **Variant notes** section + `dotnet/aspnet-core.md` ++ `dotnet/sql-server-data.md`. Scope = `OrdersController.cs`. Do NOT let the agent read this rubric. + +## Planted issues (should be found) + +| # | Location | Lane / module | Issue | +|---|----------|---------------|-------| +| 1 | `Summary` loop (`o.Customer.Name`) | data-access / `sql-server-data` | **EF N+1**: lazy navigation accessed per row; use `Include`/projection | +| 2 | `Summary` loop (`line += …`) | memory / algorithmic | string `+=` in a loop → O(n²) allocation; `StringBuilder` | +| 3 | `Report` (`_db.Orders.ToList()` then `.Where`) | data-access / `sql-server-data` | **client-side evaluation** — materialize-then-filter instead of pushing `Where`/`Select` to SQL; also over-fetches columns | +| 4 | `Report` (`.CountAsync().Result`) | concurrency / Variant notes | **sync-over-async** blocks a thread-pool thread / deadlock risk; `await` it | + +## Beyond-the-pack (floor-not-ceiling — bonus) + +| Location | Issue | Why beyond the pack | +|----------|-------|---------------------| +| `SumValidQuantities` | `try/catch (FormatException)` per item in a loop — exceptions as control flow | Throwing/catching captures a stack and is expensive in .NET; on a hot path it dominates. `int.TryParse` avoids it. No .NET-pack bullet names exception-as-control-flow cost — requires reasoning. | + +## Decoy (should NOT be flagged) + +| Location | Why ignored | +|----------|-------------| +| `RegionAllowed` | LINQ `.Where(...).Any()` over a fixed 3-element static array — mirrors the materialize-then-filter shape but n=3, no DB, cold. "Push to SQL"/"avoid ToList" here is a precision/checklist failure. (A sharp agent may note `.Any(x => x == r)` is marginally cleaner, but that's a style note, not a perf finding.) | + +## Scoring + +- **Recall** = (# of {1..4} found) / 4. +- **Precision** = `RegionAllowed` decoy not flagged as a perf finding; no fabricated findings. +- **Beyond-the-pack** = the exception-as-control-flow loop flagged → out-reasons the lens. + +## How to run + +Dispatch lane subagents (data-access, memory, concurrency) with the shared preamble + lane body from +`../../lane-prompts.md`, the `dotnet.md` slices + Variant notes + the two modules, and +`OrdersController.cs` as scope. Score against the tables above. + +## Last run + +**2026-06-04, Sonnet — GREEN.** Recall 4/4 (also caught the missing `AsNoTracking()` + sync action +method within #3); beyond-the-pack (exception-as-control-flow) found and flagged as not-in-the-pack; +`RegionAllowed` decoy rejected as bounded/cold; `AsSplitQuery`/`IAsyncEnumerable` candidates correctly +ruled inapplicable; zero fabrications. diff --git a/.claude/skills/performance-audit/test-fixtures/go-sample/expected-findings.md b/.claude/skills/performance-audit/test-fixtures/go-sample/expected-findings.md new file mode 100644 index 00000000..fc5f505f --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/go-sample/expected-findings.md @@ -0,0 +1,50 @@ +# Expected Findings — Go fixture (core + net-http-servers + database-sql) + +**Purpose:** exercise the Go core lanes + the `net-http-servers` and `database-sql` modules + the +Runtime & GC notes, with recall / precision / beyond-the-pack scoring. Illustrative Go (not built). + +**Pack slice to provide:** `go.md` lane slices + the **Runtime & GC notes** section + (material to this +scope) `go/net-http-servers.md` and `go/database-sql.md`. Do NOT let the agent read this rubric. + +## Planted issues (should be found) + +| # | Location | Lane / module | Issue | +|---|----------|---------------|-------| +| 1 | `service.go` `HandleOrder` (per-item loop) | data-access / `database-sql` | **N+1**: one `QueryRow` per item; should be one `WHERE id = ANY($1)` batch | +| 2 | `service.go` `HandleOrder` (`&http.Client{}`) | data-access / `net-http-servers` | **http.Client built per request** (no keep-alive/pool reuse); `resp.Body` never drained+closed → connection not returned to the pool | +| 3 | `service.go` `Totals` | concurrency | three **independent** calls awaited **sequentially**; could run concurrently (errgroup / goroutines+WaitGroup). Independence holds → safe to parallelize (must state the guard) | +| 4 | `inventory.go` `FindDuplicateSKUs` | algorithmic | **O(n²)** `contains` (slice membership) inside the loop; use a `map[string]struct{}` set | +| 5 | `inventory.go` `BuildLabels` | memory | `labels` appended from a nil slice with no `make([]T, 0, n)` preallocation → repeated reallocations | + +## Beyond-the-pack (floor-not-ceiling — bonus, not a recall requirement) + +| Location | Issue | Why it's beyond the pack | +|----------|-------|--------------------------| +| `inventory.go` `BuildLabels` | `fmt.Sprintf("%d", it.Price)` for int→string on a hot path | `fmt` is reflection-based; `strconv.Itoa` is ~10× faster. No Go-pack bullet names fmt.Sprintf-for-int-conversion — the agent must reason it. Finding it rewards out-reasoning; missing it is not a recall miss, but consistent misses ⇒ checklist-drift signal. | + +## Decoy (should NOT be flagged) + +| Location | Why it must be ignored | +|----------|------------------------| +| `inventory.go` `IsSupportedRegion` | `contains` over `defaultRegions` mirrors the #4 O(n²) pattern, BUT it's a constant 3-element config slice and a single membership test (not a request-loop). O(3) is cold/bounded → not a finding. Recommending "use a map" here is a precision/checklist failure. | + +## Scoring + +- **Recall** = (# of {1..5} found) / 5. #3 must include the independence/correctness guard. +- **Precision** = `IsSupportedRegion` decoy not flagged (or explicitly considered + rejected on + bounded-n grounds); zero fabricated findings. +- **Beyond-the-pack** = `fmt.Sprintf` flagged → bonus signal that the agent out-reasons the lens. + +## How to run + +Dispatch lane subagents (algorithmic, memory, data-access, concurrency) with the shared preamble + +that lane body from `../../lane-prompts.md`, the `go.md` lane slice + Runtime & GC notes + the two +modules, and this directory as scope. Score against the tables above. + +## Last run + +**2026-06-04, Sonnet — GREEN.** Recall 5/5; beyond-the-pack (`fmt.Sprintf` int→string) found and +explicitly flagged as not-in-the-pack; `IsSupportedRegion` decoy rejected on bounded-n grounds; the +2-operand string concat correctly rejected; zero fabrications. **Valid extra finding:** the agent also +flagged `QueryRow` without `r.Context()` (uncancellable DB work on client disconnect) — a real issue +not in the planted set; a legitimate beyond-the-rubric find, not a false positive. diff --git a/.claude/skills/performance-audit/test-fixtures/go-sample/inventory.go b/.claude/skills/performance-audit/test-fixtures/go-sample/inventory.go new file mode 100644 index 00000000..4f04ae91 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/go-sample/inventory.go @@ -0,0 +1,80 @@ +package shop + +import "fmt" + +type Item struct { + ID string + Name string + Price int +} + +type Quote struct { + Total int +} + +type Totals struct { + Revenue int + Tax int + Shipping int +} + +// FindDuplicateSKUs returns SKUs that appear more than once. +// +// PLANTED #4 (algorithmic): membership test against a SLICE (`contains`) inside +// the loop is O(n) per check → O(n^2) overall. Use a map[string]struct{} set. +// Request-sized input on a hot path. +func FindDuplicateSKUs(skus []string) []string { + var seen []string + var dupes []string + for _, sku := range skus { + if contains(seen, sku) { // O(n) linear scan inside the loop + dupes = append(dupes, sku) + } else { + seen = append(seen, sku) + } + } + return dupes +} + +func contains(xs []string, x string) bool { + for _, v := range xs { + if v == x { + return true + } + } + return false +} + +// BuildLabels formats a label per item. +// +// PLANTED #5 (memory): `labels` grows by append from a nil slice with no +// preallocation — repeated doublings + copies. `make([]string, 0, len(items))` +// pre-sizes it. +// +// BEYOND-THE-PACK (floor-not-ceiling): `fmt.Sprintf("%d", it.Price)` to convert +// an int to a string on a hot path uses reflection and is ~an order of magnitude +// slower than `strconv.Itoa(it.Price)`. NO Go-pack bullet names fmt.Sprintf-for- +// int-conversion; the agent must know/reason that fmt is reflection-based here. +func BuildLabels(items []Item) []string { + var labels []string + for _, it := range items { + price := fmt.Sprintf("%d", it.Price) + labels = append(labels, it.Name+": "+price) + } + return labels +} + +// defaultRegions is a fixed 3-element config read once at startup. +var defaultRegions = []string{"us", "eu", "apac"} + +// IsSupportedRegion — DECOY: `contains` over a SLICE, which mirrors the O(n^2) +// pattern, BUT defaultRegions is a constant of 3 and this is a single membership +// test (not nested in a request loop). O(3) is not a finding; flagging "use a map" +// here is checklist-walking. +func IsSupportedRegion(region string) bool { + return contains(defaultRegions, region) +} + +func (s *Server) fetchRevenue(orderID string) (int, error) { return 0, nil } +func (s *Server) fetchTax(orderID string) (int, error) { return 0, nil } +func (s *Server) fetchShipping(orderID string) (int, error) { return 0, nil } diff --git a/.claude/skills/performance-audit/test-fixtures/go-sample/service.go b/.claude/skills/performance-audit/test-fixtures/go-sample/service.go new file mode 100644 index 00000000..bdeb2be2 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/go-sample/service.go @@ -0,0 +1,61 @@ +// Package shop is a Go fixture for the performance-audit evals: a small HTTP +// service exercising the core Go lanes + the net-http-servers and database-sql +// modules + Runtime notes. Illustrative (not built). See expected-findings.md +// (do NOT read it as the agent under test). +package shop + +import ( + "database/sql" + "encoding/json" + "net/http" +) + +type Server struct { + db *sql.DB +} + +// HandleOrder enriches an order's line items and returns them. +func (s *Server) HandleOrder(w http.ResponseWriter, r *http.Request) { + ids := r.URL.Query()["item"] + + // PLANTED #1 (data-access / N+1, module: database-sql): one query per item in + // a loop instead of one `WHERE id = ANY($1)` batch. Reached per request. + var items []Item + for _, id := range ids { + row := s.db.QueryRow("SELECT id, name, price FROM items WHERE id = $1", id) + var it Item + if err := row.Scan(&it.ID, &it.Name, &it.Price); err == nil { + items = append(items, it) + } + } + + // PLANTED #2 (data-access, module: net-http-servers): a fresh http.Client per + // request — no connection reuse / keep-alive; should be a shared client built + // once. Also the body is never drained+closed. + client := &http.Client{} + resp, _ := client.Get("http://pricing/quote?order=" + r.URL.Query().Get("order")) + var quote Quote + json.NewDecoder(resp.Body).Decode("e) + + json.NewEncoder(w).Encode(map[string]any{"items": items, "quote": quote}) +} + +// Totals fetches three independent aggregates. PLANTED #3 (concurrency): the three +// calls are independent but awaited sequentially — latency is the sum. They could +// run concurrently (errgroup / goroutines + a WaitGroup). Independence holds: no +// shared mutable state, no ordering dependency. +func (s *Server) Totals(orderID string) (Totals, error) { + revenue, err := s.fetchRevenue(orderID) + if err != nil { + return Totals{}, err + } + tax, err := s.fetchTax(orderID) + if err != nil { + return Totals{}, err + } + ship, err := s.fetchShipping(orderID) + if err != nil { + return Totals{}, err + } + return Totals{revenue, tax, ship}, nil +} diff --git a/.claude/skills/performance-audit/test-fixtures/html-sample/expected-findings.md b/.claude/skills/performance-audit/test-fixtures/html-sample/expected-findings.md new file mode 100644 index 00000000..7efb9efe --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/html-sample/expected-findings.md @@ -0,0 +1,51 @@ +# Expected Findings — HTML fixture (companion pack + images-media + fonts) + +**Purpose:** exercise the **HTML companion pack** + the **`images-media`** and **`fonts`** modules + +the **Rendering path & Core Web Vitals** notes. Plain document; loads alongside whatever backend emits +it. + +**Pack slice to provide:** `html.md` lane slices (payload-startup heavy) + the **Rendering path & CWV** +notes + `html/images-media.md` + `html/fonts.md`. Scope = `index.html`. Do NOT let the agent read this +rubric. + +## Planted issues (should be found) + +| # | Location | Lane / module | Issue | +|---|----------|---------------|-------| +| 1 | `<head>` `<script src=analytics>` | payload-startup | parser-blocking third-party script in `<head>` (no `async`/`defer`) | +| 2 | `app.css` link + `@import theme.css` | payload-startup | render-blocking CSS + an `@import` waterfall (imported sheet discovered late); inline critical CSS, use top-level `<link>` | +| 3 | hero `<img loading="lazy">` | `images-media` | the **LCP image is lazy-loaded** (delays LCP) **and** has no `width`/`height` (→ CLS). Identify it as the LCP element | +| 4 | `@font-face` (no `font-display`) | `fonts` | default `block` → FOIT (invisible text ~3s); critical font not preloaded (late discovery) | +| 5 | hero `<img src="/img/hero-4000w.jpg">` | `images-media` | a fixed 4000px-wide image served to every viewport/DPR — no `srcset`/`sizes` (and a legacy format); 10–100× excess pixels on mobile | + +## Beyond-the-pack (floor-not-ceiling — bonus) + +| Location | Issue | Why beyond the pack | +|----------|-------|---------------------| +| `<img src="data:image/jpeg;base64,…">` | a full-res hero embedded as a base64 `data:` URI in the markup | The agent should reason about the *compounding* costs — bloats/blocks the HTML parse, the bytes can't be cached or `fetchpriority`-prioritized separately, it defeats the preload scanner, and it can't be a responsive `srcset` candidate. The memory lane names "big `data:` URIs"; the multi-faceted rendering-path reasoning is the bonus. | + +## Decoy (should NOT be flagged) + +| Location | Why ignored | +|----------|-------------| +| footer-promo `<img loading="lazy" width height>` | a below-the-fold thumbnail, correctly lazy-loaded **and** sized — this is the *right* use of `loading="lazy"`. Flagging it ("remove lazy-loading", "it causes CLS") is a precision/checklist failure (it has dimensions; no shift). | + +## Scoring + +- **Recall** = (# of {1..5} found) / 5. #3 should name *both* the lazy-LCP and the missing-dimensions + halves and identify the hero as the LCP candidate. +- **Precision** = the correctly-lazy-loaded sized footer image NOT flagged. +- **Beyond-the-pack** = the `data:` URI hero flagged with rendering-path reasoning. + +## How to run + +Dispatch payload-startup (+ a memory pass) subagents with the shared preamble + lane body from +`../../lane-prompts.md`, the `html.md` slices + Rendering-path notes + the two modules, and +`index.html` as scope. Score against the tables above. + +## Last run + +**2026-06-04, Sonnet — GREEN.** Recall 5/5 (#3 named both the lazy-LCP and missing-dimensions halves +and identified the hero as LCP); beyond-the-pack (data-URI hero) found with full multi-faceted +rendering-path reasoning; the correctly-lazy+sized footer decoy rejected; `fetchpriority`/`preconnect`/ +standalone-`<link>` candidates correctly subordinated; zero fabrications. diff --git a/.claude/skills/performance-audit/test-fixtures/html-sample/index.html b/.claude/skills/performance-audit/test-fixtures/html-sample/index.html new file mode 100644 index 00000000..40933d2f --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/html-sample/index.html @@ -0,0 +1,54 @@ +<!DOCTYPE html> +<!-- HTML fixture for the performance-audit evals: a plain document exercising the + HTML companion pack + the images-media and fonts modules + the Rendering-path + notes. See expected-findings.md (do NOT read it as the agent under test). --> +<html lang="en"> +<head> + <meta charset="utf-8"> + <title>Shop — Today's Deals + + + + + + + + + + + + + Deal of the day + +

Today's Deals

+ + + Newsletter + + + Featured + + diff --git a/.claude/skills/performance-audit/test-fixtures/python-sample/app.py b/.claude/skills/performance-audit/test-fixtures/python-sample/app.py new file mode 100644 index 00000000..7be9ea8e --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/python-sample/app.py @@ -0,0 +1,42 @@ +"""Request orchestration — establishes the call topology for the service. + +This wires the existing modules into two request paths so the Execution Cost Map +(Lane 6) has a real structure to reason about. It introduces NO new performance +defects; it only makes the call topology explicit: + + handle_listing_request (per page view) + -> pricing.list_prices -> get_landed_cost -> _compute_landed_cost (heavy: 50k-iter loop) + -> inventory.find_duplicate_skus (O(n^2) over request-sized skus) + -> report.render_csv (per-row string build) + + handle_checkout_request (per checkout) + -> inventory.enrich_line_items (N+1 round-trips through repo.get) + -> report.total_revenue (per-row) + + config.load_enabled_flags is called ONCE at startup (cold path). +""" + +import config +import inventory +import pricing +import report + + +def handle_listing_request(raw_products): + """Hot path: render the product-listing page. raw_products is request-sized + (tens to a few hundred), each a dict with id, name, price, base, shipping, sku.""" + priced = pricing.list_prices(raw_products) # fan-out × heavy unit cost + dupes = inventory.find_duplicate_skus([p["sku"] for p in priced]) # O(n^2) + csv = report.render_csv(raw_products) # per-row string growth + return {"priced": priced, "dupes": dupes, "csv": csv} + + +def handle_checkout_request(order_item_ids): + """Hot path: finalize an order.""" + enriched = inventory.enrich_line_items(order_item_ids) # N+1 I/O round-trips + rows = [{"qty": 1, "price": e["price"]} for e in enriched] + return {"items": enriched, "revenue": report.total_revenue(rows)} + + +# Startup wiring — runs once when the process boots (cold path). +ENABLED_FLAGS = config.load_enabled_flags({"fast_export": True}) diff --git a/.claude/skills/performance-audit/test-fixtures/python-sample/benchmark.py b/.claude/skills/performance-audit/test-fixtures/python-sample/benchmark.py new file mode 100644 index 00000000..01d89dc6 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/python-sample/benchmark.py @@ -0,0 +1,51 @@ +"""Representative workload driver — a REAL, runnable benchmark for the dynamic +profiling lane (Lane 8). + +This is the "existing benchmark / representative workload" that lets Lane 8 +activate honestly: it drives the two request paths in app.py at representative +sizes under cProfile, so the lane can report MEASURED hotspots instead of +guessing. + +Run: python benchmark.py +""" + +import cProfile +import io +import pstats +import random + +import app + +random.seed(0) # deterministic workload + + +def make_products(n): + return [ + { + "id": i, + "name": f"item-{i}", + "price": random.randint(1, 100), + "base": random.randint(1, 100), + "shipping": 5, + "sku": f"SKU-{i % (n // 2 or 1)}", # ~half are duplicate SKUs + } + for i in range(n) + ] + + +def workload(): + products = make_products(50) # representative listing size + for _ in range(20): # 20 listing requests + app.handle_listing_request(products) + for _ in range(20): # 20 checkout requests + app.handle_checkout_request(list(range(1, 31))) + + +if __name__ == "__main__": + pr = cProfile.Profile() + pr.enable() + workload() + pr.disable() + out = io.StringIO() + pstats.Stats(pr, stream=out).sort_stats("tottime").print_stats(12) + print(out.getvalue()) diff --git a/.claude/skills/performance-audit/test-fixtures/python-sample/config.py b/.claude/skills/performance-audit/test-fixtures/python-sample/config.py new file mode 100644 index 00000000..2172a946 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/python-sample/config.py @@ -0,0 +1,24 @@ +"""Application config, loaded ONCE at startup. + +Contains the DECOY (see expected-findings.md): a tiny cold-path inefficiency a +well-calibrated audit should NOT flag. +""" + +# Fixed, tiny set of known feature flags — never grows with load. +_FLAGS = ["beta_ui", "fast_export", "new_pricing", "audit_log"] + + +def load_enabled_flags(env): + """Build the enabled-flag lookup once, at process startup. + + DECOY (should NOT be flagged): this sorts a fixed 4-element list and uses a + list membership check. It is O(n^2)-ish in theory, but n is a constant 4 and + this runs exactly once at startup — zero aggregate impact. A calibrated audit + treats this as NOT a finding (cold path, bounded tiny n). Flagging it is a + precision failure. + """ + enabled = [] + for flag in sorted(_FLAGS): + if flag not in enabled and env.get(flag): + enabled.append(flag) + return enabled diff --git a/.claude/skills/performance-audit/test-fixtures/python-sample/cost-map-expected.md b/.claude/skills/performance-audit/test-fixtures/python-sample/cost-map-expected.md new file mode 100644 index 00000000..895533ed --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/python-sample/cost-map-expected.md @@ -0,0 +1,41 @@ +# Expected Execution Cost Map — Python fixture (Lane 6) + +**Purpose:** exercise **Lane 6 (Execution Cost Map)**, which is *descriptive*, not a findings list. +The check is qualitative — Lane 6 doesn't have recall/precision the way the defect lanes do. Score +it against the criteria below. `app.py` provides the call topology; `config.py` is the cold path. + +## How to run + +Dispatch a Lane 6 agent with the shared preamble + the **Lane 6 body** from `../../lane-prompts.md` +(note Lane 6's exemption from "report only problems" and the map output format), scope = the whole +`python-sample/` directory. Do NOT let it read `expected-findings.md` or this file. + +## What a good map looks like (pass criteria) + +**Format & discipline (these are the real test):** +- [ ] Output is a **MAP** (regions with a *basis* and a *confidence*), NOT a findings list with + Impact/Effort/Verification fields. +- [ ] Each region's basis is **structural** (loop nesting, fan-out, call-site count, request-path + membership) — NOT an invented absolute call count or fabricated millisecond figure. +- [ ] Regions carry a **confidence** label (High/Medium/Low). +- [ ] It does **not manufacture problems** — it is willing to describe inherent/fine regions and to + mark the cold path as cold rather than inventing an issue there. + +**Expected hot regions (the map should surface most of these):** + +| Region | Why it concentrates time | Expected confidence | +|--------|--------------------------|---------------------| +| `pricing._compute_landed_cost` via `list_prices`/`get_landed_cost` | **heavy unit cost** (50k-iteration loop) **× fan-out** (once per product) on the listing path — the dominant region | High | +| `inventory.enrich_line_items` | per-item **I/O round-trips** (N+1) on the checkout path — latency-bound | High/Medium | +| `inventory.find_duplicate_skus` | **O(n²)** over request-sized skus on the listing path | Medium | +| `report.render_csv` | **per-row** string growth on the listing path | Medium | + +**Cold region that must be characterized as cold (not a problem):** +| Region | Expected treatment | +|--------|--------------------| +| `config.load_enabled_flags` | runs **once at startup** over a fixed tiny list → negligible / cold. The map may mention it as cold; it must NOT present it as a hot region or a problem. | + +## Notes +- Lane 6 may note that `get_landed_cost`'s cache is defeated, but it should frame the *region* as + hot, not duplicate Lane 1's defect finding — the map's job is "where does time go," not "fix this." +- Bonus (not required): cross-referencing which mapped regions also have defect findings in other lanes. diff --git a/.claude/skills/performance-audit/test-fixtures/python-sample/expected-findings.md b/.claude/skills/performance-audit/test-fixtures/python-sample/expected-findings.md new file mode 100644 index 00000000..3d377832 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/python-sample/expected-findings.md @@ -0,0 +1,65 @@ +# Expected Findings — Python golden fixture + +**Purpose:** a re-runnable validation harness for the `performance-audit` lanes. Dispatch the +relevant lane agents (Lanes 1–4) against this fixture and score: +- **Recall** — how many planted issues were found (target: all 6). +- **Precision** — was the decoy correctly *ignored* and were there few/no fabricated findings? + +This is stdlib-only and dependency-free. **Lane 5 (framework-idiom currency) is not exercised** by +this fixture (no framework → no currency brief); that's an honest coverage gap, not a fixture bug. +A JS/TS or Django fixture would exercise Lane 5. + +## Planted issues (should be found) + +| # | File:line | Lane | Issue | Why it's a real finding | +|---|-----------|------|-------|-------------------------| +| 1 | `inventory.py` `find_duplicate_skus` | 1 — Algorithmic | `in seen` against a **list** inside a loop → O(n²) | request-sized input on a hot path | +| 2 | `inventory.py` `enrich_line_items` | 3 — Data access | **N+1**: `repo.get()` per item; `repo.get_many()` exists | one round-trip per item on checkout path | +| 3 | `report.py` `total_revenue` | 2 — Memory | builds a full throwaway list just to `sum()` it | needless allocation proportional to input | +| 4 | `report.py` `render_csv` | 2/1 — Allocation | string `+=` in a loop → quadratic string growth | reallocation each iteration; `''.join` idiom | +| 5 | `report.py` `extract_codes` | 1 — Recomputed work | `re.compile()` **inside** the loop (loop-invariant) | recompiles per line; hoist to module level | +| 6 | `tasks.py` `load_dashboard` | 4 — Concurrency | sequential `await` of **independent** fetches | latency = sum of calls; `asyncio.gather` runs concurrently. Independence holds → safe to parallelize | + +## Decoy (should NOT be flagged) + +| File:line | Why it must be ignored | +|-----------|------------------------| +| `config.py` `load_enabled_flags` | O(n²)-ish list membership + sort, BUT n is a constant 4 and it runs once at startup. Zero aggregate impact → calibration says NOT a finding. Flagging it is a **precision failure**. | + +## Honeypot correctness bugs (boundary test for bug-no-chase) + +These test the rule: *a bug is in-scope to pursue ONLY when the incorrect behavior **is** the +performance problem; otherwise record it to the Suspected Bugs appendix and do not chase it.* + +| File | Bug | Perf-related? | Expected handling | +|------|-----|---------------|-------------------| +| `pricing.py` `get_landed_cost` (HONEYPOT A) | memo cache keyed by `id(product)`; `list_prices` builds a fresh dict per row, so the cache **never hits** and the expensive compute re-runs every call | **Yes — the bug IS the slowness** | **Pursue as a performance finding** (memoization defeated → recomputation on the hot path). Identifying the wrong cache key as the root cause is the point. | +| `pricing.py` `average_order_value` (HONEYPOT B) | divides by `len(orders) + 1` (off-by-one), understating the average | **No** | **Do NOT report as a perf finding.** If noticed, **record to the Suspected Bugs appendix and move on** (do not chase/fix). Reporting it as a perf finding, or fixing it, is a **boundary failure**. | + +**Scoring the honeypots — note the asymmetry (the audit is NOT a bug hunter):** +- **A is a recall item.** `get_landed_cost`'s never-hitting cache MUST be found — it is a + *performance* finding because the bug IS the slowness. **Missing it counts against recall.** +- **B is NOT a recall item.** The audit is not required to notice a pure correctness bug; **failing + to find `average_order_value`'s off-by-one is NOT counted against it.** What DOES count against it: + reporting B as a *performance* finding, or chasing/fixing it. Correct handling *if noticed* = one + line in the Suspected Bugs appendix, then move on. Finding-and-routing B correctly is a small + bonus, never a requirement. + +**General scoring principle (applies to every fixture):** recall is measured over performance +findings and *performance-related* bugs only. A missed pure-correctness bug is never a recall miss — +correctness hunting is `bug-hunt-cycle`'s job, not this audit's. A *performance-related* bug (one +whose incorrect behavior is the slowness, like Honeypot A) IS a recall item and missing it counts. +Mishandling a correctness bug (flagging it as perf, or chasing it) is always a failure. + +## Scoring + +- **Recall** = (# of {1..6} found) / 6. +- **Precision red flag** = decoy flagged as a real finding, or fabricated findings with no basis. +- A well-calibrated run finds 1, 2, 3, 4, 5, 6 and stays silent on the decoy (or explicitly notes + it considered and rejected the decoy on cold-path/bounded-n grounds). + +## How to re-run (sketch) + +Dispatch one subagent per lane (1, 2, 3, 4) with: the shared preamble + that lane body from +`../../lane-prompts.md`, the `../../profile-packs/python.md` slice for that lane, and the path to +this fixture directory. Collect findings; compare against the table above. diff --git a/.claude/skills/performance-audit/test-fixtures/python-sample/inventory.py b/.claude/skills/performance-audit/test-fixtures/python-sample/inventory.py new file mode 100644 index 00000000..8921cf07 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/python-sample/inventory.py @@ -0,0 +1,38 @@ +"""Inventory operations. Called on the order-processing hot path. + +Contains two planted performance issues (see test-fixtures/.../expected-findings.md). +""" + +import repo + + +def find_duplicate_skus(skus): + """Return SKUs that appear more than once. + + PLANTED ISSUE #1 (Lane 1 — algorithmic): membership test `in seen` against a + LIST inside the loop is O(n) per check → O(n^2) overall. `seen` should be a set. + Reached per request with request-sized `skus`. + """ + seen = [] + dupes = [] + for sku in skus: + if sku in seen: # O(n) scan of a list, inside a loop + dupes.append(sku) + else: + seen.append(sku) + return dupes + + +def enrich_line_items(order_item_ids): + """Attach catalog data to each line item in an order. + + PLANTED ISSUE #2 (Lane 3 — data access / N+1): one repo.get() call per item + inside the loop. repo.get_many() can fetch the whole batch in a single + round-trip. Reached per order on the checkout path. + """ + enriched = [] + for item_id in order_item_ids: + row = repo.get(item_id) # N+1: one round-trip per item + if row: + enriched.append({"id": item_id, "name": row["name"], "price": row["price"]}) + return enriched diff --git a/.claude/skills/performance-audit/test-fixtures/python-sample/lane8-expected.md b/.claude/skills/performance-audit/test-fixtures/python-sample/lane8-expected.md new file mode 100644 index 00000000..2b2597ba --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/python-sample/lane8-expected.md @@ -0,0 +1,36 @@ +# Expected behavior — Lane 8 (dynamic profiling & benchmarking) + +Lane 8 is **optional** and activates ONLY when the environment can build+run AND a real workload +exists. It MUST NOT invent load or fabricate numbers. Two behaviors are tested. + +## 8a — Genuine run (this fixture IS runnable) + +`benchmark.py` is a real, deterministic workload driver (cProfile over the two request paths in +`app.py`). A Lane 8 agent given this fixture SHOULD actually run it and report **measured** hotspots. + +**Pass criteria:** +- [ ] It actually executes the benchmark (e.g., `python benchmark.py`) rather than guessing. +- [ ] It reports the **measured** top hotspots with real numbers from the run (Confidence = Measured). +- [ ] It validates/refutes the static lanes against the measurement. + +**What the measurement actually shows** (reference — the agent should land near this): +- The **N+1 I/O dominates**: `time.sleep` inside `repo.get` (~0.67s of ~0.88s), reached via + `inventory.enrich_line_items`. This **confirms** the Lane 3 N+1 finding as the #1 *measured* cost. +- `pricing._compute_landed_cost` is **secondary** (~0.20s) and — notably — ran only ~50 times, not + ~1000. This **partly refutes** the static cost-map's "#1 dominant compute / cache never hits" + guess: in this tight workload, freed dict addresses are reused by CPython, so the `id()`-keyed + cache *accidentally* hits. (The cache remains fragile — a real service holding request objects + longer would see far worse — but the *measured* reality here is milder than static analysis.) + +The valuable Lane 8 output is exactly this **static-vs-dynamic divergence**: measurement reorders the +hotspots (I/O over compute) and tempers the cache claim. An agent that simply parrots the static map +without noting what the numbers actually say has under-used the lane. (It is fine and expected for +the dynamic ranking to differ from the static cost map — measurement supersedes guesses.) + +## 8b — Honest decline (no runnable workload) + +A Lane 8 agent pointed at the **React** fixture (`../react-sample/`) in an environment with no JS +build/run and no JS workload MUST follow the activation discipline: output +`Dynamic lane not run: ` (no build/runnable workload available) and **NOT fabricate** any +measurements. Inventing benchmark numbers, or running an unrelated/meaningless micro-benchmark, is a +failure. diff --git a/.claude/skills/performance-audit/test-fixtures/python-sample/pricing.py b/.claude/skills/performance-audit/test-fixtures/python-sample/pricing.py new file mode 100644 index 00000000..6c563054 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/python-sample/pricing.py @@ -0,0 +1,59 @@ +"""Landed-cost pricing, called on the product-listing hot path. + +Contains two HONEYPOT correctness bugs (see expected-findings.md) that test the +audit's bug-handling boundary: + - one whose incorrect behavior IS the performance problem (should be pursued + as a finding), + - one with no performance implication (should be recorded to Suspected Bugs + and NOT chased). +""" + +_LANDED_COST_CACHE = {} + + +def _compute_landed_cost(product): + """Genuinely expensive: simulates a heavy per-product calculation.""" + total = 0.0 + for _ in range(50000): + total += product["base"] * 1.05 + return product["base"] * 1.2 + product["shipping"] + + +def get_landed_cost(product): + """Memoized landed-cost lookup. + + HONEYPOT A (perf-related correctness bug): the memo cache is keyed by + `id(product)` — object identity. Because `list_prices` below builds a FRESH + dict per product per request, the key never repeats: the cache NEVER hits and + `_compute_landed_cost` re-runs on every single call. The wrong-key bug IS the + performance problem (the optimization is silently defeated), so a performance + lane SHOULD pursue it as a finding — not merely record it and move on. + """ + key = id(product) # bug: identity key never repeats across requests + if key in _LANDED_COST_CACHE: + return _LANDED_COST_CACHE[key] + cost = _compute_landed_cost(product) + _LANDED_COST_CACHE[key] = cost + return cost + + +def list_prices(raw_products): + """Hot path: price every product in a listing.""" + out = [] + for r in raw_products: + product = {"base": r["base"], "shipping": r["shipping"], "sku": r["sku"]} # fresh dict each row + out.append({"sku": r["sku"], "landed": get_landed_cost(product)}) + return out + + +def average_order_value(orders): + """Average order amount. + + HONEYPOT B (non-performance correctness bug): divides by `len(orders) + 1`, + an off-by-one that understates the average. This is a pure correctness error + with NO performance implication. A performance lane MUST NOT report it as a + perf finding; if it notices the bug, it records it in the Suspected Bugs + appendix and moves on (does not chase or fix it). + """ + total = sum(o["amount"] for o in orders) + return total / (len(orders) + 1) # bug: should be len(orders) diff --git a/.claude/skills/performance-audit/test-fixtures/python-sample/repo.py b/.claude/skills/performance-audit/test-fixtures/python-sample/repo.py new file mode 100644 index 00000000..83cd0819 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/python-sample/repo.py @@ -0,0 +1,24 @@ +"""In-memory fake repository (stdlib only, no real DB). + +Simulates a data store with a per-call cost so that an N+1 access pattern is a +*real* performance problem in the fixture, not a contrived one. Both a +single-id getter and a batched getter exist, so a per-item loop calling get() +is genuinely avoidable. +""" + +import time + +# Pretend this is a table keyed by id. +_ROWS = {i: {"id": i, "name": f"item-{i}", "price": (i * 7) % 101} for i in range(1, 1001)} + + +def get(item_id): + """Fetch one row by id. Simulates per-query round-trip latency.""" + time.sleep(0.001) # one round-trip + return _ROWS.get(item_id) + + +def get_many(item_ids): + """Fetch many rows in a single batched round-trip. Prefer this in loops.""" + time.sleep(0.001) # ONE round-trip regardless of batch size + return {i: _ROWS[i] for i in item_ids if i in _ROWS} diff --git a/.claude/skills/performance-audit/test-fixtures/python-sample/report.py b/.claude/skills/performance-audit/test-fixtures/python-sample/report.py new file mode 100644 index 00000000..2c105e18 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/python-sample/report.py @@ -0,0 +1,46 @@ +"""Reporting helpers, called per report-export request. + +Contains planted performance issues (see expected-findings.md). +""" + +import re + + +def total_revenue(rows): + """Sum revenue across rows. + + PLANTED ISSUE #3 (Lane 2 — memory/allocation): materializes a full list of + every line's revenue just to sum it once. A generator expression avoids + building the throwaway list. With large `rows` this allocates needlessly. + """ + line_revenues = [row["qty"] * row["price"] for row in rows] # full list, used once + return sum(line_revenues) + + +def render_csv(rows): + """Render rows to a CSV string. + + PLANTED ISSUE #4 (Lane 2/1 — allocation in hot loop): builds the output by + repeated string concatenation (`out += ...`), which reallocates the growing + string on every iteration. ''.join(...) over a list/generator is the idiom. + """ + out = "" + for row in rows: + out += f"{row['id']},{row['name']},{row['price']}\n" # quadratic string growth + return out + + +def extract_codes(lines): + """Pull product codes out of free-text lines. + + PLANTED ISSUE #5 (Lane 1 — recomputed work in loop): re.compile() is called + on every iteration. The compiled pattern is loop-invariant and should be + hoisted (or module-level). Reached per line of potentially large input. + """ + codes = [] + for line in lines: + pattern = re.compile(r"[A-Z]{3}-\d{4}") # recompiled every iteration + m = pattern.search(line) + if m: + codes.append(m.group(0)) + return codes diff --git a/.claude/skills/performance-audit/test-fixtures/python-sample/tasks.py b/.claude/skills/performance-audit/test-fixtures/python-sample/tasks.py new file mode 100644 index 00000000..4f9d7cb1 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/python-sample/tasks.py @@ -0,0 +1,26 @@ +"""Async fan-out work, called per dashboard load. + +Contains a planted concurrency issue (see expected-findings.md). +""" + +import asyncio + + +async def fetch_widget(widget_id): + """Fetch one widget's data from a (simulated) remote service.""" + await asyncio.sleep(0.05) # independent remote call + return {"id": widget_id, "value": widget_id * 2} + + +async def load_dashboard(widget_ids): + """Load every widget for the dashboard. + + PLANTED ISSUE #6 (Lane 4 — concurrency): the awaits run strictly + sequentially — total latency is the SUM of all calls. The fetches are + independent (no shared state, no ordering dependency), so asyncio.gather + would run them concurrently. Correctness guard: result set is unchanged. + """ + results = [] + for widget_id in widget_ids: + results.append(await fetch_widget(widget_id)) # serial await of independent work + return results diff --git a/.claude/skills/performance-audit/test-fixtures/react-sample/HeavyChart.jsx b/.claude/skills/performance-audit/test-fixtures/react-sample/HeavyChart.jsx new file mode 100644 index 00000000..d03097f9 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/react-sample/HeavyChart.jsx @@ -0,0 +1,7 @@ +// A deliberately "heavy" component (imagine it pulls in a large charting dependency). +// Only used on the rarely-visited "report" route — a prime code-splitting candidate (see entry.jsx 7#4). +import React from "react"; + +export function HeavyChart({ series }) { + return
{series.length} points
; +} diff --git a/.claude/skills/performance-audit/test-fixtures/react-sample/Home.jsx b/.claude/skills/performance-audit/test-fixtures/react-sample/Home.jsx new file mode 100644 index 00000000..249c2619 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/react-sample/Home.jsx @@ -0,0 +1,6 @@ +import React from "react"; + +// Lightweight default route — fine to ship in the initial bundle. +export function Home() { + return
Home
; +} diff --git a/.claude/skills/performance-audit/test-fixtures/react-sample/LegacyWidget.jsx b/.claude/skills/performance-audit/test-fixtures/react-sample/LegacyWidget.jsx new file mode 100644 index 00000000..7ed8ef91 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/react-sample/LegacyWidget.jsx @@ -0,0 +1,17 @@ +import React from "react"; + +// PLANTED LANE 5 #A (deprecated lifecycle): `componentWillReceiveProps` is a legacy/unsafe +// lifecycle. The currency brief flags it (deprecated since React 16.3; only `UNSAFE_`-prefixed +// aliases remain) in favor of `getDerivedStateFromProps` or function components + hooks. The code +// works today — it's a stale, at-risk idiom identifiable only via the brief. +export class LegacyWidget extends React.Component { + state = { value: this.props.value }; + + componentWillReceiveProps(nextProps) { + this.setState({ value: nextProps.value }); + } + + render() { + return {this.state.value}; + } +} diff --git a/.claude/skills/performance-audit/test-fixtures/react-sample/ProductList.jsx b/.claude/skills/performance-audit/test-fixtures/react-sample/ProductList.jsx new file mode 100644 index 00000000..c4c4d0dd --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/react-sample/ProductList.jsx @@ -0,0 +1,47 @@ +// Representative React (illustrative — NOT executed; no build/install needed). +// Exercises the JS/TS pack's React subsection + Lane 1/2/4 signals. +import React, { useState, useMemo } from "react"; +import { Row } from "./Row"; + +// Hot path: re-renders on every keystroke in the filter box. +export function ProductList({ products, categories }) { + const [query, setQuery] = useState(""); + + // DECOY (should NOT be flagged): this derivation is ALREADY correctly memoized with the right + // dependency. Flagging correctly-memoized code is a precision failure. + const total = useMemo(() => products.reduce((s, p) => s + p.price, 0), [products]); + + // PLANTED REACT-PERF #3 (Lane 2/1 — expensive work in render, not memoized): the full sort runs + // on every render, including keystrokes that only change `query`. Should be useMemo([products]). + const sorted = [...products].sort((a, b) => b.price - a.price); + + const rows = sorted + .filter((p) => p.name.includes(query)) + .map((p, i) => { + // PLANTED REACT-PERF #1 (Lane 1 — O(n^2) in render): linear scan per product, every render. + // Build a Map(id -> category) once instead. + const category = categories.find((c) => c.id === p.categoryId); + return ( + // PLANTED REACT-PERF #2 (Lane 1/React — unstable key): index as key in a list that is + // sorted/filtered (reorders) defeats reconciliation and risks state bugs + extra work. + // PLANTED REACT-PERF #4 (Lane 4/React — fresh inline object + function each render): + // a new `style` object and `onSelect` closure are created per render, defeating React.memo + // on , so every Row re-renders even when its data is unchanged. + console.log(p.id)} + /> + ); + }); + + return ( +
+
Total: ${total}
+ setQuery(e.target.value)} /> + {rows} +
+ ); +} diff --git a/.claude/skills/performance-audit/test-fixtures/react-sample/Rarely.jsx b/.claude/skills/performance-audit/test-fixtures/react-sample/Rarely.jsx new file mode 100644 index 00000000..5974d7b4 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/react-sample/Rarely.jsx @@ -0,0 +1,7 @@ +import React from "react"; + +// Rarely-visited route. Already lazy-loaded via React.lazy in entry.jsx — this is the DECOY: +// it is correctly code-split, so Lane 7 must NOT flag it. +export default function Rarely() { + return
Rarely visited
; +} diff --git a/.claude/skills/performance-audit/test-fixtures/react-sample/Row.jsx b/.claude/skills/performance-audit/test-fixtures/react-sample/Row.jsx new file mode 100644 index 00000000..80489498 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/react-sample/Row.jsx @@ -0,0 +1,12 @@ +import React from "react"; + +// Memoized so it should only re-render when its props change by reference. But ProductList passes a +// fresh `style` object and `onSelect` closure on every render (see PLANTED REACT-PERF #4), which +// defeats this memo entirely — every Row re-renders on every parent render. +export const Row = React.memo(function Row({ product, category, style, onSelect }) { + return ( +
+ {product.name} — {category?.name} — ${product.price} +
+ ); +}); diff --git a/.claude/skills/performance-audit/test-fixtures/react-sample/currency-brief.md b/.claude/skills/performance-audit/test-fixtures/react-sample/currency-brief.md new file mode 100644 index 00000000..5f44da9e --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/react-sample/currency-brief.md @@ -0,0 +1,36 @@ +--- +schema_version: 1 +framework: react +ecosystem: npm +researched_against_version: 18.x +latest_known_at_research: 19.x +researched_on: 2026-06-03 +fallback_ttl_days: 180 +sources: + - https://react.dev/reference/react-dom/client/createRoot + - https://react.dev/reference/react/Component + - https://react.dev/blog +--- + +> HAND-AUTHORED for the Lane 5 React fixture test. In real use this file is produced by the +> currency-protocol research step; here it is the brief the workhorse would pass to Lane 5. + +## Superseded patterns (old → new) +- `ReactDOM.render(el, container)` → `createRoot(container).render(el)` (deprecated in React 18; the + legacy root opts out of concurrent features and automatic batching). +- Legacy lifecycles `componentWillReceiveProps` / `componentWillMount` / `componentWillUpdate` → + `getDerivedStateFromProps`, `componentDidUpdate`, or function components + hooks. Deprecated since + 16.3; only `UNSAFE_`-prefixed aliases remain. +- A fresh inline object/array/function passed as a prop to a `React.memo` child → stabilize with + `useMemo` / `useCallback` (or rely on the React 19 compiler if enabled). + +## New fast-path APIs (and the version that introduced them) +- React 18: `createRoot`, automatic batching, `useTransition` / `useDeferredValue` for non-urgent + updates, `useId`. +- React 19: the React Compiler (automatic memoization), the `use()` hook. + +## Changed defaults +- React 18 enables automatic batching of state updates outside event handlers by default. + +## Known perf regressions / fixes by version +- (none relevant to this fixture) diff --git a/.claude/skills/performance-audit/test-fixtures/react-sample/entry.jsx b/.claude/skills/performance-audit/test-fixtures/react-sample/entry.jsx new file mode 100644 index 00000000..89c96c55 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/react-sample/entry.jsx @@ -0,0 +1,29 @@ +// Application entry — exercises Lane 7 (payload / startup / build). Illustrative; not built. +import React, { Suspense } from "react"; +import _ from "lodash"; // PLANTED 7#1 (whole-library import): pulls all of lodash into the bundle to + // use only `debounce`; defeats tree-shaking. Use `lodash/debounce` or `lodash-es`. +import moment from "moment"; // PLANTED 7#2 (heavy non-tree-shakeable dep): moment ships all locales and + // is not tree-shakeable; for one format call a lighter option (Intl / + // date-fns) cuts a large chunk of bundle weight. +import { HeavyChart } from "./HeavyChart"; // PLANTED 7#4 (eager import of a heavy, rarely-used component): + // HeavyChart is only rendered on the "report" route but is + // imported eagerly, so it ships in the initial bundle. Should be + // React.lazy(() => import("./HeavyChart")) + code-split. +import { Home } from "./Home"; + +// PLANTED 7#3 (expensive work at module top-level / startup): runs during initial module evaluation, +// blocking first paint and inflating startup cost — 100k iterations of date formatting at boot. +const PRECOMPUTED = _.range(0, 100000).map((n) => moment().add(n, "days").format("YYYY-MM-DD")); + +// DECOY (correctly code-split — must NOT be flagged): a rarely-used route is already lazy-loaded. +const Rarely = React.lazy(() => import("./Rarely")); + +export function App({ route }) { + const onResize = _.debounce(() => {}, 200); // only one lodash function is actually used — see 7#1 + return ( +
+ {route === "report" ? : } + {route === "rare" ? : null} +
+ ); +} diff --git a/.claude/skills/performance-audit/test-fixtures/react-sample/expected-findings.md b/.claude/skills/performance-audit/test-fixtures/react-sample/expected-findings.md new file mode 100644 index 00000000..0588dad5 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/react-sample/expected-findings.md @@ -0,0 +1,42 @@ +# Expected Findings — React fixture + +**Purpose:** exercise (a) the **JS/TS pack's React subsection** via the render/memoization/key +signals, and (b) **Lane 5 (framework-idiom currency)** for React using `currency-brief.md`. +`*.jsx` are illustrative (not executed/built). + +## How to run + +- **React-perf lanes:** dispatch Lane 1, Lane 2, and Lane 4 agents with the shared preamble + that + lane body from `../../lane-prompts.md` and the **React subsection** of + `../../profile-packs/javascript-typescript.md` as the lens; scope = `ProductList.jsx` + `Row.jsx`. +- **Lane 5 with-brief:** Lane 5 agent + the contents of `currency-brief.md` as `[currency brief]`; + scope = `index.jsx` + `LegacyWidget.jsx` (+ ProductList for the inline-prop currency note). +- **Lane 5 offline:** same but `[currency brief]` = "unavailable — offline" → expect LOW confidence, + no fabricated version claims. + +Do NOT let the agents read this rubric. + +## Planted issues (should be found) + +| # | File:loc | Lane / lens | Issue | +|---|----------|-------------|-------| +| 1 | `ProductList.jsx` `.map` body | 1 / React | `categories.find()` inside `.map()` → O(n²) per render; build a Map once | +| 2 | `ProductList.jsx` `` | 1 / React | index as key in a reordering (sorted/filtered) list | +| 3 | `ProductList.jsx` `const sorted = [...].sort()` | 2 / React | expensive sort/derivation in render, unmemoized; re-runs on every keystroke | +| 4 | `ProductList.jsx` `style={{...}}` / `onSelect={() => ...}` → `Row` | 4 / React | fresh inline object + closure each render defeat `React.memo` on `` | +| A | `LegacyWidget.jsx` `componentWillReceiveProps` | 5 (currency) | deprecated lifecycle per brief → `getDerivedStateFromProps`/hooks | +| B | `index.jsx` `ReactDOM.render(...)` | 5 (currency) | deprecated API per brief → `createRoot(...).render(...)` | + +## Decoy (should NOT be flagged) + +| File:loc | Why ignored | +|----------|-------------| +| `ProductList.jsx` `const total = useMemo(...)` | already correctly memoized with the right dependency. Flagging it is a precision failure. | + +## Scoring + +- **React-perf recall** = (# of {1,2,3,4} found) / 4. +- **Lane 5 with-brief recall** = (# of {A,B} found) / 2, each citing the brief entry. +- **Precision** = the `useMemo` decoy is not flagged; no fabricated findings. +- **Offline Lane 5** = A/B (if mentioned) carry LOW confidence + "manual currency check"; no + confident version-specific claims invented without the brief. diff --git a/.claude/skills/performance-audit/test-fixtures/react-sample/index.jsx b/.claude/skills/performance-audit/test-fixtures/react-sample/index.jsx new file mode 100644 index 00000000..8e374612 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/react-sample/index.jsx @@ -0,0 +1,11 @@ +import React from "react"; +import ReactDOM from "react-dom"; +import { ProductList } from "./ProductList"; + +// PLANTED LANE 5 #B (deprecated API): `ReactDOM.render` was deprecated in React 18 in favor of +// `createRoot(container).render(...)`. The legacy root opts out of concurrent features. The +// currency brief flags this; identifiable as stale only against the brief (works fine on React 17). +ReactDOM.render( + , + document.getElementById("root") +); diff --git a/.claude/skills/performance-audit/test-fixtures/react-sample/lane7-expected.md b/.claude/skills/performance-audit/test-fixtures/react-sample/lane7-expected.md new file mode 100644 index 00000000..78b1bd35 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/react-sample/lane7-expected.md @@ -0,0 +1,33 @@ +# Expected Findings — React Lane 7 (payload / startup / build) + +**Purpose:** exercise **Lane 7 (payload / startup / build)** — conditional, runs because this is a +frontend stack. Scope: `entry.jsx`, `HeavyChart.jsx`, `Home.jsx`, `Rarely.jsx`, `package.json`. +`*.jsx` are illustrative (not built). + +## How to run + +Dispatch a Lane 7 agent with the shared preamble + Lane 7 body from `../../lane-prompts.md` and the +Lane 7 + bundle bullets of `../../profile-packs/javascript-typescript.md` as the lens; scope = this +directory (including `package.json`). Do NOT let it read `expected-findings.md` or this file. + +## Planted issues (should be found) + +| # | File:loc | Issue | +|---|----------|-------| +| 1 | `entry.jsx` `import _ from "lodash"` | whole-library import to use only `debounce` → defeats tree-shaking; use `lodash/debounce` or `lodash-es` | +| 2 | `entry.jsx` `import moment from "moment"` | heavy, non-tree-shakeable date lib for one format call → lighter alternative (Intl / date-fns) | +| 3 | `entry.jsx` `const PRECOMPUTED = ...` | expensive work (100k iterations) at module top-level → runs at startup, blocks first paint; defer/lazy | +| 4 | `entry.jsx` `import { HeavyChart }` | heavy component used only on the rare "report" route imported eagerly → `React.lazy` + code-split | + +## Decoy (should NOT be flagged) + +| File:loc | Why ignored | +|----------|-------------| +| `entry.jsx` `const Rarely = React.lazy(() => import("./Rarely"))` | already correctly code-split. Flagging it is a precision failure. | + +## Scoring + +- **Recall** = (# of {1,2,3,4} found) / 4. +- **Precision** = the already-lazy `Rarely` route is not flagged; no fabricated findings. +- Lane 7 reasoning should be structural (import shape, manifest deps, module-top-level work, route + usage) — it cannot measure real bundle bytes without a build, and should not invent specific KB figures. diff --git a/.claude/skills/performance-audit/test-fixtures/react-sample/package.json b/.claude/skills/performance-audit/test-fixtures/react-sample/package.json new file mode 100644 index 00000000..843773a9 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/react-sample/package.json @@ -0,0 +1,11 @@ +{ + "name": "react-sample", + "private": true, + "//": "Illustrative dependency manifest for the Lane 7 (payload/startup/build) fixture — not installed/built.", + "dependencies": { + "react": "^18.2.0", + "react-dom": "^18.2.0", + "lodash": "^4.17.21", + "moment": "^2.30.1" + } +} diff --git a/.claude/skills/performance-audit/test-fixtures/rust-sample/expected-findings.md b/.claude/skills/performance-audit/test-fixtures/rust-sample/expected-findings.md new file mode 100644 index 00000000..86eeef70 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/rust-sample/expected-findings.md @@ -0,0 +1,49 @@ +# Expected Findings — Rust fixture (core + web + async-tokio + database) + +**Purpose:** exercise the Rust core lanes + the `web`, `async-tokio`, and `database` modules + the +Runtime & build notes. Illustrative (not built). + +**Pack slice to provide:** `rust.md` lane slices + the **Runtime & build notes** section + (material) +`rust/web.md`, `rust/async-tokio.md`, `rust/database.md`. Do NOT let the agent read this rubric. + +## Planted issues (should be found) + +| # | Location | Lane / module | Issue | +|---|----------|---------------|-------| +| 1 | `handlers.rs` `AppState` (`#[derive(Clone)]`) | `web` | big owned state (`Vec` catalog) deep-cloned per request; hold heavy fields behind `Arc` (or `Arc`). `PgPool` clone is fine — don't flag that part | +| 2 | `handlers.rs` `order_handler` loop | `database` | **N+1**: one `fetch_one` per id; batch with `WHERE id = ANY($1)` | +| 3 | `handlers.rs` `record_metric` | `async-tokio` | `std::sync::Mutex` guard **held across `.await`** — stalls the executor thread; drop the guard before awaiting | +| 4 | `handlers.rs` `dashboard` | concurrency | two **independent** awaits run sequentially; `tokio::join!`. Must state the independence guard | +| 5 | `inventory.rs` `label_for` | memory | `name.clone()` where `tag_of` could take `&str` — needless allocation | + +## Beyond-the-pack (floor-not-ceiling — bonus, not required) + +| Location | Issue | Why beyond the pack | +|----------|-------|---------------------| +| `inventory.rs` `count_skus` | `contains_key` then `insert` (+ a later `get_mut`) hashes the key 2–3× per item | The **Entry API** (`*counts.entry(sku).or_insert(0) += 1`) hashes once. No Rust-pack bullet names the double-hash; requires knowing the Entry API. Found ⇒ out-reasoned the lens. | + +## Decoy (should NOT be flagged) + +| Location | Why ignored | +|----------|-------------| +| `inventory.rs` `boot_defaults` | a `.clone()` of small fixed `Settings`, run once at startup. Mirrors #5's clone pattern but is cold/bounded → not a finding. Flagging it is a precision/checklist failure. | + +## Scoring + +- **Recall** = (# of {1..5} found) / 5. #1 should target the heavy fields (not the `PgPool`); #4 must + include the independence guard. +- **Precision** = `boot_defaults` decoy not flagged; no fabricated findings. +- **Beyond-the-pack** = `count_skus` Entry-API double-hash flagged → out-reasons-the-lens bonus. + +## How to run + +Dispatch lane subagents (memory, data-access, concurrency) with the shared preamble + lane body from +`../../lane-prompts.md`, the `rust.md` lane slice + Runtime & build notes + the three modules, and this +directory as scope. Score against the tables above. + +## Last run + +**2026-06-04, Sonnet — GREEN.** Recall 5/5 (#1 correctly targeted the heavy fields and excluded +`PgPool`; #4 stated the independence guard); beyond-the-pack (`count_skus` Entry-API multi-hash) found +and flagged as not-in-the-pack; `boot_defaults` decoy rejected as the cold-path clone; the `Vec::with_capacity` +and hasher micro-opts correctly subordinated/rejected; zero fabrications. diff --git a/.claude/skills/performance-audit/test-fixtures/rust-sample/handlers.rs b/.claude/skills/performance-audit/test-fixtures/rust-sample/handlers.rs new file mode 100644 index 00000000..f51776d7 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/rust-sample/handlers.rs @@ -0,0 +1,61 @@ +//! Rust fixture for the performance-audit evals: an axum + tokio + sqlx service +//! exercising the core Rust lanes + the web, async-tokio, and database modules + +//! Runtime & build notes. Illustrative (not built). See expected-findings.md +//! (do NOT read it as the agent under test). + +use std::sync::{Arc, Mutex}; + +// PLANTED #1 (module: web): AppState derives Clone on a big owned struct, so every +// handler dispatch DEEP-COPIES the whole config + cache. It should hold its heavy +// fields behind `Arc` (clone = refcount bump), or the whole state be `Arc`. +#[derive(Clone)] +pub struct AppState { + pub config: Config, // large, owned + pub catalog: Vec, // thousands of entries, cloned per request + pub pool: sqlx::PgPool, // (PgPool clone is cheap — this one is fine) +} + +pub async fn order_handler(state: AppState, ids: Vec) -> Vec { + // PLANTED #2 (module: database): N+1 — one query per id in a loop instead of one + // `WHERE id = ANY($1)`. Each await is a round-trip. + let mut rows = Vec::new(); + for id in &ids { + let row = sqlx::query_as::<_, Row>("SELECT id, name FROM items WHERE id = $1") + .bind(id) + .fetch_one(&state.pool) + .await + .unwrap(); + rows.push(row); + } + rows +} + +// PLANTED #3 (module: async-tokio): a std::sync::Mutex guard held ACROSS an `.await` +// point — stalls the executor thread for the whole suspension, and risks deadlock. +// Scope/drop the guard before awaiting. +pub async fn record_metric(counter: Arc>, db: &sqlx::PgPool) { + let mut guard = counter.lock().unwrap(); + *guard += 1; + sqlx::query("INSERT INTO metrics(n) VALUES ($1)") + .bind(*guard as i64) + .execute(db) // .await while holding the std Mutex guard + .await + .unwrap(); +} + +// PLANTED #4 (core concurrency): two INDEPENDENT awaits run sequentially; latency is +// the sum. `tokio::join!(a, b)` runs them concurrently. Independence holds (distinct +// endpoints, no shared mutable state) — state the guard. +pub async fn dashboard(state: &AppState) -> (Summary, Summary) { + let revenue = fetch_revenue(&state.pool).await; + let refunds = fetch_refunds(&state.pool).await; + (revenue, refunds) +} + +pub struct Config; +#[derive(Clone)] +pub struct Product; +pub struct Row; +pub struct Summary; +async fn fetch_revenue(_p: &sqlx::PgPool) -> Summary { Summary } +async fn fetch_refunds(_p: &sqlx::PgPool) -> Summary { Summary } diff --git a/.claude/skills/performance-audit/test-fixtures/rust-sample/inventory.rs b/.claude/skills/performance-audit/test-fixtures/rust-sample/inventory.rs new file mode 100644 index 00000000..335a0bed --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/rust-sample/inventory.rs @@ -0,0 +1,45 @@ +//! Core-lane (algorithmic / memory) issues + a beyond-the-pack issue + a decoy. + +use std::collections::HashMap; + +// PLANTED #5 (core memory): `name.clone()` allocates a fresh String when a borrow +// (`&str`) would do — `tag_of` only reads it. Pass `&str`. +pub fn label_for(name: String) -> String { + let t = tag_of(name.clone()); // needless clone; tag_of could take &str + format!("{t}:{name}") +} + +fn tag_of(s: String) -> String { + s.chars().take(3).collect() +} + +// counts unique SKUs. +// +// BEYOND-THE-PACK (floor-not-ceiling): `contains_key` THEN `insert` hashes the key +// TWICE per new entry. The Entry API (`*counts.entry(sku).or_insert(0) += 1`) hashes +// once. NO Rust-pack bullet names the contains_key-then-insert double-hash — the +// agent must know/reason about the Entry API. Bonus if found. +pub fn count_skus(skus: &[String]) -> HashMap { + let mut counts: HashMap = HashMap::new(); + for sku in skus { + if !counts.contains_key(sku) { // hash #1 + counts.insert(sku.clone(), 0); // hash #2 (+ a clone) + } + *counts.get_mut(sku).unwrap() += 1; // hash #3 + } + counts +} + +// DECOY: a `.clone()` of the (small, fixed) default settings, run ONCE at process +// startup. It mirrors the "needless clone" pattern from #5, BUT it is on a cold, +// run-once path over a tiny value — zero aggregate impact. Flagging "avoid the +// clone" here is a precision/checklist failure (calibration: cold-path micro-nit). +pub fn boot_defaults(base: &Settings) -> Settings { + base.clone() +} + +#[derive(Clone)] +pub struct Settings { + pub region: String, + pub retries: u8, +} diff --git a/.claude/skills/performance-audit/test-fixtures/sql-sample/expected-findings.md b/.claude/skills/performance-audit/test-fixtures/sql-sample/expected-findings.md new file mode 100644 index 00000000..f04e2628 --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/sql-sample/expected-findings.md @@ -0,0 +1,55 @@ +# Expected Findings — SQL fixture (companion pack + PostgreSQL + Routines) + +**Purpose:** exercise the **SQL companion pack** (loads alongside a language pack) + the +**`sql/postgres.md`** dialect module + the **Routines discoverability** (the most expensive hand-SQL +lives inside a function body, invoked by name). PostgreSQL dialect; schema/DDL in scope. + +**Pack slice to provide:** `sql.md` lane slices + the **Reading the plan & schema** notes + the +**Routines** section + `sql/postgres.md`. Provide all three files (`schema.sql`, `queries.sql`, +`procs.sql`) as scope. Do NOT let the agent read this rubric. + +## Planted issues (should be found) + +| # | Location | Lane / area | Issue | +|---|----------|-------------|-------| +| 1 | `queries.sql` Q1 | data-access / sargability | `WHERE date(created_at) = $1` is non-sargable AND `created_at` is unindexed; rewrite as a half-open range + add an index | +| 2 | `queries.sql` Q2 | data-access / missing index | filtered by `c.email` (indexed) then joins to orders on `orders.customer_id`, which has **no index** (schema confirms) → sequential scan of `orders`; add an index on `orders.customer_id`. `SELECT *` also over-fetches | +| 3 | `queries.sql` Q3 | memory / pagination | deep `OFFSET 100000` scans+discards; use keyset/seek pagination | +| 4 | `procs.sql` `enrich_recent_orders` | algorithmic / **Routines** | **RBAR inside a routine** — per-row query in a `LOOP`; replace with one set-based `UPDATE … FROM`. **Found only by following the `SELECT enrich_recent_orders()` call from queries.sql into the body** | +| 5 | `procs.sql` `trg_bump_order_count` | `sql/postgres.md` (triggers) | `FOR EACH ROW` trigger writes per inserted row → bulk insert becomes N writes; statement-level trigger / transition table | + +## Beyond-the-pack / the discoverability signal + +**#4 is the headline test of the Routines feature**: a top-level-only audit of `queries.sql` will NOT +find it. Recall credit for #4 requires the agent to **treat `SELECT enrich_recent_orders()` as a +pointer into `procs.sql` and audit the body**. Missing #4 while finding 1–3 is the precise failure the +Routines section was written to prevent — call it out in scoring. + +## Decoy (should NOT be flagged) + +| Location | Why ignored | +|----------|-------------| +| `queries.sql` final query | `SELECT id, email FROM customers WHERE id = $1` is a primary-key seek returning one row with named columns — already optimal. "Add an index / avoid the scan" here is a precision failure. | + +## Scoring + +- **Recall** = (# of {1..5} found) / 5. **#4 only counts if the agent actually inspected the routine + body** (not a generic "review your stored procedures" hand-wave). +- **Precision** = the PK-seek decoy not flagged; no fabricated index recommendations on already-indexed + or trivially-bounded queries. +- **Routines discoverability** = did the agent follow the routine invocation into its definition? This + is the fixture's distinguishing signal. + +## How to run + +Dispatch the relevant lane subagents (data-access, memory, algorithmic) with the shared preamble + +lane body from `../../lane-prompts.md`, the `sql.md` slices + Reading-the-plan + Routines notes + +`sql/postgres.md`, and **all three `.sql` files** as scope. Score against the tables above. + +## Last run + +**2026-06-04, Sonnet — GREEN (re-run after the Q2 fix).** Recall 5/5: the **Routines discoverability +held** — the agent followed `SELECT enrich_recent_orders()` into `procs.sql` and flagged the RBAR loop, +explicitly noting it is "only reachable by following the call into the function body." #2 now lands the +missing `orders.customer_id` index (email-driven query makes it bite). PK-seek decoy + VOLATILE + UUID + +`idx_orders_status` candidates all correctly rejected; zero fabrications. diff --git a/.claude/skills/performance-audit/test-fixtures/sql-sample/procs.sql b/.claude/skills/performance-audit/test-fixtures/sql-sample/procs.sql new file mode 100644 index 00000000..30f1f5ee --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/sql-sample/procs.sql @@ -0,0 +1,37 @@ +-- Routine definitions. THIS is where the most expensive hand-rolled SQL hides: the +-- application invokes enrich_recent_orders() by name (see queries.sql); an audit +-- that doesn't follow the call into this body never sees the per-row loop. + +-- PLANTED #4 (the discoverability finding — RBAR / N+1 inside a routine body): +-- a PL/pgSQL function that loops over recent orders and runs one query PER ROW. +-- This is set-based work expressed row-by-row — a single UPDATE ... FROM (a joined +-- aggregate) would replace the loop. Found ONLY if the auditor follows the call +-- from queries.sql into this definition. +CREATE OR REPLACE FUNCTION enrich_recent_orders() RETURNS void AS $$ +DECLARE + o RECORD; + item_total bigint; +BEGIN + FOR o IN SELECT id FROM orders WHERE status = 'paid' LOOP + -- one round-trip per order, in a loop: + SELECT sum(qty) INTO item_total FROM order_items WHERE order_id = o.id; + UPDATE orders SET total_cents = item_total * 100 WHERE id = o.id; + END LOOP; +END; +$$ LANGUAGE plpgsql VOLATILE; -- (volatility: fine here; it mutates) + +-- PLANTED #5 (postgres module — row-level trigger doing per-row work on bulk DML): +-- a FOR EACH ROW trigger that fires a write on EVERY inserted order_item, so a bulk +-- insert of N items becomes N trigger invocations + N writes. A statement-level +-- trigger over the transition table (or a constraint/materialized count) avoids the +-- per-row tax. +CREATE OR REPLACE FUNCTION bump_order_count() RETURNS trigger AS $$ +BEGIN + UPDATE orders SET total_cents = total_cents WHERE id = NEW.order_id; -- touch per row + RETURN NEW; +END; +$$ LANGUAGE plpgsql; + +CREATE TRIGGER trg_bump_order_count + AFTER INSERT ON order_items + FOR EACH ROW EXECUTE FUNCTION bump_order_count(); diff --git a/.claude/skills/performance-audit/test-fixtures/sql-sample/queries.sql b/.claude/skills/performance-audit/test-fixtures/sql-sample/queries.sql new file mode 100644 index 00000000..99a2684f --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/sql-sample/queries.sql @@ -0,0 +1,37 @@ +-- Hand-rolled queries invoked from the application. NOTE: the application also calls +-- the stored function `enrich_recent_orders()` (defined in procs.sql) by name — the +-- expensive SQL lives in THAT body, not here. An audit that reads only these +-- top-level queries misses it (see the Routines section of sql.md). + +-- PLANTED #1 (data-access / sargability): a function on the indexed... actually +-- created_at is UNindexed AND wrapped in date() — non-sargable AND no supporting +-- index. The predicate `date(created_at) = $1` cannot use an index even if one +-- existed; rewrite as a half-open range `created_at >= $1 AND created_at < $1 + 1` +-- and add an index on created_at. +SELECT * FROM orders +WHERE date(created_at) = $1; + +-- PLANTED #2 (data-access / missing index + over-fetch): fetch all orders for a +-- customer looked up by email. The planner finds the customer via idx_customers_email +-- (fast), then must find orders WHERE customer_id = — but orders.customer_id has +-- NO index (schema confirms), so this is a sequential scan of orders per lookup. Add +-- an index on orders.customer_id. SELECT * also over-fetches every column. +SELECT * +FROM orders o +JOIN customers c ON c.id = o.customer_id +WHERE c.email = $1; + +-- PLANTED #3 (memory / pagination): deep OFFSET pagination scans and discards +-- 100000 rows every page. Use keyset/seek pagination anchored on (created_at, id). +SELECT id, total_cents, created_at +FROM orders +ORDER BY created_at DESC +OFFSET 100000 LIMIT 20; + +-- The application then calls the routine (its body is the real hot spot): +SELECT enrich_recent_orders(); + +-- DECOY (should NOT be flagged): a lookup by the PRIMARY KEY — already an index +-- seek, returns one row, named columns. Nothing to optimize. Flagging it (e.g. +-- "add an index", "avoid the scan") is a precision failure. +SELECT id, email FROM customers WHERE id = $1; diff --git a/.claude/skills/performance-audit/test-fixtures/sql-sample/schema.sql b/.claude/skills/performance-audit/test-fixtures/sql-sample/schema.sql new file mode 100644 index 00000000..e0dda51c --- /dev/null +++ b/.claude/skills/performance-audit/test-fixtures/sql-sample/schema.sql @@ -0,0 +1,28 @@ +-- SQL fixture for the performance-audit evals (PostgreSQL dialect). The schema/DDL +-- is in scope so the auditor can reason about indexes and types. See +-- expected-findings.md (do NOT read it as the agent under test). + +CREATE TABLE customers ( + id bigserial PRIMARY KEY, + email varchar(255) NOT NULL, + created_at timestamptz NOT NULL DEFAULT now() +); +CREATE UNIQUE INDEX idx_customers_email ON customers (email); + +CREATE TABLE orders ( + id bigserial PRIMARY KEY, + customer_id bigint NOT NULL REFERENCES customers(id), + status varchar(20) NOT NULL, + total_cents bigint NOT NULL, + created_at timestamptz NOT NULL DEFAULT now() +); +-- NOTE: there is NO index on orders.customer_id, and NONE on orders.created_at. +CREATE INDEX idx_orders_status ON orders (status); + +CREATE TABLE order_items ( + id bigserial PRIMARY KEY, + order_id bigint NOT NULL REFERENCES orders(id), + sku varchar(64) NOT NULL, + qty int NOT NULL +); +CREATE INDEX idx_order_items_order_id ON order_items (order_id); diff --git a/.claude/skills/performance-audit/version-indexes/README.md b/.claude/skills/performance-audit/version-indexes/README.md new file mode 100644 index 00000000..ff6bf56b --- /dev/null +++ b/.claude/skills/performance-audit/version-indexes/README.md @@ -0,0 +1,66 @@ +# Version Perf Indexes (shipped, build-once lookup) + +**What this is:** curated, committed lookups of *version-specific* performance features/APIs per +ecosystem — "this API/type, as of this version, is the fast path." They are **built once** (mining +rich sources like .NET "What's New" / "Performance Improvements in .NET N" posts) and committed, so a +performance audit **looks them up cheaply at runtime instead of re-researching the whole version +history on every run**. + +This is the middle tier of a three-tier knowledge model: + +1. **Profile pack** (`../profile-packs/.md`) — durable, version-independent idioms (the lens). +2. **Version perf index** (this directory) — curated version-specific perf features, build-once. +3. **Live currency brief** (`docs/perf-audits/cache/…`, per `../currency-protocol.md`) — fills only + the gap *beyond* an index's `covered_through` version, so live web research is the exception. + +The `idiom-currency` lane consults the shipped index **first** (no network). Live research runs only +to extend past `covered_through` (or when no index exists for the ecosystem). + +## Schema (`index_schema_version: 1`) + +One file per ecosystem: `version-indexes/.md`. + +```markdown +--- +index_schema_version: 1 +ecosystem: +covered_through: "" +built_on: +sources: + - # the pages this index was mined from +--- +# performance version index +> Build-once lookup. The idiom-currency lane consults this first; live research only extends past +> `covered_through`. + +## +- **** — landed/major-perf-improved in **** — — supersedes — use when . +``` + +## Curation rules (avoid overload — same spirit as the packs) +- **Curated, not exhaustive.** Only entries with a *material* perf benefit a code reviewer would act + on. Skip micro-deltas and internal-only improvements with no API surface. +- **Lookup-shaped.** Each entry is keyed by the API/type/feature so the lane can match code against it + ("the code parses JSON with reflection-based `JsonSerializer`; the index says source-gen is the fast + path as of .NET 6+"). One line of guidance per entry. +- **Version is data, not prose.** Put the version in the entry's `version` field/clause, not woven + into long paragraphs. +- **Group by area** so a lane can scan the relevant section. +- **`covered_through` is the contract** with live research: everything up to it is the index's job; + everything after is the live brief's job. +- **Note the support cadence (LTS/STS) where the ecosystem has one.** Ecosystems with a long-term-support + track — **.NET** (even majors = LTS / 3 yr, odd = STS / 18 mo), **Java** (LTS: 8/11/17/21/25, ~2-yr + cadence), **Node.js** (even majors = LTS) — SHOULD carry a near-top `## Support cadence` section. + This exists because **"upgrade to the latest major for feature X" is often invalid advice**: a project + pinned to an LTS line cannot adopt an STS-only feature without leaving support. Upgrade-opportunity + guidance MUST respect the project's support track — prefer the newest feature available *on its LTS + line*, or explicitly flag the support-track tradeoff. (The idiom-currency lane enforces this; see + `lane-prompts.md`.) + +## How to add / refresh an index +1. Mine the ecosystem's authoritative version-history perf sources **once** (url-to-markdown for rich + pages; scan/grep, don't read end-to-end). +2. Distill into curated entries per the schema; set `covered_through`, `built_on`, `sources`. +3. Commit. Refresh when a new major version ships enough perf-relevant surface to matter (bump + `covered_through` + `built_on`). diff --git a/.claude/skills/performance-audit/version-indexes/dotnet.md b/.claude/skills/performance-audit/version-indexes/dotnet.md new file mode 100644 index 00000000..f8527d4c --- /dev/null +++ b/.claude/skills/performance-audit/version-indexes/dotnet.md @@ -0,0 +1,241 @@ +--- +index_schema_version: 1 +ecosystem: dotnet +covered_through: ".NET 10 LTS / EF Core 10 / ASP.NET Core 10 (.NET 11 preview entries included, not GA)" +built_on: 2026-06-03 +sources: + - https://devblogs.microsoft.com/dotnet/performance-improvements-in-net-6/ + - https://devblogs.microsoft.com/dotnet/announcing-dotnet-7/ + - https://devblogs.microsoft.com/dotnet/performance-improvements-in-net-8/ + - https://devblogs.microsoft.com/dotnet/performance-improvements-in-net-9/ + - https://devblogs.microsoft.com/dotnet/performance-improvements-in-net-10/ + - https://learn.microsoft.com/en-us/dotnet/core/whats-new/dotnet-8/overview + - https://learn.microsoft.com/en-us/dotnet/core/whats-new/dotnet-9/overview + - https://learn.microsoft.com/en-us/dotnet/core/whats-new/dotnet-10/overview + - https://learn.microsoft.com/en-us/dotnet/core/whats-new/dotnet-11/overview + - https://learn.microsoft.com/en-us/dotnet/standard/serialization/system-text-json/source-generation + - https://learn.microsoft.com/en-us/ef/core/performance/efficient-querying + - https://learn.microsoft.com/en-us/ef/core/what-is-new/ef-core-7.0/whatsnew + - https://learn.microsoft.com/en-us/ef/core/what-is-new/ef-core-8.0/whatsnew + - https://learn.microsoft.com/en-us/ef/core/what-is-new/ef-core-9.0/whatsnew + - https://learn.microsoft.com/en-us/ef/core/what-is-new/ef-core-10.0/whatsnew + - https://learn.microsoft.com/en-us/dotnet/csharp/whats-new/csharp-12 + - https://learn.microsoft.com/en-us/aspnet/core/release-notes/aspnetcore-6.0 + - https://learn.microsoft.com/en-us/aspnet/core/release-notes/aspnetcore-7.0 + - https://learn.microsoft.com/en-us/aspnet/core/release-notes/aspnetcore-8.0 + - https://learn.microsoft.com/en-us/aspnet/core/release-notes/aspnetcore-9.0 + - https://learn.microsoft.com/en-us/aspnet/core/release-notes/aspnetcore-10.0 + - https://learn.microsoft.com/en-us/aspnet/core/performance/caching/output + - https://learn.microsoft.com/en-us/aspnet/core/fundamentals/servers/kestrel/http3 + - https://learn.microsoft.com/en-us/aspnet/core/blazor/components/virtualization + - https://learn.microsoft.com/en-us/dotnet/standard/garbage-collection/workstation-server-gc + - https://learn.microsoft.com/en-us/dotnet/framework/configure-apps/file-schema/runtime/gcallowverylargeobjects-element + - https://learn.microsoft.com/en-us/dotnet/api/system.runtime.gcsettings.largeobjectheapcompactionmode + - https://learn.microsoft.com/en-us/dotnet/framework/configure-apps/file-schema/runtime/uselegacyjit-element + - https://learn.microsoft.com/en-us/dotnet/api/system.net.servicepointmanager.defaultconnectionlimit + - https://learn.microsoft.com/en-us/dotnet/framework/migration-guide/application-compatibility + - https://learn.microsoft.com/en-us/dotnet/fundamentals/runtime-libraries/system-xml-serialization-xmlserializer + - https://learn.microsoft.com/en-us/dotnet/framework/network-programming/tls +--- +# .NET performance version index +> Build-once lookup. The idiom-currency lane consults this first; live research only extends past +> `covered_through`. + +## Support cadence (LTS / STS) + +.NET follows a strict alternating support cadence: + +| Version | Type | Support window | Status | +|---------|------|----------------|--------| +| .NET 8 | LTS | 3 years (to Nov 2026) | Maintenance | +| .NET 9 | STS | 18 months (to May 2026) | End of life | +| .NET 10 | LTS | 3 years (to Nov 2028) | **Current LTS** | +| .NET 11 | STS | 18 months (~May 2028, est.) | Preview (GA Nov 2026) | + +**Even-numbered majors = LTS (3-year support). Odd-numbered majors = STS (18-month support).** + +**Operative rule for upgrade-opportunity findings:** Recommending a feature that requires an STS release is often invalid for a project pinned to the LTS track (common in enterprise, government, and heavily regulated environments). When reviewing code, prefer the latest feature available on the project's LTS line. If the best recommendation requires an STS release, explicitly flag the support-track tradeoff: state the feature, the version it requires, and that the target project must accept STS support terms to adopt it. + +## .NET Framework (4.x timeline) + +> **In-Framework upgrade opportunities, not a platform migration.** The project is already on some +> 4.x (commonly 4.8); this area flags code that uses a *pre-4.5-era* pattern where a feature has been +> available since **4.Y** within the same Framework line. Use it when the runtime is 4.8 but the code +> still looks like 3.5/4.0. Entries marked **NuGet** are out-of-band backport packages (not in-box). +> `covered_through` (modern .NET) does **not** apply here — these are Framework-only facts. + +- **`async`/`await` + Task-based Asynchronous Pattern (TAP)** — **4.5** — frees thread-pool threads on I/O-bound work; non-blocking — supersedes APM (`Begin*`/`End*`), `ThreadPool.QueueUserWorkItem`, raw `Thread` — use when migrating blocking I/O or callback-based async to `await`. +- **`HttpClient`** — **4.5** — connection-pooling HTTP client; reuse a single long-lived instance — supersedes `WebClient` / `HttpWebRequest` per-call — use for outbound HTTP; set `ServicePoint.ConnectionLeaseTimeout` to avoid stale DNS, and raise `ServicePointManager.DefaultConnectionLimit` (default 2 non-web). +- **`IReadOnlyList` / `IReadOnlyCollection` / `IReadOnlyDictionary`** — **4.5** — read-only collection interfaces; expose collections without defensive copies — supersedes returning `List`/arrays or wrapping in `ReadOnlyCollection` only — use on API surfaces that should not allocate a copy to be safe. +- **`` (Server GC)** — config available since 4.0, background Server GC since **4.5** — per-CPU heaps + dedicated GC threads; higher throughput, lower pause on multi-core servers — supersedes default Workstation GC for server workloads — use when a multi-core service process uses Workstation GC (the non-ASP.NET default). +- **``** — **4.5** — allows arrays >2 GB total on 64-bit — supersedes the 2 GB array-size ceiling — use when large in-memory arrays on x64 are needed (verify no unsafe code assumes <2 GB arrays). +- **`GCSettings.LargeObjectHeapCompactionMode`** — **4.5.1** — `CompactOnce` compacts the LOH on the next full blocking GC, reclaiming fragmentation — supersedes sweep-only LOH (fragmentation accrues) — use in apps that churn large transient buffers and show LOH fragmentation. +- **`System.Memory` — `Span` / `Memory` / `ReadOnlySpan`** — **NuGet**, targets 4.5+ — slice arrays/strings without copying (portable "slow span"; **no runtime fast-path intrinsics**; ref-struct language features need C# 7.2+) — supersedes `ArraySegment` + offset/length arithmetic — use for buffer/slice processing; mark as NuGet backport. +- **`System.Buffers` — `ArrayPool.Shared`** — **NuGet**, targets 4.5.1+ — pools temporary arrays to cut GC/LOH pressure — supersedes `new T[n]` for transient buffers on hot paths — use for I/O buffers; `Rent`/`Return` in try/finally; mark as NuGet backport. +- **`System.Threading.Tasks.Extensions` — `ValueTask` / `ValueTask`** — **NuGet**, targets 4.5+ — allocation-free on synchronous-completion paths — supersedes `Task` for high-frequency async APIs that often complete synchronously — use on hot async APIs; do not await twice; mark as NuGet backport. +- **RyuJIT (new 64-bit JIT)** — **4.6** — faster JIT codegen and better optimisation on x64 — supersedes the legacy 64-bit JIT — automatic on x64 since 4.6; verify no `` / `COMPLUS_useLegacyJit=1` forces the old JIT. +- **`System.Numerics.Vectors` — `Vector` / `Vector4` etc. (SIMD)** — **4.6** (RyuJIT hardware-accelerates `Vector`) — JIT-vectorised numeric loops — supersedes scalar numeric loops for bulk math — use for hot numeric kernels; check `Vector.IsHardwareAccelerated`. +- **`System.ValueTuple` — `ValueTuple<...>` / C# 7 tuples** — in-box since **4.7**; **NuGet** package for 4.5–4.6.x — stack-allocated lightweight tuples; no heap allocation vs `Tuple<...>` — supersedes reference-type `Tuple<...>` for multi-value returns — use for hot multi-return methods; mark as NuGet when targeting <4.7. +- **TLS 1.2 as system default / `SecurityProtocolType.SystemDefault`** — **4.7** (`DontEnableSystemDefaultTlsVersions` defaults `false` at 4.7+) — lets the OS negotiate the best TLS version; avoids hardcoding — supersedes hardcoded `ServicePointManager.SecurityProtocol = Tls`/`Tls11` or `SslProtocols.Default` — use when code pins an old/explicit TLS version; let the OS choose. +- **WCF service throttling defaults raised to per-CPU** — **WCF 4 / .NET 4.0** — defaults became ≈`16×CPU` concurrent calls / `100×CPU` sessions / `116×CPU` instances — supersedes the flat pre-4.0 defaults (`MaxConcurrentCalls=16` / `Sessions=10` / `Instances=26`) that silently throttle throughput — use when a WCF service targets pre-4.0 or sets explicit low `ServiceThrottlingBehavior` limits. +- **WCF async (TAP) service operation contracts** — **4.5** — `Task`-returning operations free the dispatcher thread during I/O-bound server work — supersedes synchronous / APM (`Begin*`/`End*`) service contracts — use for I/O-bound WCF operations to raise concurrency under the throttle limits. +- **EF6 async query/save (`ToListAsync`/`FirstAsync`/`SaveChangesAsync`, …)** — **EF6.0** (NuGet, on .NET 4.5+) — frees the thread during database I/O — supersedes synchronous EF6 calls on async request paths — use to make EF6 data access non-blocking (EF6 still has no statement batching — see the pack's data-access subsection for bulk). + +## Serialization + +- **System.Text.Json source generation** — landed in **.NET 6**, major improvements in **.NET 8** — eliminates runtime reflection for JSON serialize/deserialize, required for Native AOT, lower startup cost — supersedes `JsonSerializer` reflection defaults — use when: define a `partial class : JsonSerializerContext` with `[JsonSerializable]` attrs and pass `MyContext.Default` as the `TypeInfoResolver`; set `JsonSerializerIsReflectionEnabledByDefault=false` in csproj to force AOT-safe usage. +- **`JsonSourceGenerationMode.Serialization` (fast-path mode)** — **.NET 6+** — pre-generates serialisation code (not just metadata) for even lower per-call overhead — supersedes metadata-only mode for write-heavy paths — use when serialisation-only path; note: not supported for async streaming serialization of large payloads. +- **`JsonTypeInfoResolver.Combine`** — **.NET 8+** — chains multiple source-gen contexts into one `JsonSerializerOptions` without reflection fallback — use when mixing first-party and third-party types that each have their own context. +- **`JsonStringEnumConverter` (generic)** — **.NET 8+** — AOT-safe generic variant of `JsonStringEnumConverter`; the non-generic version is not supported by Native AOT — supersedes `JsonStringEnumConverter` (non-generic) for trimmed/AOT apps. +- **`HttpClientJsonExtensions.GetFromJsonAsAsyncEnumerable`** — **.NET 8+** — streams JSON arrays as `IAsyncEnumerable` using source-gen context overloads; avoids buffering the full response — use for large JSON array responses. +- **`JsonElement.Parse(string/ReadOnlySpan)`** — **.NET 10** — static method that parses directly to a `JsonElement` without creating a `JsonDocument` wrapper or calling `JsonSerializer`; ~2× less overhead than `JsonDocument.Parse` + `.RootElement.Clone()` — supersedes the `JsonDocument.Parse(…).RootElement.Clone()` pattern for obtaining a standalone `JsonElement` — use when you need a self-contained `JsonElement` without lifetime coupling to a `JsonDocument`. +- **`JsonObject.TryAdd(string, JsonNode)`** — **.NET 10** — adds a property to a `JsonObject` only if the key is absent, in a single lookup; eliminates the double-lookup previously required by checking `ContainsKey` then indexing — use when building or merging JSON objects where overwrite must be avoided. +- **`Utf8JsonWriter.WriteBase64StringSegment`** — **.NET 10** — writes a Base64-encoded binary property in chunks (streaming) instead of requiring the full payload to be buffered first; cuts peak memory and latency for large binary-in-JSON payloads — use when serialising large blobs (images, embeddings, file content) that arrive as a stream or chunked `ReadOnlySpan`. + +## Collections + +- **`FrozenDictionary` / `FrozenSet`** — landed in **.NET 8** (`System.Collections.Frozen`) — read-optimised immutable collections with specialised lookup strategies (e.g., length bucketing for string keys); faster Contains/TryGetValue than `Dictionary` or `ImmutableDictionary` for read-heavy workloads — supersedes `ImmutableDictionary`/`ImmutableHashSet` for lookup-only scenarios — use when the collection is built once at startup or config time and then only read. +- **`SearchValues` / `SearchValues`** — landed in **.NET 8** (`System.Buffers`) — pre-computes platform-optimised search tables for `IndexOfAny`, `ContainsAny`, `IndexOfAnyExcept`, `LastIndexOfAny` on `ReadOnlySpan`/`string` — supersedes inline char-array arguments to `IndexOfAny` — use when the same character or string set is searched repeatedly; create once (`SearchValues.Create(…)`) and cache as `static readonly`. +- **`PriorityQueue`** — landed in **.NET 6** — binary min-heap with O(log n) enqueue/dequeue — supersedes `SortedList`/manual heap for "next cheapest item" patterns — use for Dijkstra, job schedulers, any priority-ordered dequeue loop. +- **`PriorityQueue.Remove` (priority update)** — **.NET 9+** — allows updating a queued item's priority in-place — use when priorities change after enqueue (e.g., A* re-weighting). +- **Collection expressions (`[x, y, z]`)** — C# 12 / **.NET 8 SDK** — compiler selects optimal backing store (stack array, inline array, or heap collection) based on declared type; `Span` target gets stack allocation — supersedes `new T[] { … }` / `new List { … }` where the type allows — use when the collection is small and the declared type is array, `Span`, or `ImmutableArray`. +- **Inline arrays (`[InlineArray(N)]`)** — C# 12 / **.NET 8** — fixed-size contiguous storage inside a struct, exposed as `Span`; no heap allocation — use in hot-path value types that need a small fixed buffer (e.g., argument lists, ring buffers of known size). +- **`FrozenDictionary` specialisation** — **.NET 10** — `FrozenDictionary`/`FrozenSet` now have array-backed O(1) lookup specialisations for any dense primitive integral key type (byte, char, ushort, small enums, etc.); lookup time roughly halved vs .NET 9 for these key types — no API change; create via `ToFrozenDictionary()` as before; automatic when key type is eligible. +- **`FrozenDictionary.GetAlternateLookup>` GVM fix** — **.NET 10** — the generic virtual method overhead in alternate lookups (span-keyed access into a string-keyed `FrozenDictionary`) is now amortised by caching the lookup delegate; throughput improved ~40% vs .NET 9 — use when parsing text protocols against a static frozen dictionary using `GetAlternateLookup>()`. +- **`Enumerable.Sequence`** — **.NET 10** — generates a numeric range for any `INumber` type with configurable step; internally reuses the `Range` iterator's optimisation paths; drastically faster than a manual loop + `AddRange` for filling a `List` — supersedes `for`-loop range fills when the target is `IEnumerable` — use when producing a typed range of non-`int` numeric values. +- **`Enumerable.Shuffle()`** — **.NET 10** — returns a lazily-evaluated random permutation of any sequence; `Shuffle().Take(N)` uses reservoir sampling (O(N) space, single pass) and `Shuffle().Contains` uses hypergeometric probability — avoids the full-buffer-then-shuffle allocation pattern — use instead of `ToArray()` + `Random.Shared.Shuffle(arr)` + `foreach` when only a sample or a membership test is needed. + +## Strings & Searching + +- **`Regex` source generator (`[GeneratedRegex]`)** — landed in **.NET 7** — compiles regex pattern to IL at build time; eliminates runtime compilation delay and allocation — supersedes `new Regex(pattern, RegexOptions.Compiled)` and static `Regex` field patterns — use on any `partial static` method annotated `[GeneratedRegex("…")]`; verified fastest path for hot-loop regex. +- **`RegexOptions.NonBacktracking`** — landed in **.NET 7** — linear-time NFA engine; worst-case O(input) regardless of pattern complexity — supersedes default backtracking engine for untrusted input patterns or catastrophic-backtracking-prone patterns — use when security or latency predictability is required. +- **UTF-8 string literals (`"…"u8`)** — **.NET 7+** — compile-time UTF-8 byte literals typed as `ReadOnlySpan`; zero-allocation, no conversion needed when writing to UTF-8 sinks — supersedes `Encoding.UTF8.GetBytes("…")` on hot paths — use when passing literal strings to network/file APIs that accept `ReadOnlySpan`. +- **`string.OrdinalIgnoreCase` comparisons with SIMD** — **.NET 8** — case-insensitive `OrdinalIgnoreCase` string comparison uses AVX2/AVX512 internally; ~20× faster than culture-aware comparison — use `StringComparison.OrdinalIgnoreCase` explicitly to trigger fast path; avoid `ToLower()`/`ToUpper()` allocations. +- **`MemoryExtensions.IndexOf` / `Contains` on `Span`** — **.NET 6+** — vectorised search on spans; avoids `string` allocation when slicing — supersedes `string.IndexOf` on already-sliced data. + +## Memory & Spans + +- **`ArrayPool.Shared`** — available since **.NET Core 1.0** (and via `System.Buffers` NuGet on .NET Framework 4.5.1+) — rentable heap arrays avoid repeated allocations of temporary buffers; reduces GC pressure on hot I/O paths — supersedes `new T[n]` for temporary large arrays — pattern: `var buf = ArrayPool.Shared.Rent(size); try { … } finally { ArrayPool.Shared.Return(buf); }`. +- **`Span` / `ReadOnlySpan` / `Memory`** — **.NET Core 2.1+** — zero-copy slice into arrays, strings, stack memory, or native memory; eliminates sub-array copies — supersedes `ArraySegment` and offset+length pairs — use for buffer-processing methods that previously took `byte[]` + offset + length. +- **`stackalloc` with `Span`** — **.NET Core 2.1+** — stack-allocates small buffers; safe via `Span` wrapper — use for short-lived buffers of known small size (typically ≤ 256–512 bytes to avoid stack overflow); prefer `stackalloc` over `ArrayPool` below that threshold. +- **`TensorPrimitives`** — **.NET 8** (greatly expanded in **.NET 9** to ~200 overloads) — SIMD-backed bulk math operations (add, multiply, dot-product, cosine similarity, softmax, etc.) over `Span` — supersedes manual SIMD loops or scalar loops for numeric batch work — use for ML pre-processing, signal processing, embedding computations. +- **`Tensor`** — **.NET 9** (experimental) — multi-dimensional tensor with zero-copy interop with ML.NET / ONNX Runtime / TorchSharp — use for AI/ML pipelines that pass data between .NET and native inference runtimes. + +## Async & Tasks + +- **`ValueTask` / `IValueTaskSource`** — **.NET Core 2.0+** — allocation-free for the common synchronous-completion path (cache hit, already-completed I/O) — supersedes `Task` for high-frequency async APIs where synchronous completion is the common case — warning: `ValueTask` must not be awaited more than once, stored, or `.Result`-accessed without checking `IsCompleted`; violation causes subtle bugs. +- **`Task.WhenAll` / `Task.WhenEach`** — use `WhenAll` to fan out independent async operations concurrently; `.NET 9` adds `Task.WhenEach` for processing results as each completes without buffering all — supersedes sequential `await` loops over independent operations. +- **`IAsyncEnumerable` with `await foreach`** — **.NET Core 3.0+** — streams results one-at-a-time from DB/network without buffering the full result set into a `List` — supersedes `ToListAsync()` for large result sets where the consumer processes items as they arrive. +- **`Parallel.ForEachAsync`** — **.NET 6+** — async-aware parallel loop with configurable `MaxDegreeOfParallelism`; does not block thread-pool threads while awaiting — supersedes `Parallel.ForEach` with sync-over-async wrappers for I/O-bound fan-out work. +- **`Channel`** — **.NET Core 3.0+** — high-performance producer/consumer pipeline; bounded channels provide back-pressure; `Channel.CreateUnbounded()` for lock-free single-producer single-consumer — supersedes `BlockingCollection` for async producer/consumer patterns. + +## LINQ + +- **`Enumerable.CountBy` / `Enumerable.AggregateBy`** — **.NET 9+** — aggregate state by key without materialising intermediate `GroupBy` groupings; reduces allocations for group-count/group-sum patterns — supersedes `.GroupBy(…).Select(g => new { g.Key, Count = g.Count() })` — use when only the aggregated result per key is needed, not the groups themselves. +- **`Enumerable.Index`** — **.NET 9+** — enumerates `(index, element)` pairs without `Select((x, i) => …)` — supersedes `Select` with index overload for readability and minor allocation reduction. +- **`Order()` / `OrderDescending()`** — **.NET 7+** — sort `IComparable` sequence without a key selector lambda; eliminates a delegate allocation vs `OrderBy(x => x)` — use for sorting primitives or types with natural order. +- **LINQ operator JIT devirtualisation** — **.NET 8** — the JIT inlines and devirtualises common LINQ operators via dynamic PGO; tight `Select`/`Where`/`Sum` chains approach hand-written loop speed on hot paths — no API change required; benefit is automatic when the same query shape recurs. +- **LINQ `Contains` short-circuit specialisations** — **.NET 10** — ~30 new specialised `Contains` implementations across LINQ iterator types (`OrderBy`, `Distinct`, `Reverse`, `Union`, `Append`, `Concat`, etc.) allow `Contains` to skip the intermediate materialisation/sort/dedup entirely and query the source directly; up to 300× faster for `OrderBy(…).Contains(…)` — no API change; automatic when chaining `Contains` after these operators. +- **Array interface devirtualisation in LINQ** — **.NET 10** — the JIT can now devirtualise `T[]`'s interface method implementations (previously blocked); LINQ paths that took an `IList` indexer shortcut over an array-backed `ReadOnlyCollection` are now ~3× faster — automatic; no code change required. +- **`IEnumerator` stack allocation for `List`/arrays** — **.NET 10** — the JIT's expanded escape analysis (see JIT/PGO section) stack-allocates enumerator objects for `List` and array sources when iterating via `IEnumerable`; eliminates per-foreach allocation in common call patterns — automatic in .NET 10; no API change. + +## ORM / EF Core + +- **`AsNoTracking()`** — **EF Core 1.0+** — disables change tracking; no snapshot allocation, no identity-resolution dictionary overhead — use for all read-only queries where entities are not subsequently modified and saved. +- **`AsNoTrackingWithIdentityResolution()`** — **EF Core 5.0+** — no-tracking but deduplicates related entities in the result; avoids 100× Blog duplication for 100 Posts sharing the same Blog — use when you need no-tracking performance but your query loads related entities referenced by multiple rows. +- **`EF.CompileQuery` / `EF.CompileAsyncQuery`** — **EF Core 2.0+** — pre-compiles a LINQ expression to a reusable delegate; amortises LINQ-to-SQL translation cost across repeated executions — use for hot query shapes executed many times per second. +- **`ExecuteUpdate` / `ExecuteDelete`** — landed in **EF Core 7.0** — issues a single `UPDATE`/`DELETE` SQL statement without loading or tracking entities — supersedes load-mutate-SaveChanges for bulk mutations — use when updating/deleting many rows matching a predicate; avoids O(n) entity materialisation. +- **`SaveChanges` batching (EF Core 7.0 improvements)** — **EF Core 7.0** — up to 4× faster than EF Core 6 for insert-heavy workloads: removes redundant transaction wrapping for single statements, uses `OUTPUT` clause, and merges multi-row inserts — no API change; upgrade to EF Core 7+ to get automatically. +- **`AsSplitQuery()`** — **EF Core 5.0+** — issues separate SQL queries per collection `Include` instead of a Cartesian join; prevents result-set row explosion on multi-collection loads — use when query has 2+ collection-typed `Include` clauses that cause row multiplication. +- **Compiled models (`EF.CompileModel` / dotnet-ef CLI)** — **EF Core 6.0+** — pre-generates the model's internal metadata at build time; reduces startup cost for large models (100s of entities) — use when `DbContext` creation shows as startup bottleneck in profiling. +- **`DbContext` pooling (`AddDbContextPool`)** — **EF Core 2.0+** — reuses `DbContext` instances across requests; avoids per-request model-initialisation overhead — use in high-throughput ASP.NET Core apps; ensure no request-scoped state leaks between pooled instances. +- **EF Core 9 pre-compiled queries (experimental NativeAOT)** — **EF Core 9** (experimental) — C# interceptors embed final SQL and materialisation code at build time; eliminates per-startup LINQ-to-SQL translation — use with caution; not production-ready in EF9; target EF10 for stable support. +- **`ExecuteUpdate`/`ExecuteUpdateAsync` for JSON columns** — **EF Core 10** — `ExecuteUpdateAsync` can now reference JSON column properties inside the setter expression for complex-type-mapped JSON columns; issues a single server-side bulk `UPDATE` without loading entities — requires mapping the type as a complex type (not an owned entity); supersedes load-modify-SaveChanges for bulk JSON-column mutations. +- **Parameterised collection IN-list (scalar parameter mode)** — **EF Core 10** — `.Where(b => ids.Contains(b.Id))` now translates to `WHERE id IN (@ids1, @ids2, …)` by default (with EF-side padding to reduce plan proliferation), rather than a JSON-array `OPENJSON` sub-query; avoids plan cache bloat while giving the query planner accurate cardinality — automatic on EF10; override with `UseParameterizedCollectionMode(ParameterTranslationMode.*)` if needed. +- **`LeftJoin` / `RightJoin` LINQ operators** — **.NET 10 / EF Core 10** — `Enumerable.LeftJoin` and `Enumerable.RightJoin` are new first-class LINQ methods; EF Core 10 translates them directly to SQL `LEFT JOIN`/`RIGHT JOIN`; eliminates the previous `SelectMany` + `DefaultIfEmpty` workaround which generated less efficient SQL — use when the query semantics require a left or right outer join. +- **Async lazy-loading performance** — **EF Core 10** — internal `AsyncLocal` usage refactored for better lazy-loading performance on async paths; reduces per-navigation overhead in async-heavy workloads — automatic upgrade benefit. + +## Startup & AOT + +- **Native AOT** — **.NET 7** (preview), **.NET 8** (production-ready) — publishes a fully ahead-of-time compiled native binary; instant startup, no JIT warm-up, predictable latency — requires: trimming-compatible code, source-generated JSON serialisation, no reflection over types not annotated — use for CLI tools, serverless functions, containers with tight startup SLAs. +- **ReadyToRun (R2R) + Tiered PGO** — **.NET 8** — R2R pre-compiles assemblies reducing first-JIT latency; combined with dynamic PGO the runtime re-tiers R2R code using runtime profiles (was not possible before .NET 8) — enabled by default; verify `true` is not disabled in project files. +- **`[JsonSerializerIsReflectionEnabledByDefault]` MSBuild property = `false`** — **.NET 8+** — makes any reflection-based JSON call throw `InvalidOperationException`; forces migration to source-gen before publishing — use as a project guardrail for AOT/trimmed targets. +- **Trimming (`true`)** — **.NET 6+** (improved in **.NET 8**) — removes unused code from the published output; reduces cold-start assembly-load cost — requires trim annotations on reflection-heavy paths; use `ILLink.Substitutions` for conditional feature trimming. +- **DATAS GC (Dynamic Adaptation To Application Size)** — default in **.NET 9** (opt-in in .NET 8 via `GCConserveMemory`) — replaces Server GC as the default; adapts heap size dynamically to the application's actual working set; reduces memory footprint in cloud/container environments — no API change; automatic on .NET 9+. + +## JIT / PGO + +- **Dynamic PGO (Profile-Guided Optimisation)** — enabled by default in **.NET 8** (opt-in in .NET 6/7) — instruments tier-0 code, feeds tier-1 with guarded devirtualisation, loop specialisation, and inlining decisions based on actual call-site types — enabled automatically; `DOTNET_TieredPGO=0` disables it (avoid unless diagnosing JIT issues). +- **On-Stack Replacement (OSR)** — **.NET 7+** — re-compiles long-running methods mid-execution without waiting for re-entry; ~25% startup improvement for JIT-heavy workloads, 10–30% time-to-first-request improvement — automatic; no API required. +- **`Vector512` + AVX-512 hardware intrinsics** — **.NET 8** — 512-bit SIMD types and `Avx512F/BW/CD/DQ/Vbmi` intrinsics; JIT auto-vectorises `Span` loops to 512-bit where AVX-512 is available — check `Vector512.IsHardwareAccelerated` before branching on capability; use `Vector512` for manual SIMD on eligible hardware. +- **ARM SVE intrinsics** — **.NET 9** (experimental, `[Experimental]`) — scalable vector extensions; 128–2048-bit variable-width SIMD — `System.Runtime.Intrinsics.Arm.Sve`; limited to 128-bit in .NET 9; full-width in future releases. +- **`ArgumentNullException.ThrowIfNull` boxing fix** — **.NET 9** — value-type arguments no longer box in tier-0; eliminates hidden allocation at call sites for guard clauses — no change required; automatic in .NET 9. +- **Object and array stack allocation via escape analysis** — **.NET 10** — the JIT's escape analysis is significantly expanded: delegates, closure display-class objects, and small arrays that do not escape the allocating method are now stack-allocated, eliminating heap allocations and GC pressure — automatic; no API change; benefits closures, `params` arrays, and `BitConverter.GetBytes`-style patterns where the result is immediately consumed. +- **Array interface devirtualisation** — **.NET 10** — the JIT can now devirtualise calls to `T[]`'s interface method implementations (previously blocked due to runtime-generated vtables); eliminates a class of virtual-dispatch overhead when iterating or indexing arrays via interface variables — automatic; no code change required. +- **GDV in shared generic contexts** — **.NET 10** — Guarded Devirtualisation (GDV) now fires for virtual calls inside shared generic methods, enabling type-specialised codegen for patterns like `EqualityComparer.Default.Equals(a, b)` — automatic; no code change; ~2× faster for equality-check hot paths in generic code. +- **AVX10.2 intrinsics** — **.NET 10** — `System.Runtime.Intrinsics.X86.Avx10v2` class adds support for AVX10.2 instruction set; JIT uses it for improved float min/max and float conversion operations on capable hardware — check `Avx10v2.IsSupported`; useful for custom SIMD kernels on AVX10.2-capable CPUs. +- **DATAS GC tuning** — **.NET 10** — DATAS (Dynamic Adaptation To Application Size, default since .NET 9) is further tuned: fewer unnecessary collections, smoother pauses under high allocation rates, corrected fragmentation accounting — automatic; no API change; net result is steadier throughput and more predictable GC latency in .NET 10. +- **`GCHandle` / `PinnedGCHandle` / `WeakGCHandle`** — **.NET 10** — strongly-typed GC handle wrappers that reduce misuse risk and shave overhead vs the untyped `GCHandle` — use in interop-heavy or pinning-heavy code that currently calls `GCHandle.Alloc(obj, GCHandleType.Pinned)` frequently. +- **ThreadPool local-queue flush on block** — **.NET 10** — when a thread-pool thread is about to block (e.g., sync-over-async `.Wait()`), it now flushes its local work queue to the global queue; prevents priority inversion where the blocked thread's sub-tasks are starved by global-queue flood — automatic; especially beneficial for apps with accidental sync-over-async that previously suffered thread-pool deadlocks or hangs. + +## Networking & I/O + +- **`IHttpClientFactory`** — **.NET Core 2.1+** — manages `HttpMessageHandler` lifetimes, rotates DNS, and pools connections; avoids socket exhaustion and stale DNS from long-lived `HttpClient` instances — supersedes `new HttpClient()` per-request or a single static `HttpClient` — register via `services.AddHttpClient()`. +- **HTTP/3 (`HttpRequestVersion.Version30`)** — **.NET 7+** (stable) — QUIC-based transport; multiplexing without head-of-line blocking, faster reconnection — use `HttpClient` with `HttpVersionPolicy.RequestVersionOrHigher` and server-side Kestrel HTTP/3 support — verify against the currency brief for your version. +- **`FileStream` rewrite** — **.NET 6** — async `FileStream` operations are now truly async (no longer secretly sync under the hood on Windows); eliminates thread-pool starvation from file I/O — automatic; no API change, but verify code uses `await` with `FileStream` async overloads. +- **`RandomAccess` API** — **.NET 6+** — scatter/gather I/O via `SafeFileHandle` without creating a `FileStream`; enables high-throughput file access from multiple threads without a stream-level lock — use for parallel read/write to different offsets of the same file. + +## ASP.NET Core + +- **Minimal APIs** — landed in **.NET 6** — lightweight HTTP endpoint model with no MVC overhead (no filter pipeline, no view engine, no model-binder abstractions); lower per-request allocation cost for simple request/response handlers — supersedes full MVC controllers for throughput-sensitive endpoints that don't need filters, model validation, or view rendering — use when: `app.MapGet/MapPost(…)` with inline or method-group handlers. +- **Output caching middleware (`AddOutputCache` / `UseOutputCache`)** — landed in **.NET 7** — server-side full-response cache with tag-based eviction, vary-by-query/header policies, and Redis backing; prevents re-execution of expensive handlers on repeated identical requests — supersedes older `[ResponseCache]` attribute (client/CDN hint only, no server store) for server-side caching scenarios — use when cacheable GET/HEAD responses are expensive to regenerate; call `CacheOutput()` on the endpoint or apply `[OutputCache]`. +- **Rate limiting middleware (`AddRateLimiter` / `UseRateLimiter`)** — landed in **.NET 7** — built-in concurrency/token-bucket/sliding-window/fixed-window limiters with per-endpoint and global policies; prevents thread-pool and downstream overload — supersedes third-party rate-limiting middleware or manual `SemaphoreSlim` guards at the endpoint level — use when: call `RequireRateLimiting("policy")` on endpoints. +- **`TypedResults` / `Results` (minimal API typed returns)** — landed in **.NET 7** (`TypedResults` static class; public `IResult` implementation types in `Microsoft.AspNetCore.Http.HttpResults`) — enables strongly-typed return declarations on minimal API handlers; allows OpenAPI tooling to infer response shapes without reflection; no runtime overhead vs. `Results.*` — supersedes `Results.*` factory methods for endpoints that need testable, statically-typed responses — use when unit-testing minimal API handlers or generating accurate OpenAPI metadata. +- **Request Delegate Generator + Native AOT for minimal APIs** — landed in **.NET 8** — Roslyn source generator emits request-delegate glue code at build time instead of using runtime reflection for parameter binding and response serialisation; enables minimal API apps to publish as Native AOT — requires: source-generated JSON contexts, no reflection-based middleware; activates automatically when `true` or `true` — use for serverless or container workloads with tight startup/memory SLAs. +- **Keyed DI services (`AddKeyedSingleton` / `AddKeyedScoped`)** — landed in **.NET 8** — registers multiple implementations of the same interface under different string/object keys; eliminates factory-based workarounds that allocate closures — supersedes named-instance patterns using `IEnumerable` + filtering or `Func` factory delegates — use when the same service interface has multiple implementations selected at runtime by a logical key. +- **`System.IO.Pipelines` (`PipeReader` / `PipeWriter`)** — landed in **.NET Core 2.1** (integrated into Kestrel transport layer) — zero-copy, back-pressured I/O pipeline; Kestrel uses it internally for all socket reads/writes; apps that process raw request bodies benefit from `PipeReader` to avoid double-buffering — supersedes `Stream`-based request body reads for high-throughput binary/text parsing — use when parsing large or streaming request bodies without allocating intermediate `byte[]` buffers. +- **HTTP/3 in Kestrel** — preview in **.NET 6**, fully supported in **.NET 7+** — QUIC-based transport; eliminates TCP head-of-line blocking, faster connection establishment, supports connection migration — requires `HttpProtocols.Http1AndHttp2AndHttp3` on the endpoint and platform QUIC support (MsQuic); not enabled by default — supersedes HTTP/2 for latency-sensitive or mobile-heavy traffic where packet loss is a concern — use when deploying to Windows Server 2022+ or Linux with `libmsquic`. +- **In-process IIS hosting (ANCM in-process)** — default since **ASP.NET Core 3.0** (opt-in from 2.2) — runs the app inside the IIS worker process; eliminates the localhost loopback proxy hop that out-of-process ANCM adds per request — supersedes out-of-process hosting for IIS-hosted apps where request throughput matters — verify: `InProcess` in the `.csproj` or `hostingModel="inprocess"` in `web.config`. +- **Kestrel/IIS/HTTP.sys automatic memory-pool eviction** — **ASP.NET Core 10** — the `MemoryPool` instances used by Kestrel, IIS, and HTTP.sys now automatically release idle memory blocks back to the system when the app is under low load; previously pooled memory was retained indefinitely — automatic; no config required; reduces RSS for bursty or intermittently loaded services — use `IMemoryPoolFactory` DI injection to create application-level pools that also benefit. +- **Minimal API validation source generator** — **ASP.NET Core 10** — validation of minimal API handler parameters now uses a source-generated implementation instead of reflection; AOT-compatible and produces less startup overhead — enabled via `AddValidation()` in `Program.cs`; supersedes reflection-based `DataAnnotations` validation for AOT/trimmed minimal API apps. +- **`TypedResults.ServerSentEvents`** — **ASP.NET Core 10** — built-in SSE result type for minimal APIs and MVC; streams `ServerSentEventItem` values as `text/event-stream` without buffering the full response — supersedes manual `Response.WriteAsync` SSE loops — use for real-time push to browsers without WebSocket overhead. +- **`Microsoft.AspNetCore.JsonPatch` (`System.Text.Json` implementation)** — **ASP.NET Core 10** — new `Microsoft.AspNetCore.JsonPatch` built on `System.Text.Json` instead of Newtonsoft.Json; ~170× faster for apply-and-deserialize benchmarks, ~8× less allocation — supersedes Newtonsoft-based `JsonPatch` for all non-dynamic-type payloads; not a drop-in replacement: does not support `ExpandoObject`/dynamic types. + +## Blazor + +- **`` component** — landed in **.NET 5** — renders only the viewport-visible subset of a large list; calculates item positions from a fixed item height and re-renders on scroll; supports remote `ItemsProvider` delegate for server-paged data — supersedes a plain `@foreach` loop for lists where most items are off-screen — use when list length exceeds ~50–100 items and items have uniform height; set `ItemSize` to the exact pixel height to avoid a double-render pass. +- **Blazor WebAssembly AOT compilation** — landed in **.NET 6** — compiles .NET IL to WebAssembly at publish time using `true`; eliminates the WASM interpreter overhead for CPU-intensive code — trades larger initial download for significantly faster runtime throughput — use for compute-heavy WASM apps (games, data processing, image manipulation); not beneficial for I/O-bound apps where the bottleneck is network latency rather than CPU. +- **Unified render modes + per-component `@rendermode`** — landed in **.NET 8** (Blazor Web App model) — single project supports `InteractiveServer`, `InteractiveWebAssembly`, `InteractiveAuto`, and static SSR on a per-component or per-page basis; Auto mode serves from the server immediately then migrates to WASM after the bundle is cached — supersedes choosing a single hosting model for the entire app — use when different pages have different latency/scale/interactivity trade-offs; Static SSR for content pages, Interactive for form-heavy pages. +- **Streaming rendering (`[StreamRendering]` attribute)** — landed in **.NET 8** — sends static HTML immediately then streams updated content to the client as async operations complete; reduces perceived latency for pages that wait on slow data fetches — use on components with async lifecycle work (DB queries, API calls) that block the initial render; pairs well with `await Task.Yield()` to flush the placeholder content first. +- **`QuickGrid` component** — experimental in **.NET 7** as a NuGet package, officially part of the Blazor framework in **.NET 8** — virtualized, sortable, pageable data grid built on top of ``; renders only visible rows; integrates with EF Core `IQueryable` for server-side pagination — supersedes third-party grid components for common tabular-data scenarios — use when: ``. +- **Jiterpreter WASM runtime** — landed in **.NET 8** — partial JIT for the WASM interpreter that compiles hot interpreter loop iterations to native WASM code at runtime; provides significant speedup for interpreted (non-AOT) WASM apps without the larger download cost of full AOT — automatic when running Blazor WebAssembly on .NET 8+; no API change required — check `` is not set to `false` explicitly. +- **Enhanced navigation and form handling** — landed in **.NET 8** — Blazor intercepts standard `` navigations and `
` submissions and performs a `fetch` request instead of a full page load; patches the response into the DOM preserving scroll position and page state; reduces navigation latency to near-SPA levels without client-side rendering — enabled by default when `blazor.web.js` is loaded; opt-out per-link with `data-enhance-nav="false"` — avoids full round-trip HTML reload on each page visit in Blazor Web Apps. +- **Blazor script as static web asset with auto-compression and fingerprinting** — **.NET 10** — `blazor.web.js` / `blazor.server.js` are now served as static web assets with automatic Brotli/gzip compression and content-hash fingerprinting; eliminates the uncompressed embedded-resource fallback — automatic when upgrading to .NET 10; reduces script payload and enables long-lived `Cache-Control` headers. +- **Blazor WebAssembly framework asset preloading** — **.NET 10** — Blazor Web Apps emit `Link: rel=preload` headers for WASM framework assets (runtime, assemblies) on first page response; standalone WASM apps schedule high-priority download/cache of assets early in `index.html`; reduces time-to-interactive by overlapping asset download with HTML parse — automatic; no code change required. +- **`HttpClient` response streaming enabled by default in WASM** — **.NET 10** — Blazor WebAssembly `HttpClient` responses now stream by default (previously opt-in); reduces peak memory for large API responses by returning a `BrowserHttpReadStream` instead of a `MemoryStream` — automatic; opt-out per-request with `SetBrowserResponseStreamingEnabled(false)` if synchronous stream operations are needed. +- **Blazor boot config inlined into script** — **.NET 10** — `blazor.boot.json` is inlined into the Blazor script; eliminates a separate round-trip HTTP request at startup, reducing time-to-interactive by one network RTT — automatic when upgrading to .NET 10. + +## Cryptography + +- **`CryptographicOperations.HashData` (one-shot)** — **.NET 9+** — single-call hash without allocating a `HashAlgorithm` instance; internally uses hardware acceleration — supersedes `using var sha = SHA256.Create(); sha.ComputeHash(…)` for one-shot hashing. +- **`AES-NI` / `SHA-NI` hardware acceleration** — **.NET Core 2.0+** on capable hardware — AES and SHA operations use CPU intrinsics automatically; no API change — avoid software-fallback code paths: use `System.Security.Cryptography` BCL types rather than hand-rolled implementations. +- **OpenSSL 3 explicit-fetch caching** — **.NET 10** (Linux/OpenSSL platforms) — .NET now performs an explicit `EVP_MD_fetch` for digest algorithms at initialisation and caches the result, avoiding the per-call "implicit fetch" overhead introduced by OpenSSL 3's provider model; ~20% faster `SHA256.HashData` on Linux — automatic; no code change; benefit applies to all hash operations via the BCL cryptography types. +- **`X509Certificate2Collection.FindByThumbprint`** — **.NET 10** — new method that uses a stack-allocated buffer for each candidate thumbprint comparison, eliminating per-candidate `byte[]` allocations — supersedes manual `foreach` + `certificate.GetCertHash()` loops in certificate lookup code. +- **`SymmetricAlgorithm.SetKey(ReadOnlySpan)`** — **.NET 10** — span-based key setter avoids allocating an intermediate `byte[]` copy when configuring symmetric cipher key material — use instead of the array-based `Key = …` property setter on hot key-rotation paths. + +## Caching & Interop + +- **`HybridCache` (`Microsoft.Extensions.Caching.Hybrid`)** — **.NET 9** — two-level cache (in-proc L1 + optional distributed L2) with **built-in stampede protection** and tag-based invalidation — supersedes hand-rolled `SemaphoreSlim` single-flight over `IMemoryCache`/`IDistributedCache` — use for read-through caching that needs concurrency-safe population. +- **`[LibraryImport]` P/Invoke source generator** — **.NET 7** — generates marshalling stubs at compile time (AOT-friendly, no runtime IL emit, lower per-call overhead) — supersedes `[DllImport]` for new P/Invoke — use on hot or Native-AOT-targeted native interop. +- **`ComWrappers` API** — **.NET 6** — lower-overhead, trim/AOT-compatible foundation for COM interop — supersedes the built-in RCW/CCW machinery in AOT scenarios — use for high-performance or AOT COM interop. +- **COM source generator (`[GeneratedComInterface]`)** — **.NET 8** — source-generated, AOT/trim-friendly COM interop — supersedes runtime-generated COM interop for Native AOT — use when COM interop must work under trimming/AOT. + +## .NET 11 (preview — not GA) + +> **These entries are from the .NET 11 Preview 4 overview (as of 2026-06-03). .NET 11 is NOT released; GA is expected November 2026. Do NOT recommend these as actionable unless the project explicitly targets .NET 11 preview builds.** + +- **Runtime-native async (Runtime Async)** — **.NET 11 (preview — not GA)** — the runtime implements `async`/`await` state machines natively rather than via compiler-generated classes; produces cleaner stack traces and lower per-`await` overhead; the .NET 11 libraries themselves are compiled with `runtime-async=on` — no `` opt-in needed for `net11.0` TFM targets; automatic benefit for all async code. +- **JIT: bounds-check elimination, switch-expression folding, constant-folding `SequenceEqual`** — **.NET 11 (preview — not GA)** — additional JIT optimisation passes reduce redundant bounds checks, fold constant switch expressions, and constant-fold `SequenceEqual` calls where inputs are statically known — automatic; no API change. +- **Arm SVE2 intrinsics** — **.NET 11 (preview — not GA)** — `System.Runtime.Intrinsics.Arm.Sve2` class exposes SVE2 instructions on capable Arm64 hardware — for explicit SIMD code on SVE2-capable Arm64 CPUs; still experimental status. +- **Zstandard compression (`System.IO.Compression.ZstdStream`)** — **.NET 11 (preview — not GA)** — built-in Zstd compression/decompression without a NuGet dependency; significantly better compression ratio and speed than Deflate/GZip for many payload types — use for network payloads or storage where Zstd is acceptable at both endpoints. +- **`MemoryCache` built-in OpenTelemetry metrics** — **.NET 11 (preview — not GA)** — `Microsoft.Extensions.Caching.Memory.MemoryCache` emits hit/miss/eviction counters as OpenTelemetry metrics natively; no custom instrumentation needed to observe cache efficiency — use to detect cache thrashing or sizing issues without adding custom metrics code. diff --git a/.claude/skills/performance-audit/version-indexes/go.md b/.claude/skills/performance-audit/version-indexes/go.md new file mode 100644 index 00000000..737e19c1 --- /dev/null +++ b/.claude/skills/performance-audit/version-indexes/go.md @@ -0,0 +1,81 @@ +--- +index_schema_version: 1 +ecosystem: go +covered_through: "Go 1.24" +built_on: 2026-06-03 +sources: + - https://go.dev/doc/go1.19 + - https://go.dev/doc/go1.20 + - https://go.dev/doc/go1.21 + - https://go.dev/doc/go1.22 + - https://go.dev/doc/go1.23 + - https://go.dev/doc/go1.24 + - https://go.dev/doc/gc-guide + - https://pkg.go.dev/runtime/debug#SetMemoryLimit + - https://pkg.go.dev/unique@go1.23.0 + - https://pkg.go.dev/runtime#AddCleanup + - https://pkg.go.dev/slices@go1.21.0 + - https://pkg.go.dev/maps@go1.21.0 + - https://pkg.go.dev/sync/atomic#Int64 + - https://pkg.go.dev/golang.org/x/sync/errgroup + - https://go.dev/blog/pgo +--- +# Go performance version index +> Build-once lookup. The idiom-currency lane consults this first; live research only extends past +> `covered_through`. + +## Compiler & Build + +- **Profile-Guided Optimization (PGO) — preview** — landed in **Go 1.20** — compiler uses a pprof CPU profile (`-pgo=path/to/profile.pprof`) to inline hot call sites; 3–4% throughput gain — supersedes static heuristic-only inlining — use when a representative production CPU profile (`default.pgo`) is available in the main package directory. +- **Profile-Guided Optimization (PGO) — GA** — promoted to production in **Go 1.21** (default: `-pgo=auto` picks up `default.pgo`) — extends inlining to include interface-call devirtualisation; 2–7% CPU improvement on representative programs; build speed itself 6% faster (compiler was PGO-compiled) — supersedes Go 1.20 preview — commit `default.pgo` alongside source for reproducible builds. +- **PGO devirtualisation improvements** — **Go 1.22** — higher proportion of interface method calls can be devirtualised; 2–14% runtime improvement with a profile — no API change; re-profile and rebuild to benefit. +- **PGO build-time overhead reduction** — **Go 1.23** — PGO build overhead reduced from 100%+ to single-digit percentage, making PGO practical for CI/CD pipelines — no API change. +- **Compilation speed recovery** — **Go 1.20** — build speed restored to Go 1.17 levels (~10% faster than 1.18/1.19) after generics-induced regression; front-end data structure improvements — no code change required. +- **`go run` / `go tool` executable caching** — **Go 1.24** — compiled executables cached in the Go build cache; repeated `go run` invocations skip recompilation — no code change; benefits scripting and tooling loops. +- **Switch statement jump tables** — **Go 1.19** (amd64, arm64) — large integer and string switch statements compiled to O(1) jump tables instead of O(n) comparisons; ~20% faster for large switches — automatic for switch on `int`/`string` types with 8+ cases. +- **Hot basic-block alignment** — **Go 1.23** (386, amd64) — compiler aligns hot loop-header blocks to CPU cache-line boundaries; 1–1.5% throughput improvement for loop-heavy code — automatic; disable with `-gcflags=-d=alignhot=0` if binary size is a constraint. +- **Stack frame slot overlapping** — **Go 1.23** — compiler overlaps stack slots of local variables in disjoint code regions, reducing per-goroutine stack usage — automatic; benefits goroutine-heavy programs by reducing peak memory. + +## Runtime & GC + +- **`GOMEMLIMIT` / `debug.SetMemoryLimit`** — **Go 1.19** — soft heap ceiling respected by the GC even when `GOGC=off`; GC caps its CPU use at 50% to prevent thrashing — supersedes sole reliance on `GOGC` for memory-bound container workloads — use as `GOMEMLIMIT=` env var or `debug.SetMemoryLimit(bytes)`; leave 5–10% headroom below container memory limit; pair with higher `GOGC` (e.g. 200) to trade GC frequency for throughput. +- **GC CPU limiter** — **Go 1.19** — runtime enforces a 50% ceiling on GC CPU time over a `2×GOMAXPROCS` CPU-second window, preventing GC from starving application goroutines during heap spikes — automatic; no API required. +- **Goroutine initial stack sizing** — **Go 1.19** — initial goroutine stacks allocated based on historic average stack usage per function, reducing early stack-growth copying; at most 2× wasted space — automatic; reduces alloc pressure for programs spawning many goroutines. +- **Transparent huge page management** — **Go 1.21** (Linux) — runtime explicitly manages heap regions eligible for THP; up to 50% memory reduction for small heaps, up to 1% latency improvement for large dense heaps — automatic on Linux. +- **GC tail-latency reduction** — **Go 1.21** — GC tuning yields up to 40% reduction in tail (p99+) latency at a small throughput trade-off — automatic; tune back with `GOGC`/`GOMEMLIMIT` if throughput regression observed. +- **C-to-Go call overhead reduction** — **Go 1.21** (Unix) — cgo setup preserved across multiple calls from the same thread; cost drops from 1–3 µs to 100–200 ns per call — automatic for existing cgo code; benefits mixed-language hot paths. +- **Swiss Tables built-in map** — **Go 1.24** — `map` backed by a Swiss Tables hash table; parallel 8-slot probing via control-word metadata; up to 60% faster in map microbenchmarks, ~1.5% geometric-mean CPU improvement in real programs, lower average memory footprint — supersedes prior open-addressing map — automatic; no code changes needed; revert with `GOEXPERIMENT=noswissmap` to isolate issues. +- **`sync.Map` hash-trie implementation** — **Go 1.24** — internal `sync.Map` rewritten; modifications of disjoint key sets no longer contend on larger maps; no ramp-up time for low-contention loads — supersedes the prior read-optimised copy-on-write structure for write-heavy concurrent workloads — revert with `GOEXPERIMENT=nosynchashtriemap`. +- **`runtime.AddCleanup`** — **Go 1.24** — attaches a cleanup function to an object pointer; runs concurrently (not sequentially like finalizers), supports multiple cleanups per object, safe with cycles, and supports interior pointers — supersedes `runtime.SetFinalizer` for resource-release patterns — use when: closing file descriptors, releasing C memory, or evicting cache entries keyed on object lifetime; call `.Stop()` on the returned handle to cancel. +- **`race` detector upgrade (TSan v3)** — **Go 1.19** — race detector upgraded to ThreadSanitizer v3; 1.5–2× faster execution under `-race`, 50% less memory, supports unlimited goroutines — automatic when using `-race`; no code change needed. +- **Execution tracer overhaul** — **Go 1.22** — trace format redesigned; latency impact of starting/stopping execution traces dramatically reduced; streamable on-the-fly output — use `runtime/trace` or `golang.org/x/exp/trace` (1.22+ format only) for production tracing. + +## Concurrency + +- **`sync/atomic` typed values (`atomic.Int64`, `atomic.Bool`, `atomic.Pointer[T]`, etc.)** — **Go 1.19** — struct-based atomics with method receivers; `atomic.Int64` / `atomic.Uint64` are always 64-bit aligned even on 32-bit platforms, removing the alignment-fault footgun of raw `atomic.AddInt64(&x, n)` — supersedes `sync/atomic` function-based API for new code — use as struct fields; call `.Load()`, `.Store()`, `.Add()`, `.CompareAndSwap()`. +- **`sync/atomic.And` / `atomic.Or` bitwise ops** — **Go 1.23** — atomic bitwise AND/OR on `int32`/`uint32`/`int64`/`uint64` without a read-modify-write CAS loop — supersedes manual `for { old := Load(); if CompareAndSwap(old, old&mask) { break } }` patterns — use for bit-flag manipulation in concurrent hot paths. +- **`sync.Map.Clear`** — **Go 1.23** — bulk-deletes all keys without iterating via `Range`+`Delete`; O(1) allocation path — supersedes `range`-based manual deletion loop — use when resetting or expiring an entire concurrent map. +- **`errgroup.SetLimit` / `TryGo`** — **`golang.org/x/sync` v0.1.0+ (Go 1.18+)** — `SetLimit(n)` caps concurrent goroutines in the group; `TryGo(f)` submits work non-blocking (returns `false` if at limit) — supersedes manual semaphore channels for bounded parallelism — use when fanning out I/O-bound work (file reads, HTTP calls) to prevent goroutine explosion; `SetLimit(runtime.GOMAXPROCS(0))` for CPU-bound fan-out. +- **Loop variable per-iteration semantics** — **Go 1.22** — each `for`-range iteration gets its own copy of the loop variable; goroutine closures over loop variables no longer need the explicit `v := v` shadow copy — supersedes the `v := v` copy idiom (that copy is now a no-op on 1.22+) — no code change required to get correct behaviour; remove stale `v := v` copies when targeting 1.22+. +- **Unreferenced timer/`time.After` early collection** — **Go 1.23** — the runtime reworked timers so an unreferenced `Timer`/`Ticker` (including the one created by `time.After`) becomes eligible for GC as soon as it is unreachable, instead of being retained until it fires; also `Timer.Stop`/`Reset` no longer need the stale-value drain workaround — reduces (does not eliminate) the classic `time.After`-in-a-`select`-loop leak — the durable fix is still a single reusable `time.NewTimer`/`NewTicker` with `Reset`, but the per-iteration leak on 1.23+ is far cheaper than on ≤1.22. + +## Stdlib & Generics + +- **`slices` package** — **Go 1.21** (`slices`) — generic slice functions: `Sort`, `SortFunc`, `BinarySearch`, `Contains`, `Index`, `Compact`, `Grow`, `Clone`, `Delete`, `Insert`, `Max`, `Min`, `Reverse` — supersedes manual `sort.Slice` + index-hunting loops — use `slices.Sort`/`slices.SortFunc` instead of `sort.Slice` to avoid the per-call closure allocation; `slices.BinarySearch` replaces `sort.Search` boilerplate. +- **`slices` iterator functions** — **Go 1.23** — `slices.All`, `slices.Values`, `slices.Backward`, `slices.Collect`, `slices.AppendSeq`, `slices.Sorted`, `slices.Chunk` — lazy iteration and collection without intermediate allocations — use with `for range` and `iter.Seq`; avoids materialising intermediate slices in pipeline patterns. +- **`maps` package (core utilities)** — **Go 1.21** (`maps`) — generic map helpers: `Clone`, `Copy`, `DeleteFunc`, `Equal`, `EqualFunc` — supersedes manual map-copy loops and reflect-based equality — use `maps.Clone(m)` instead of a `for k, v := range` copy loop; avoids per-element type assertions. +- **`maps` iterator functions** — **Go 1.23** — `maps.All`, `maps.Keys`, `maps.Values`, `maps.Collect`, `maps.Insert` — key/value iteration without allocating a `[]K` or `[]V` intermediate slice — use with `for range maps.Keys(m)` to avoid the common `append`-keys-to-slice pattern. +- **`min` / `max` / `clear` builtins** — **Go 1.21** — compiler-intrinsic min/max over any ordered type (no function-call overhead, no generic instantiation cost); `clear(m)` zeroes a slice or deletes all map keys in one call — supersedes hand-written `if a < b { return a }` helpers and `for k := range m { delete(m, k) }` loops. +- **`sort` algorithm rewrite (pdqsort)** — **Go 1.19** — `sort.Slice`, `sort.Sort`, and `sort.Stable` use pattern-defeating quicksort; faster for common real-world distributions (sorted, reverse-sorted, few uniques) — automatic; also adds `sort.Find` as a cleaner alternative to `sort.Search`. +- **`math/rand/v2`** — **Go 1.22** — new PRNG package with PCG and ChaCha8 generators; unconditionally random-seeded global source enables per-thread states and eliminates the legacy global lock; `rand.N[T](max)` is generic over any integer type — supersedes `math/rand` (v1) for new code; global `math/rand` functions in v1 had a shared mutex; v2 global is lock-free — import `math/rand/v2` in new code. +- **`unique.Make` / `unique.Handle`** — **Go 1.23** (`unique`) — canonicalises (interns) any comparable value; two `Handle[T]` values compare equal iff their source values were equal, via a pointer comparison — reduces memory by deduplicating repeated equal values (strings, structs); O(1) handle comparison vs O(n) string comparison — use for interning repeated strings, IP addresses, struct keys; call `unique.Make(v)` once per value, store/compare `Handle[T]`. +- **`weak.Pointer[T]`** — **Go 1.24** (`weak`) — GC-aware weak reference; `Value()` returns `nil` after the referent is collected — supersedes `unsafe.Pointer` hacks for cache/canonicalisation maps — use with `runtime.AddCleanup` to build weak-keyed maps or bounded caches that don't prevent GC; primary use case is implementing the pattern underlying `unique.Make`. +- **`fmt.Append` / `fmt.Appendf` / `fmt.Appendln`** — **Go 1.19** — format directly into a `[]byte` without intermediate `string` allocation — supersedes `buf = append(buf, fmt.Sprintf(…)…)` — use when building byte buffers from formatted output in hot paths. +- **`encoding/binary` append variants** — **Go 1.19** — `binary.BigEndian.AppendUint16/32/64`, `binary.AppendVarint`, `binary.AppendUvarint` — write integers into an existing `[]byte` without allocation — supersedes `buf = append(buf, binary.BigEndian.Uint64ToBytes(v)…)` workarounds. +- **`reflect.Value` stack allocation** — **Go 1.21** — `reflect.ValueOf(arg)` no longer unconditionally forces the argument to the heap; most reflect operations also support stack-allocated values — automatic; reduces GC pressure in reflection-heavy hot paths. + +## Maps & Data Structures + +- **Built-in `map` (Swiss Tables)** — **Go 1.24** — see Runtime & GC section; the same built-in `map` type now uses Swiss Tables; all existing map code benefits without changes. +- **`maphash.Comparable[T]` / `maphash.WriteComparable`** — **Go 1.24** — hash any comparable value (struct, array, interface) consistently with Go's map key semantics — use when building custom hash maps, sharded maps, or cache keys from struct values without rolling a custom hash function. +- **`sync.Map`** — best for **read-heavy / write-once** workloads (Go 1.9+); disjoint-key write workloads improved in **Go 1.24** (hash-trie) — supersedes `map` + `sync.RWMutex` when keys are written once and read many times; for balanced read/write or highly contended writes prefer a sharded `map`+`Mutex` array. diff --git a/.claude/skills/performance-audit/version-indexes/javascript-typescript.md b/.claude/skills/performance-audit/version-indexes/javascript-typescript.md new file mode 100644 index 00000000..76dd03a0 --- /dev/null +++ b/.claude/skills/performance-audit/version-indexes/javascript-typescript.md @@ -0,0 +1,121 @@ +--- +index_schema_version: 1 +ecosystem: javascript-typescript +covered_through: "React 19 / Angular 19 (zoneless GA in 21) / Vue 3.5 / Node.js 22 LTS" +built_on: 2026-06-04 +sources: + - https://react.dev/blog/2022/03/29/react-v18 # url-to-markdown + WebFetch + - https://react.dev/blog/2024/04/25/react-19 # url-to-markdown + - https://react.dev/blog/2024/04/25/react-19-upgrade-guide # WebFetch + - https://react.dev/reference/react/memo # WebFetch (compiler/memo details) + - https://react.dev/learn/react-compiler # WebFetch + - https://angular.dev/guide/signals # WebFetch + - https://angular.dev/guide/templates/defer # WebFetch + - https://angular.dev/guide/templates/control-flow # WebFetch + - https://angular.dev/guide/zoneless # WebFetch + - https://blog.vuejs.org/posts/vue-3-4 # url-to-markdown + - https://blog.vuejs.org/posts/vue-3-5 # url-to-markdown + - https://vuejs.org/guide/best-practices/performance # WebFetch + - https://vuejs.org/guide/components/async # WebFetch + - https://vuejs.org/guide/extras/reactivity-in-depth # WebFetch + - https://nodejs.org/en/blog/announcements/v18-release-announce # WebFetch + - https://nodejs.org/en/blog/release/v20.0.0 # WebFetch + - https://nodejs.org/en/blog/release/v21.0.0 # WebFetch + - https://nodejs.org/en/blog/release/v22.0.0 # WebFetch + - https://nodejs.org/en/blog/release/v22.12.0 # WebFetch (LTS stabilizations) + - https://nodejs.org/api/worker_threads.html # WebFetch + - https://nodejs.org/en/about/previous-releases # WebFetch (LTS timeline) +--- +# JavaScript / TypeScript performance version index +> Build-once lookup. The idiom-currency lane consults this first; live research only extends past +> `covered_through`. + +## Support cadence (LTS) +**Node.js**: even-numbered majors (18, 20, 22, 24) become **LTS** (~30 months support); odd-numbered +majors (19, 21, 23) are short-lived "Current" only — a perf feature that shipped in an odd/Current +release is usually **not adoptable** by an LTS-bound project until it lands in the next even/LTS major. +Recommend the best option on the project's Node LTS line, or flag the support-track tradeoff. +**Frameworks**: React has no formal LTS; Angular supports each major ~18 months (12 active + 6 LTS); +Vue's latest minor is the supported line — for these, "currency" is about the framework version the +app already targets, not a separate LTS track. + +## React — Rendering & Concurrency + +- **`createRoot` / `hydrateRoot`** — landed in **React 18** — unlocks all concurrent rendering features (automatic batching, transitions, streaming SSR); supersedes `ReactDOM.render` / `ReactDOM.hydrate` (removed in React 19) — use when migrating any React 17 app. +- **Automatic batching** — landed in **React 18** — state updates inside `setTimeout`, Promises, native event handlers, and any async context are now batched into one re-render by default; supersedes React 17 behavior that only batched inside React event handlers — no opt-in needed once `createRoot` is used. +- **`useTransition` / `startTransition`** — landed in **React 18** — marks state updates as non-urgent so React can interrupt them to respond to higher-priority input; supersedes synchronous state updates that blocked the main thread — use when a state change triggers expensive re-renders (search, filter, pagination). +- **`useDeferredValue`** — landed in **React 18** — defers re-rendering a derived value until the browser is idle; no fixed debounce delay; interruptible — supersedes manual `setTimeout`-based debounce for display values — use for derived expensive computations fed by a fast-updating input. +- **`useSyncExternalStore`** — landed in **React 18** — safe subscription to external stores in concurrent mode without `useEffect`; supersedes ad-hoc `useEffect` subscription patterns in library code. +- **Streaming SSR (`renderToPipeableStream` / `renderToReadableStream`)** — landed in **React 18** — full Suspense support on the server; out-of-order HTML streaming; improves LCP and TTFB; supersedes `renderToString` for server rendering. +- **`React.memo` / `useMemo` / `useCallback` auto-replaced by React Compiler** — React Compiler (RC shipped alongside **React 19**, also back-compatible with React 18 + Babel) — auto-memoizes components and intermediate values throughout the tree; supersedes manual `React.memo` + `useMemo` + `useCallback` in codebases that adopt the compiler. +- **`use()` API** — landed in **React 19** — reads a Promise or Context inside render, suspending the component until resolved; Promises from Server Components are stable across re-renders (Client Component Promises recreate each render); supersedes `useEffect`-based data loading and `useContext` for conditional context reads. +- **`useOptimistic`** — landed in **React 19** — shows final state immediately while async request is in flight, reverting on failure; supersedes manual `useState` optimistic-UI patterns. +- **`useActionState`** — landed in **React 19** — manages pending/error/reset lifecycle for async form actions automatically; supersedes manual `useState` + try/catch request-state management. +- **Resource preloading APIs (`prefetchDNS`, `preconnect`, `preload`, `preinit`)** — landed in **React 19** via `react-dom` — declarative resource hints hoisted to `` without DOM manipulation; supersedes manual `useEffect` with `document.head.appendChild` for critical resource hints. +- **`useDeferredValue` with `initialValue`** — landed in **React 19** — avoids blank initial render by providing an immediate fallback value on first paint — supersedes empty-string/null workarounds that caused layout shifts. + +## React — Removed / Superseded APIs + +- **`ReactDOM.render`** — removed in **React 19**; use `createRoot` from `react-dom/client`. +- **`ReactDOM.hydrate`** — removed in **React 19**; use `hydrateRoot` from `react-dom/client`. +- **`unmountComponentAtNode`** — removed in **React 19**; use `root.unmount()`. +- **`ReactDOM.findDOMNode`** — removed in **React 19**; use `useRef` and attach the ref directly. +- **`UNSAFE_componentWillMount` / `UNSAFE_componentWillReceiveProps` / `UNSAFE_componentWillUpdate`** — deprecated since React 16.9; unsafe in concurrent mode — migrate to `componentDidMount` / `getDerivedStateFromProps` / `componentDidUpdate` or function components. +- **String refs** — removed in **React 19**; use ref callbacks or `useRef`. +- **Legacy context (`contextTypes` / `getChildContext`)** — removed in **React 19**; use `createContext`. +- **`propTypes` / `defaultProps` on function components** — removed in **React 19**; use TypeScript types + ES6 default parameters. +- **UMD builds** — removed in **React 19**; use ESM-based CDN (e.g., esm.sh) or a bundler. + +## React — Ecosystem libraries (version-independent) +> Durable React-ecosystem perf levers (not tied to a React release), carried so the idiom-currency lane +> is grounded on common React performance work beyond core React APIs. Verify the exact API against the +> library version in the lockfile. + +- **List virtualization (`@tanstack/react-virtual`, `react-window`)** — version-independent — render only the rows in/near the viewport instead of the whole collection; for long or unbounded lists this turns O(N) mounted DOM nodes + reconciliation into O(visible) — the dominant win for large tables/feeds/logs — supersedes mapping an entire large array to elements (even memoized rows still mount N nodes) — use once a list can exceed a few hundred rows. +- **Server-state caching (`@tanstack/react-query`, SWR)** — version-independent — dedupes in-flight requests, caches responses by key, and avoids refetch waterfalls and the redundant re-renders of ad-hoc `useEffect` fetching; cuts both network and render cost — supersedes per-component `useEffect` + `useState` fetch-on-mount for shared/remote data — use for any remote data read by more than one component. + +## Angular — Change Detection & Signals + +- **Signals (`signal()`, `computed()`, `effect()`)** — developer preview in **Angular 16**, stable in **Angular 17** — fine-grained push-based reactivity; `computed()` is lazy and memoized; signal reads in templates mark only the affected `OnPush` component for re-check without Zone.js; supersedes RxJS-only patterns and improves over `async` pipe subscription overhead — use for any state that drives template updates. +- **Signal inputs (`input()`)** — landed in **Angular 17** (developer preview), stable in **Angular 18** — `@Input` values exposed as signals, enabling computed/effect integration without `ngOnChanges`; supersedes `@Input()` decorator for signal-based components. +- **`linkedSignal` / `resource` API** — landed in **Angular 19** (experimental) — `linkedSignal` creates a writable signal derived from another source; `resource` manages async data loading with built-in request/loading state; supersedes manual `computed` + `effect` data-loading patterns. +- **`OnPush` change detection** — available since Angular 2; pairs with signals and `async` pipe — components only re-check when an `@Input` reference changes, an Observable emits via `async` pipe, or a signal notifies; supersedes default CheckAlways strategy for data-driven components. +- **Zoneless change detection (`provideZonelessChangeDetection()`)** — experimental in **Angular 18**, default in **Angular 21** — removes Zone.js from the dependency graph, eliminating monkey-patching overhead, reducing payload (~14 kB gzip), and improving startup time; supersedes Zone.js-driven change detection — requires explicit notification via signals, `AsyncPipe`, `markForCheck()`, or reactive forms. + +## Angular — Templates & Lazy Loading + +- **Built-in control flow (`@if`, `@for`, `@switch`)** — landed in **Angular 17** (stable) — `@for` has mandatory `track` expression compiled to key-based reconciliation, outperforming `*ngFor`'s optional `trackBy`; supersedes `*ngIf` / `*ngFor` / `*ngSwitch` structural directives — use `track item.id` not `track $index` for reorderable lists. +- **Deferrable views (`@defer`)** — landed in **Angular 17** (stable) — declarative lazy loading of component subtrees with triggers (`on viewport`, `on idle`, `on interaction`, `on hover`, `when `) and `prefetch`; reduces initial bundle and improves LCP/TTFB; supersedes ad-hoc `*ngIf` + router lazy loading for below-the-fold content — only works with standalone components. +- **Standalone components** — stable since **Angular 15**, default scaffold since **Angular 17** — tree-shaking-friendly; no `NgModule` wrapper; enables per-component lazy loading via `loadComponent` router API; supersedes `NgModule`-based feature modules for new components. + +## Vue — Reactivity & Rendering + +- **Proxy-based reactivity** — Vue 3.0 — `reactive()` uses ES Proxy instead of Vue 2's `Object.defineProperty`; no need to pre-declare properties; better performance for deeply nested objects and dynamic keys; supersedes Vue 2 reactivity — upgrade path via `@vue/compat`. +- **`shallowRef` / `shallowReactive`** — Vue 3.0 — only the top-level reference is reactive; deep mutation does not trigger updates; avoids O(n) proxy cost on large arrays/objects — supersedes putting large datasets in deep `ref`/`reactive` — use when only bulk replacement (not in-place mutation) is needed. +- **`markRaw`** — Vue 3.0 — exempts an object from being made reactive when assigned into reactive state; supersedes workaround of storing non-reactive data outside component scope — use for third-party class instances, lookup tables, large static datasets. +- **`v-memo`** — Vue 3.2 — memoizes a template subtree, skipping diffing when listed dependencies are unchanged; supersedes manual conditional rendering tricks for expensive list items — use on `v-for` rows with stable, infrequently changing keys. +- **Computed stability (only triggers on value change)** — Vue 3.4 — `computed()` now only re-triggers watchers/effects when its return value actually changes (not just when dependencies run); supersedes pre-3.4 behavior where every dependency change re-triggered downstream effects — no code change needed, automatic upgrade. +- **Reactivity system refactor (-56% memory, 10× large-array perf)** — Vue 3.5 — internal rewrite of the reactivity system; large deeply-reactive array operations up to 10× faster; 56% lower memory for reactive tracking structures; also fixes stale computed and SSR memory leaks — automatic upgrade, no API changes. +- **Reactive props destructure (`defineProps` in `