diff --git a/.claude/skills/api-integration/SKILL.md b/.claude/skills/api-integration/SKILL.md index f08ea2d6c..455c7173f 100644 --- a/.claude/skills/api-integration/SKILL.md +++ b/.claude/skills/api-integration/SKILL.md @@ -551,6 +551,77 @@ No code changes needed — the observability pipeline is fully generic. But veri --- +## Phase 7: Compliance Final Check (DEFENSIVE GATE) + +After all integration work is done, run a final cross-cutting compliance check before opening the PR. New API integrations touch 8+ subsystems; each subsystem has audit/observability/regulator obligations that PR review can miss. This phase enforces them. + +**Once `feature-compliance-scaffold` ships** (plan: `docs/pending-updates/feature-compliance-scaffold-plan.md`), run: + +```bash +/feature-compliance-scaffold --feature-type api --name {client-slot} +``` + +Until then, walk this checklist manually. Each item maps to a dimension the scaffold skill will eventually automate. Skip a check only with a written justification — never silently. + +### 7.1 Auditability (D1) — does every read/write path log? + +- [ ] New tool invocations land in `hook_audit_log` via `PreToolUse` / `PostToolUse` hooks (default-on for any tool registered through `toolImplementations.js`) +- [ ] Hybrid client emits `_hybrid_metadata.source` on every result so observability can attribute native vs. websearch fallback +- [ ] If client wraps a paid/audited data source (FMP, EDGAR, etc.), confirm response volume is reasonable — accidental loops are caught at this layer + +### 7.2 Traceability / OTel (D2) — are tool spans wrapped? 
+ +- [ ] Tool registration goes through the standard path that auto-wraps with `withToolSpan` (verified by checking `toolImplementations.js` registers via the canonical helper, not a bypass) +- [ ] No long-running orchestration added inside the client itself; if the hybrid client adds multi-turn behavior, wrap with `withSpan(name, attributes, fn)` from `src/utils/sdkTracing.js` +- [ ] `_hybrid_metadata.fallback_reason` populated when websearch is used (so OTel attributes carry the fallback signal) + +### 7.3 Regulator tracking (D3) — Art. 12 / 14 / 15 / GDPR Art. 17, 30 + +- [ ] Tool invocation persisted to `hook_audit_log` (Art. 12 6-month retention covers it automatically) +- [ ] No PII in API responses; if PII is unavoidable (e.g., person-name search APIs), responses MUST flow through `pseudonymize()` from `src/utils/piiManager.js` before they reach `transcript_events` or `reports` +- [ ] If new API content lands in `reports`, `bridge_metadata.git_sha` must be carried through (Art. 15 replay envelope) — verify by checking a test memo's `bridge_metadata` JSONB column post-deploy + +### 7.4 Permissions / RBAC (D4) — endpoint gating + +- [ ] Tool invocation routes through standard subagent dispatch (which inherits session-level auth) — no new HTTP endpoint exposed by this client +- [ ] If you DID add a new HTTP endpoint (rare for API clients): must use `requireAdmin` middleware AND be listed in `infrastructure-health/references/admin-endpoints.md` + +### 7.5 Embeddings (D5) — n/a for most APIs + +- [ ] Tabular/structured APIs (BLS, FRED, ECB) opt out — embeddings don't apply to numeric series +- [ ] Text-producing APIs (CourtListener opinions, Federal Register notices, SEC filings text) → confirm any report-style synthesis lands in `report_embeddings` via the standard `embeddingService.chunkByHeaders()` path (default-on when `EMBEDDING_PERSISTENCE=true`) + +### 7.6 Provenance (D6) — `reports.agent_type` + `source_writes` + +- [ ] When this client's data drives a 
memo section, the producing subagent's `agent_type` is set on the resulting `reports` row (via the standard hook bridge — should be automatic if subagent name is in `legalSubagents/agents/`) +- [ ] If client INGESTS raw documents (not just metadata), confirm `source_writes` row created with `source_uri`, `content_hash`, `retention_class` — most tabular APIs skip this; document-fetching APIs must satisfy it + +### 7.7 Database structure (D7) — almost always n/a + +- [ ] No new tables or columns expected for an API client integration. If you added any (e.g., a per-client cache table): BOTH a `migrations/NNNN_*.sql` file AND matching `ensure*Schema()` DDL with `CREATE TABLE IF NOT EXISTS` / `ALTER TABLE ADD COLUMN IF NOT EXISTS` are required (see `feedback_dual_schema_paths.md`, `feedback_column_evolution_ddl.md`) + +### 7.8 Storage (D8) — n/a unless ingesting raw artifacts + +- [ ] No raw-document ingestion → standard tier, no WORM +- [ ] Raw-document ingestion → artifact lands in client's WORM bucket; `retention_class` set on the `source_writes` row + +### 7.9 Observability metrics (D9) + +- [ ] Existing metric `claude_api_client_results_total{client="{slot}"}` will fire automatically — confirm post-deploy via `/metrics` scrape that the new label value appears with non-zero count +- [ ] If client has unusual error modes (auth failure, quota exhaustion), check whether existing alert rules cover them; otherwise add to `prometheus/alerts.yml` +- [ ] Cardinality: label values bounded — `client="{slot}"` is one value, never embed dynamic IDs (request IDs, session IDs) in metric labels + +### 7.10 Hooks (D10) + +- [ ] Tool registration emits standard PreToolUse / PostToolUse — no new `event_type` introduced (API clients should never need one; if you find yourself adding one, reconsider) +- [ ] Persistence is fire-and-forget through `hookDBBridge.js` — never blocks the request + +### When this check fails + +**Do not open the PR.** Fix the gap, re-run the checklist, then 
proceed. The cost of catching a compliance gap pre-merge is one extra fix; post-merge it can be a v6.2.3-style production hotfix. + +--- + ## Reference: Current Integration Points (April 2026) | File | What to add | Location | diff --git a/.claude/skills/code-execution-models/SKILL.md b/.claude/skills/code-execution-models/SKILL.md index 292913ccc..2e5ef5907 100644 --- a/.claude/skills/code-execution-models/SKILL.md +++ b/.claude/skills/code-execution-models/SKILL.md @@ -313,6 +313,74 @@ Inputs: {key fields}. Outputs: {key metrics + chart types}. --- +## Phase 7: Compliance Final Check (DEFENSIVE GATE) + +Code-execution models touch the regulator-replay envelope (`bridge_metadata.git_sha`), the `code_executions` audit table, the `claude_code_execution_*` metrics, and chart persistence under `{sessionDir}/charts/`. A new model that's silently missing one of these breaks Aperture's EU AI Act Art. 15 audit trail or the V2 verification probe in `post-deploy-verify`. + +**Once `feature-compliance-scaffold` ships** (plan: `docs/pending-updates/feature-compliance-scaffold-plan.md`), run: + +```bash +/feature-compliance-scaffold --feature-type model --name M{N} +``` + +Until then, walk this checklist manually before opening the PR: + +### 7.1 Auditability (D1) — `code_executions` row written? 
+ +- [ ] Standard execution path through `runPythonAnalysis()` in `codeExecutionBridge.js` writes a `code_executions` row with `model_id`, `success`, `duration_ms`, `stderr` truncated, `bridge_metadata` JSONB +- [ ] Tool invocation also lands in `hook_audit_log` via `PreToolUse` / `PostToolUse` for `run_python_analysis` (default-on) + +### 7.2 Traceability / OTel (D2) + +- [ ] `withSpan` wraps the model invocation (already done at the `runPythonAnalysis` level — verify your new model doesn't bypass the helper) +- [ ] Span attributes carry `model_id` so the existing `claude_tool_duration_ms` histogram lands per-model when filtered to `tool_name="run_python_analysis"` + +### 7.3 Regulator tracking (D3) — Art. 15 replay envelope + +- [ ] `bridge_metadata.git_sha` is set by the bridge (NOT by your model definition) — confirm production rows are non-`'unknown'` after deploy via `post-deploy-verify` Tier 2 +- [ ] `bridge_metadata.model_id` matches the catalog entry's `id` +- [ ] No PII in input examples (`codeExecutionBridge.js` ~line 566) — examples are committed to git and become public via the `/api/catalog` endpoint +- [ ] If model produces text artifacts that may be examined by a regulator (rare for quant models), confirm they flow into a retained table — usually code-exec output is ephemeral chart + JSON, no retention concern + +### 7.4 Permissions / RBAC (D4) — n/a + +- [ ] No new endpoint introduced. Model is dispatched through standard subagent path (financial-analyst / data-analyst), inheriting session-level auth + +### 7.5 Embeddings (D5) — n/a + +- [ ] Model output is structured JSON + PNG charts. Embeddings don't apply. 
If your model produces a `## Findings` markdown block that lands in `reports`, that flows through the existing `report_embeddings` path — no model-specific work + +### 7.6 Provenance (D6) + +- [ ] Chart PNG saved to `{sessionDir}/charts/` so chart paths recorded in `code_executions` resolve to real files on disk (not orphaned base64) +- [ ] `code_executions.session_id` populated so chart → session → memo provenance chain is traceable + +### 7.7 Database structure (D7) — n/a unless schema change + +- [ ] Adding a new model to the catalog is a config-only change. No `migrations/`, no `ensure*Schema()` work needed +- [ ] If your model requires a new persistent state table (e.g., calibration parameters): both `migrations/NNNN_*.sql` AND `ensure*Schema()` DDL are required (see `feedback_dual_schema_paths.md`) + +### 7.8 Storage (D8) — n/a + +- [ ] Charts persist locally under `{sessionDir}/charts/`. They're not regulator-replay artifacts in the WORM sense (the deterministic input → output mapping IS the audit trail, captured via `bridge_metadata`). No GCS WORM tier required for chart PNGs + +### 7.9 Observability metrics (D9) + +- [ ] `claude_tool_invocations_v2_total{tool_name="run_python_analysis"}` will fire on dispatch — confirm post-deploy via `/metrics` scrape that the new model produces invocations (the counter is shared across all models; per-model breakdown lives in the `code_executions` table, not in metric labels) +- [ ] `claude_tool_duration_ms` histogram covers latency — no per-model bucket; if M{N} has unusual runtime, document expected p95 in the catalog entry +- [ ] No new metric needed unless your model has a unique failure mode (e.g., convergence failure for iterative solvers — would warrant a new counter via `sdkMetrics.js`) + +### 7.10 Hooks (D10) — n/a + +- [ ] No new `event_type` introduced. 
Model dispatch fires standard `PreToolUse` / `PostToolUse` for `run_python_analysis` +- [ ] If you find yourself adding a new `event_type` (e.g., `code_execution_iterative_step`), the new value must be added to the analytics-exclusion lists in `dbFrontendRouter.js` (the same pattern as `AgentProgress` exclusion in v4.13.0) — otherwise it pollutes per-tool aggregates + +### When this check fails + +**Do not open the PR.** Add the missing wiring or document a `noqa` opt-out with explicit justification (e.g., "M{N} is purely deterministic — no provenance variance to track"). The post-merge cost of a missed compliance gap is much higher than the pre-merge fix. + +--- + ## Reference: Architecture | Component | File | Role | diff --git a/super-legal-mcp-refactored/docs/pending-updates/feature-compliance-scaffold-plan.md b/super-legal-mcp-refactored/docs/pending-updates/feature-compliance-scaffold-plan.md new file mode 100644 index 000000000..deb93e7f4 --- /dev/null +++ b/super-legal-mcp-refactored/docs/pending-updates/feature-compliance-scaffold-plan.md @@ -0,0 +1,510 @@ +# Plan: `feature-compliance-scaffold` Skill + +**Date**: 2026-05-07 +**Author**: Claude Opus 4.7 (drafted with Edwin) +**Status**: Plan — implementation deferred, doc lives in pending-updates until built + +## Why this skill exists + +Today's v7.0.x architecture has ten cross-cutting concerns that every new subagent / API / endpoint / code-execution model / telemetry hook / sidecar must satisfy. 
They're spread across: + +- **9 truth-source files** (`postgres.js`, `sdkMetrics.js`, `alerts.yml`, `alertingRules.js`, 5 router files) +- **3 wiring registries** (`clientRegistry.js`, `domainMcpServers.js`, `legalSubagents/`) +- **6 audit tables** (`hook_audit_log`, `access_log`, `source_writes`, `human_interventions`, `pii_mappings`, `kg_provenance`) +- **3 OTel instrumentation patterns** (`startRequestSpan`, `withToolSpan`, `withSpan`) +- **2 schema-evolution paths** (`migrations/*.sql` + `ensure*Schema()` DDL) + +A feature can ship through `api-integration` or `code-execution-models` and still miss `access_log` writes (GDPR Art. 30), miss `withSpan()` wrapping (EU AI Act Art. 15 replay), or miss the `ensure*Schema()` ALTER for a new column (Wave 3 v6.2.3 hotfix pattern). PR review catches some of these; reviewer fatigue and skill drift mean others slip. + +`feature-compliance-scaffold` is a **defensive, final check** — invoked just before PR merge and again pre-deploy. It does not generate code. It does not scaffold files. It validates that a recently-built feature has all ten cross-cutting concerns wired in, prints a matrix of dimension × status × remediation, and exits non-zero on any FAILED. + +This is the same pattern as `schema-doc-validator` (validates state, doesn't generate) and `post-deploy-verify` (validates deploy, doesn't roll back) — operator decides remediation. + +## What it validates (10 dimensions) + +For each dimension, the skill reads the feature manifest (or git diff), then queries the truth sources and checks that the new entity is wired in correctly. + +### D1 — Auditability (`access_log`, `hook_audit_log`) + +**Triggers when**: feature adds a new endpoint, subagent, or tool that surfaces session/report data. + +**Checks**: +- New `/api/...` endpoint that reads from `sessions`, `reports`, `transcript_events`, or `report_embeddings` → must call `accessLog.write()` (or equivalent middleware) on every read path. 
+- New subagent → must emit `SubagentStart` / `SubagentStop` hooks → these land in `hook_audit_log` via `hookDBBridge.js`. +- New tool → must fire `PreToolUse` / `PostToolUse` hooks. + +**Remediation pointer**: `super-legal-mcp-refactored/src/utils/accessLog.js`, `src/utils/hookDBBridge.js`. + +### D2 — Traceability / OTel (manual spans + tool span auto-wrap) + +**Triggers when**: feature adds a new orchestrated operation (subagent dispatch, multi-turn tool loop, KG build, embedding generation, PII pseudonymization, source write). + +**Checks**: +- New long-running operation (>500ms typical) is wrapped in `withSpan(name, attributes, fn)` from `src/utils/sdkTracing.js`. +- Span attributes include `agent_type` (if applicable), `stage`, `wave`. +- For code-execution sandbox runs: `bridge_metadata.git_sha` must be propagated to the resulting `code_executions` row (EU AI Act Art. 15 replay envelope). +- For new tool: gets auto-wrapped via `withToolSpan` — verify the wrapping is in place at registration time. + +**Remediation pointer**: `src/utils/sdkTracing.js:withSpan`, `withToolSpan`. Sibling-root pattern: `context.with(ROOT_CONTEXT, ...)`. + +### D3 — Regulator tracking (EU AI Act Art. 12, 14, 15; GDPR Art. 17, 30) + +**Triggers when**: feature mutates user-visible state or processes user-supplied data. + +**Checks**: +- **Art. 12 (logs, 6-month retention)**: data lands in a table covered by the retention policy (`hook_audit_log`, `access_log`, `transcript_events`, `reports`, `code_executions`). +- **Art. 14 (human oversight)**: if the feature can be overridden by an operator, must write to `human_interventions` with `intervention_type`, `actor_user_id`, `target_session_id`. +- **Art. 15 (replay envelope)**: if the feature emits artifacts that may be re-examined by a regulator, the `bridge_metadata` JSONB on the producing row must include `git_sha`, `model_id`, `prompt_hash`, `seed` (where applicable). +- **GDPR Art. 
17 (right to erasure)**: if the feature stores PII, all PII fields must flow through `pii_mappings` via `pseudonymize()` from `src/utils/piiManager.js`. +- **GDPR Art. 30 (records of processing)**: read access logged in `access_log` (overlap with D1). + +**Remediation pointer**: `src/utils/piiManager.js`, `src/server/adminRouter.js:requireAdmin`, Wave 3 doc `docs/pending-updates/wave3-shipped.md`. + +### D4 — Permissions / RBAC (`requireAdmin`, `AUTH_ENABLED` gating) + +**Triggers when**: feature adds a new HTTP endpoint or admin action. + +**Checks**: +- New `/api/admin/*` endpoint declared in `adminRouter.js` must use `requireAdmin` middleware. +- New `/api/...` endpoint that mutates state must be behind `AUTH_ENABLED=true` middleware path. +- New role string (if any) must be present in seeded admin users via `client-provisioner` skill. +- Endpoint listed in `infrastructure-health/references/admin-endpoints.md` (so operator monitoring picks it up). + +**Remediation pointer**: `src/server/authMiddleware.js`, `src/server/adminRouter.js`. + +### D5 — Embeddings (`report_embeddings`, `source_chunk_embeddings`) + +**Triggers when**: feature produces text artifacts that may be queried semantically. + +**Checks**: +- For new `agent_type` that produces a `report_type`: `EMBEDDING_PERSISTENCE=true` flow covers it via `embeddingService.chunkByHeaders()` → `report_embeddings` INSERT. +- For new raw source ingestion: `source_chunk_embeddings` row created (Wave 2 provenance bridge). +- Cosine similarity threshold sane (default 0.3 — flag deviations). + +**Remediation pointer**: `src/utils/embeddingService.js`, `src/utils/sourceChunkEmbeddingService.js`. + +### D6 — Provenance chain (`source_writes`, `reports.agent_type`, `kg_provenance`) + +**Triggers when**: feature writes to `reports`, ingests raw sources, or builds KG nodes. + +**Checks**: +- New report-producing subagent: `reports.agent_type` populated on INSERT. 
+- New raw-source ingestion: `source_writes` row created with `source_uri`, `content_hash`, `retention_class`. +- New KG node creation: `kg_provenance` row links node → contributing report(s) (Wave 2 architecture: KG operates on reports, not raw sources). +- `bridge_metadata.git_sha` matches the git rev that created the row (overlap with D2 / D3.Art15). + +**Remediation pointer**: `src/utils/knowledgeGraphExtractor.js`, `src/utils/hookDBBridge.js` (for `reports` write path). + +### D7 — Database structure (dual-path: migration + `ensure*Schema()`) + +**Triggers when**: feature adds a new table, column, index, or constraint. + +**Checks**: +- New table → has BOTH a `migrations/NNNN_description.sql` file AND matching `CREATE TABLE IF NOT EXISTS` in the appropriate `ensure*Schema()` function in `postgres.js`. +- New column on existing table → has `ALTER TABLE ADD COLUMN IF NOT EXISTS` in `ensure*Schema()` (NOT just an updated `CREATE TABLE` shape — Wave 3 v6.2.3 lesson: that's a no-op against production). +- New index → both migration file AND `CREATE INDEX IF NOT EXISTS`. +- pgvector dim consistency: any new embedding column matches `1536` (Gemini default) unless explicitly justified. + +**Remediation pointer**: `super-legal-mcp-refactored/migrations/`, `src/db/postgres.js:ensure*Schema()` functions, `feedback_dual_schema_paths.md`, `feedback_column_evolution_ddl.md`. + +### D8 — Storage (GCS WORM, retention class) + +**Triggers when**: feature persists artifacts that need replayable archival (raw sources, generated memos, charts). + +**Checks**: +- Artifact path lands in client's WORM bucket (`gs://super-legal-worm-{client-id}-us-east1`). +- Object Lock retention mode enabled. +- `retention_class` populated on `source_writes` / `reports` row. +- Compute SA has `roles/storage.objectAdmin` on the bucket (per-client provisioning detail). + +**Remediation pointer**: `client-provisioner` skill, Wave 3 GCS tiering section. 
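
The statically checkable part of D8 (bucket naming and `retention_class`) could be sketched roughly as follows. This is a minimal sketch, not the skill's implementation: the function name, the client-id character class, and the sample retention value are assumptions made for illustration. Object Lock mode and IAM bindings are not covered, since they cannot be read from the repo alone.

```python
import re

# Bucket naming pattern quoted in the D8 checks; the character class used
# for the client id is an assumption of this sketch.
WORM_BUCKET_RE = re.compile(
    r"^gs://super-legal-worm-(?P<client_id>[a-z0-9][a-z0-9-]*)-us-east1/"
)


def check_worm_artifact(uri, client_id, retention_class):
    """Return a list of D8 findings for one artifact (empty list = pass)."""
    findings = []
    match = WORM_BUCKET_RE.match(uri)
    if match is None:
        findings.append(f"artifact {uri!r} is not in a per-client WORM bucket")
    elif match.group("client_id") != client_id:
        findings.append(
            f"artifact {uri!r} is in the bucket of client "
            f"{match.group('client_id')!r}, expected {client_id!r}"
        )
    if not retention_class:
        findings.append("retention_class is not populated on the producing row")
    return findings
```

A caller would pass the artifact URI together with the `retention_class` read from the `source_writes` / `reports` row and treat any non-empty result as a D8 failure.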
+ +### D9 — Observability metrics (Prometheus + alerts) + +**Triggers when**: feature adds a new measurable behavior (latency, error rate, queue depth, success rate). + +**Checks**: +- New metric declared in `src/utils/sdkMetrics.js` with `name: 'claude_*'` convention. +- Histogram bucket suffix derives from `_ms` base (Prometheus auto-derives `_bucket`/`_count`/`_sum`). +- Cardinality budget respected (label values bounded — flag any free-form string label). +- New alert in `prometheus/alerts.yml` OR `src/config/alertingRules.js` with sane threshold + `for:` duration. +- Alert routes to PagerDuty / Slack via existing receiver — flag any orphan alert. + +**Remediation pointer**: `src/utils/sdkMetrics.js`, `prometheus/alerts.yml`. + +### D10 — Hook persistence (write through `hookDBBridge.js`) + +**Triggers when**: feature emits SDK lifecycle hooks (SubagentStart, PostToolUse, etc.) or adds a new `event_type`. + +**Checks**: +- New `event_type` value used in `hook_audit_log` is exempt from analytics queries that must exclude it (e.g., `event_type='AgentProgress'` exclusion in 3 dbFrontendRouter queries — v4.13.0 lesson). +- Persistence is fire-and-forget (`.catch()`) — never blocks the request path. +- Gated behind `HOOK_DB_PERSISTENCE` flag. +- Circuit breaker state queryable via `claude_hook_circuit_breaker_state` metric (D9 overlap). + +**Remediation pointer**: `src/utils/hookDBBridge.js`, `src/utils/sdkHooks.js`. 
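
As an illustration of the D10 exclusion-list check, the sketch below scans router source text for `event_type NOT IN (...)` clauses and reports new event types that some clause omits. It assumes the analytics queries use exactly that SQL shape, which has not been verified against `dbFrontendRouter.js`; the function name and the sample queries are hypothetical.

```python
import re

# Matches SQL fragments such as: event_type NOT IN ('AgentProgress', 'Foo')
# (assumed query shape; other spellings such as != or <> are not handled)
NOT_IN_RE = re.compile(r"event_type\s+NOT\s+IN\s*\(([^)]*)\)", re.IGNORECASE)


def missing_exclusions(router_source, new_event_types):
    """Map each new event_type to the number of exclusion lists omitting it.

    Event types already present in every exclusion list are left out of
    the result, so an empty dict means the check passes.
    """
    exclusion_lists = [
        set(re.findall(r"'([^']+)'", body))
        for body in NOT_IN_RE.findall(router_source)
    ]
    report = {}
    for event_type in new_event_types:
        misses = sum(1 for excluded in exclusion_lists if event_type not in excluded)
        if misses:
            report[event_type] = misses
    return report


if __name__ == "__main__":
    sample = """
    SELECT tool_name, COUNT(*) FROM hook_audit_log
      WHERE event_type NOT IN ('AgentProgress') GROUP BY 1;
    SELECT COUNT(*) FROM hook_audit_log
      WHERE event_type NOT IN ('AgentProgress', 'code_execution_iterative_step');
    """
    print(missing_exclusions(sample, ["code_execution_iterative_step"]))
```

A non-empty result would map to a D10 WARNING, listing how many queries still need the new value added.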
+ +## Invocation + +```bash +# Manifest mode — preferred for new features +/feature-compliance-scaffold --feature-spec docs/pending-updates/{feature}.md + +# Diff mode — auto-detect new symbols from a branch +/feature-compliance-scaffold --git-range main..HEAD + +# Direct mode — validate a specific feature type by name +/feature-compliance-scaffold --feature-type api --name fred +/feature-compliance-scaffold --feature-type subagent --name bond-analyst +/feature-compliance-scaffold --feature-type endpoint --name "POST /api/admin/foo" +/feature-compliance-scaffold --feature-type model --name M59 +/feature-compliance-scaffold --feature-type table --name new_audit_table + +# Filter dimensions +/feature-compliance-scaffold --git-range main..HEAD --dimensions D3,D7,D9 +``` + +**Triggers**: `compliance check`, `feature compliance`, `pre-merge audit`, `cross-cutting check`, `/feature-compliance-scaffold`. + +## Directory layout + +``` +.claude/skills/feature-compliance-scaffold/ +├── SKILL.md +├── scripts/ +│ ├── check.sh Entry point. Resolves args, dispatches D1-D10. +│ ├── extract-feature-symbols.py From manifest or git diff → JSON list of new symbols +│ │ (tables, columns, endpoints, agent_types, tool_names, +│ │ metrics, alerts, event_types). 
+│ ├── dimensions/ +│ │ ├── D1-auditability.py +│ │ ├── D2-traceability.py +│ │ ├── D3-regulator.py +│ │ ├── D4-permissions.py +│ │ ├── D5-embeddings.py +│ │ ├── D6-provenance.py +│ │ ├── D7-db-structure.py +│ │ ├── D8-storage.py +│ │ ├── D9-metrics.py +│ │ └── D10-hooks.py +│ └── format-report.py JSON aggregation → markdown matrix +└── references/ + ├── dimensions-catalog.md Full text of D1-D10 with examples + remediation + ├── feature-types.md Per-feature-type checklist (which D's apply by default) + ├── truth-sources.md Where each truth source lives (overlap with + │ schema-doc-validator's extraction-paths.md) + ├── feature-spec-schema.md YAML manifest format (for --feature-spec mode) + └── output-spec.md Report template +``` + +## Feature spec (manifest) format + +Single-file YAML manifest that describes a feature. Lives under `docs/pending-updates/{feature}.md` (existing convention) with a `feature-compliance:` block at the top: + +```yaml +--- +feature-compliance: + type: api # api | subagent | endpoint | model | table | hook | embedding + name: bls + ships: + tables: [] # any new tables + columns: # column additions on existing tables + - { table: reports, name: bls_series_id, type: text } + endpoints: [] # new HTTP routes + agent_types: [] # new subagents + tool_names: [search_bls_series, get_bls_data] + metrics: [claude_api_client_results_total] # new metrics emitted + alerts: [] # new alert rules + event_types: [] # new hook_audit_log event_types + embeddings: false # produces embeddable text? + pii: false # processes PII? + storage_class: standard # standard | worm | tombstone + human_overrideable: false # admin can override? + rbac_required: false # gated by requireAdmin? + # Optional opt-outs (with justification) + noqa: + - { dimension: D5, reason: "BLS responses are tabular not textual — embeddings N/A" } +--- +``` + +The skill reads this block, then runs only the dimensions whose triggers fire for this feature shape. 
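
One possible reading of "runs only the dimensions whose triggers fire" is sketched below. It takes the already-parsed `feature-compliance` mapping (YAML parsing is left out, since it needs a third-party parser) and returns a per-dimension plan. The mapping from manifest fields to D1-D10 triggers is an interpretation of the "Triggers when" text, not a confirmed rule set, and `plan_dimensions` is a hypothetical name.

```python
ALL_DIMENSIONS = [f"D{i}" for i in range(1, 11)]


def plan_dimensions(spec):
    """Return {dimension: "run" | "n/a" | "noqa: <reason>"} for one manifest."""
    ships = spec.get("ships", {})

    def field(key, default=None):
        # The manifest nesting is not pinned down here, so a key is looked
        # up under `ships` first and at the top level second.
        return ships.get(key, spec.get(key, default))

    storage_class = field("storage_class", "standard")
    # Assumed field-to-trigger mapping (illustrative only).
    triggers = {
        "D1": bool(field("endpoints") or field("agent_types") or field("tool_names")),
        "D2": bool(field("agent_types") or field("tool_names")),
        "D3": bool(field("pii") or field("human_overrideable")),
        "D4": bool(field("endpoints") or field("rbac_required")),
        "D5": bool(field("embeddings")),
        "D6": bool(field("agent_types")) or storage_class != "standard",
        "D7": bool(field("tables") or field("columns")),
        "D8": storage_class != "standard",
        "D9": bool(field("metrics") or field("alerts")),
        "D10": bool(field("event_types") or field("agent_types") or field("tool_names")),
    }
    plan = {dim: ("run" if triggers[dim] else "n/a") for dim in ALL_DIMENSIONS}
    # Explicit opt-outs win over triggers, but the justification is kept.
    for opt_out in field("noqa", []) or []:
        plan[opt_out["dimension"]] = f"noqa: {opt_out['reason']}"
    return plan
```

Keeping the `noqa` reason in the plan, instead of dropping the dimension silently, lets the report surface the written justification next to the skipped check.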
+ +## Output format + +``` +## Feature Compliance Report +Feature: bls (type: api) | Git: main..HEAD | Timestamp: 2026-05-07T... + +### Overall: PASSED ✓ | WARNING ⚠ | FAILED ✗ + +### Dimension Matrix + +| # | Dimension | Status | Notes | +|---|-----------|--------|-------| +| D1 | Auditability | ✓ | hook_audit_log writes via hookDBBridge.js (PreToolUse/PostToolUse hooks fire for new tools) | +| D2 | Traceability/OTel | ✓ | tools auto-wrapped via withToolSpan; no long-running orchestration added | +| D3 | Regulator | ✓ | no PII; no human override path; bridge_metadata.git_sha N/A (no code-exec output) | +| D4 | Permissions | n/a | no new endpoints | +| D5 | Embeddings | n/a | opt-out: BLS responses are tabular | +| D6 | Provenance | ⚠ | reports.agent_type populated for bls-derived reports? VERIFY MANUALLY | +| D7 | DB structure | ✗ | CRITICAL: column `reports.bls_series_id` declared in manifest but missing ALTER TABLE in ensureSessionsSchema() | +| D8 | Storage | ✓ | standard tier; no WORM artifacts | +| D9 | Metrics | ✓ | claude_api_client_results_total{client="bls"} — labels bounded | +| D10 | Hooks | ✓ | no new event_types | + +### CRITICAL (must fix before merge) + +- D7 reports.bls_series_id: add to ensureSessionsSchema() in postgres.js: + ALTER TABLE reports ADD COLUMN IF NOT EXISTS bls_series_id TEXT; + +### WARNINGS (should verify) +- D6: confirm reports.agent_type = 'bls-analyst' on rows produced via this client + +### Coverage +Dimensions evaluated: 10 | Triggered: 8 | Passed: 6 | Warnings: 1 | Failed: 1 | N/A: 2 +``` + +Exit codes: +- `0` — all PASSED (or only N/A) +- `1` — at least one WARNING and no FAILED +- `2` — at least one FAILED + +## Implementation notes + +### Reuse from existing skills + +- **Truth extraction**: extend `schema-doc-validator/scripts/extract-truth.py` rather than reimplement. Add columns for `event_types`, `agent_types`, `tool_names` extraction (currently scoped to schema/metrics/alerts/endpoints). 
+- **JSON intermediate → markdown synthesis**: copy pattern from `post-deploy-verify/scripts/format-report.py`. +- **Severity tuple convention**: `(severity, dimension, status, message, remediation)` — same shape as session-diagnostics / post-deploy-verify. +- **Pre-flight**: reuse `python3`, `jq`, `git` checks. No `gcloud` needed (read-only against repo). + +### Symbol extraction (the hardest part) + +The skill must figure out **which symbols are new** from a diff. Approaches: + +1. **Manifest mode (preferred)**: feature author declares new symbols in the YAML header. Trustworthy because the author owns it. +2. **Git diff mode (best-effort)**: parse `git diff --name-status main..HEAD` to list changed files, then run targeted regex against the added lines in `git diff main..HEAD`: + - New `CREATE TABLE` / `ALTER TABLE ADD COLUMN` blocks in `postgres.js` + - New `name: 'claude_*'` declarations in `sdkMetrics.js` + - New `router.{get|post}('/api/...'` lines in router files + - New `name: '...'` agent definitions in `legalSubagents/agents/*.js` + - New tool `name: '...'` in `toolDefinitions.js` + + Symbol resolution edge cases (renames, refactors, hoists) will produce false positives — manifest mode wins on accuracy. + +### Negative test (how to verify the skill catches what it should) + +When the skill ships, run it against: +1. **Real feature without compliance gaps**: use the v6.3.0 SDK upgrade PR (#79) — should report PASSED. +2. **Real feature with known gap**: revert the v6.2.3 hotfix locally (commit `f9ad200`) — should report FAILED on D7 (`access_log.actor_user_id` column missing in `ensure*Schema()`). +3. **Synthetic gap**: introduce a new metric in `sdkMetrics.js` without an alert in `alerts.yml` — should report WARNING on D9. + +## Implementation phases (build order) + +The skill is built in six phases (A through F), each with explicit deliverables and dependencies. Stop after each phase to verify before proceeding. 
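
As a possible starting point for Phase A's `extract-feature-symbols.py`, the git-diff regex pass described under "Symbol extraction" might look like the sketch below. It assumes the input is full unified diff text (for example the output of `git diff main..HEAD`), covers only three of the listed pattern families, and does not restrict each pattern to its expected file; the function name and the metric name in the usage example are hypothetical.

```python
import re
from collections import defaultdict

CREATE_TABLE_RE = re.compile(
    r"CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?(\w+)", re.IGNORECASE
)
ADD_COLUMN_RE = re.compile(
    r"ALTER\s+TABLE\s+(\w+)\s+ADD\s+COLUMN\s+(?:IF\s+NOT\s+EXISTS\s+)?(\w+)",
    re.IGNORECASE,
)
METRIC_RE = re.compile(r"name:\s*'(claude_\w+)'")
ROUTE_RE = re.compile(r"router\.(get|post)\(\s*'(/api/[^']*)'")


def extract_symbols(diff_text):
    """Collect candidate new symbols from the added lines of a unified diff."""
    symbols = defaultdict(list)
    for line in diff_text.splitlines():
        # Only added lines count; "+++ b/<file>" headers are not content.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        added = line[1:]
        for match in CREATE_TABLE_RE.finditer(added):
            symbols["tables"].append(match.group(1))
        for match in ADD_COLUMN_RE.finditer(added):
            symbols["columns"].append(f"{match.group(1)}.{match.group(2)}")
        for match in METRIC_RE.finditer(added):
            symbols["metrics"].append(match.group(1))
        for match in ROUTE_RE.finditer(added):
            symbols["endpoints"].append(f"{match.group(1).upper()} {match.group(2)}")
    return dict(symbols)


if __name__ == "__main__":
    sample_diff = "\n".join([
        "+++ b/src/db/postgres.js",
        "+  ALTER TABLE reports ADD COLUMN IF NOT EXISTS bls_series_id TEXT;",
        "+++ b/src/utils/sdkMetrics.js",
        "+  name: 'claude_example_requests_total',  // hypothetical metric",
    ])
    print(extract_symbols(sample_diff))
```

Because the pass is line-based, multi-line statements and renamed or moved code will be missed or double-counted, which is the false-positive risk the plan already attributes to diff mode.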
+ +### Phase A — Scaffold + symbol extraction (foundation, ~3 hr) + +**Deliverables**: +- `.claude/skills/feature-compliance-scaffold/SKILL.md` with frontmatter + workflow + invocation +- `scripts/check.sh` — entry point, arg parsing (`--git-range`, `--feature-spec`, `--feature-type`, `--name`, `--dimensions`), pre-flight checks +- `scripts/extract-feature-symbols.py` — parses git diff OR manifest YAML → emits `symbols.json` with new tables, columns, endpoints, agent_types, tool_names, metrics, alerts, event_types +- `references/feature-spec-schema.md` documenting the optional manifest YAML format + +**Dependencies**: reuses `extract-truth.py` from `schema-doc-validator` for code-side truth — extend it to expose `agent_types`, `tool_names`, `event_types` (currently scoped to schema/metrics/alerts/endpoints). + +**Verification**: invoke against `git diff main..HEAD` on a real branch (e.g., the v6.3.0 SDK upgrade PR #79) — should emit valid `symbols.json` with the changes from that PR. + +### Phase B — Core dimensions D1, D6, D7, D10 (~3 hr) + +These four dimensions cover audit/provenance/schema/hooks — the most common compliance failure modes per Wave 1-3 history. + +**Deliverables**: +- `scripts/dimensions/D1-auditability.py` +- `scripts/dimensions/D6-provenance.py` +- `scripts/dimensions/D7-db-structure.py` (includes dual-path check + backup-restore export-list check) +- `scripts/dimensions/D10-hooks.py` (includes exclusion-list check + full hook catalog + synthetic event_types) + +**Verification**: Run against the three production-bug fixtures. PB-1 (missing migrations) MUST trigger D7 CRITICAL. PB-2 (`access_log` user-correlation) is NOT directly D-checked because `actor_user_id` doesn't exist yet — but the broader pattern (new `access_log` consumer) should trigger D1 WARNING. + +### Phase C — Compliance dimensions D3, D4 (~3 hr) + +These are the highest-value dimensions for EU AI Act / GDPR coverage. 
+ +**Deliverables**: +- `scripts/dimensions/D3-regulator.py` covering Art. 12, 13, 14, 15, 17, 25, 30, 33 (full list per gap audit) +- `scripts/dimensions/D4-permissions.py` (admin/user roles + multi-role hint detection + OAuth path) + +**Verification**: Synthetic test — add a column `email_address VARCHAR(200)` to a fictional new table without `pseudonymize()` upstream. D3 must flag GDPR Art. 25. + +### Phase D — Telemetry dimensions D2, D5, D8, D9 (~2.5 hr) + +**Deliverables**: +- `scripts/dimensions/D2-traceability.py` (includes span attribute cardinality budget) +- `scripts/dimensions/D5-embeddings.py` +- `scripts/dimensions/D8-storage.py` (per-client bucket pattern + retention-class consistency) +- `scripts/dimensions/D9-metrics.py` (alert receiver routing check) + +**Verification**: Synthetic test — declare a new metric in `sdkMetrics.js` without alert. D9 reports WARNING. Add a free-form `span.setAttributes({user_email: ...})`. D2 reports CRITICAL. + +### Phase E — Aggregation, output, and references (~2 hr) + +**Deliverables**: +- `scripts/format-report.py` — JSON aggregation → markdown matrix per output spec +- `references/dimensions-catalog.md` — full text of D1-D10 with examples + remediation +- `references/feature-types.md` — per-feature-type checklist (which D's apply by default) +- `references/truth-sources.md` — extends `schema-doc-validator/references/extraction-paths.md` +- `references/output-spec.md` — report template +- Wire to `chmod +x` all scripts, add the slash command registration + +**Verification**: Full end-to-end run against current `main` branch — should report PASSED (no in-flight features). Then against `chore/v7.0.2-feature-compliance-scaffold` itself — expected output: docs-only changes, no symbol extraction triggers, PASSED. 
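
The aggregation step in `scripts/format-report.py` could be sketched as below, under stated assumptions: findings follow the `(severity, dimension, status, message, remediation)` tuple convention named in this plan, but the concrete status strings and the simplified three-column matrix are choices made for this illustration only.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    # Field order follows the plan's severity tuple convention; the value
    # vocabularies are assumptions of this sketch.
    severity: str
    dimension: str
    status: str  # "PASSED" | "WARNING" | "FAILED" | "N/A"
    message: str
    remediation: str


STATUS_SYMBOL = {"PASSED": "✓", "WARNING": "⚠", "FAILED": "✗", "N/A": "n/a"}


def exit_code(findings):
    """0 = all PASSED or N/A, 1 = at least one WARNING, 2 = at least one FAILED."""
    statuses = {finding.status for finding in findings}
    if "FAILED" in statuses:
        return 2
    if "WARNING" in statuses:
        return 1
    return 0


def render_matrix(findings):
    """Render a simplified dimension matrix as a markdown table."""
    rows = ["| # | Status | Notes |", "|---|--------|-------|"]
    for finding in findings:
        symbol = STATUS_SYMBOL[finding.status]
        rows.append(f"| {finding.dimension} | {symbol} | {finding.message} |")
    return "\n".join(rows)


def coverage_line(findings):
    """Summarise counts in the same shape as the report's Coverage line."""
    def count(status):
        return sum(1 for finding in findings if finding.status == status)

    triggered = len(findings) - count("N/A")
    return (
        f"Dimensions evaluated: {len(findings)} | Triggered: {triggered} | "
        f"Passed: {count('PASSED')} | Warnings: {count('WARNING')} | "
        f"Failed: {count('FAILED')} | N/A: {count('N/A')}"
    )
```

Deriving the coverage line from the same findings list that feeds the matrix keeps the two from drifting apart, and FAILED taking precedence over WARNING matches the exit-code table.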
+ +### Phase F — Test fixture suite + `noqa` curation (~1.5 hr) + +**Deliverables**: +- `test/feature-compliance-scaffold-fixtures/` with three permanent test fixtures: + - `fixture-passing/` — minimal feature manifest representing a clean tabular API addition (BLS-style) + - `fixture-failing-pb1/` — synthetic feature that defines a new table only in `ensure*Schema()` without migration → must report D7 CRITICAL + - `fixture-failing-pii/` — synthetic feature that adds an `email VARCHAR` column without `pseudonymize()` call site → must report D3 CRITICAL on Art. 25 +- Curate `noqa` documentation: when is opt-out legitimate, what justification format + +**Verification**: All three fixtures behave as expected. Document the fixture invocation in SKILL.md. + +## Effort estimate (revised after gap audit) + +| Component | Effort | +|-----------|--------| +| Phase A — Scaffold + symbol extraction | 3 hr | +| Phase B — D1, D6, D7, D10 (audit/provenance/schema/hooks) | 3 hr | +| Phase C — D3, D4 (regulator/permissions, expanded for Art. 13/25/33) | 3 hr | +| Phase D — D2, D5, D8, D9 (telemetry, expanded for cardinality + alert routing) | 2.5 hr | +| Phase E — Aggregation + output + references | 2 hr | +| Phase F — Test fixtures + noqa curation | 1.5 hr | +| **Total** | **~15 hr** | + +Original estimate was 12 hr; gap-audit additions add ~3 hr (mostly D3 expansion to three extra GDPR/AI Act articles, Art. 13, 25, and 33, plus span/alert/exclusion-list sub-checks). 
+ +## Test fixture matrix + +When the skill ships, run it against these inputs and verify the expected output: + +| Fixture | Input | Expected exit | Expected dimensions to trigger | +|---------|-------|---------------|-------------------------------| +| Clean PR | v6.3.0 SDK upgrade PR #79 | 0 (PASSED) | none | +| Current branch | `chore/v7.0.2-feature-compliance-scaffold` (docs-only) | 0 (PASSED) | none | +| PB-1 synthetic | new table in `ensureHookSchema()` without migration file | 2 (FAILED) | D7 CRITICAL | +| PB-2 synthetic | new endpoint reading `access_log` without auth middleware | 2 (FAILED) | D1 + D4 CRITICAL | +| PB-3 synthetic | new code-execution model without `bridge_metadata.git_sha` propagation verified | 2 (FAILED) | D3 (Art. 15) + D6 CRITICAL | +| PII column | new column matching `email\|phone\|ssn` heuristic without upstream `pseudonymize()` call | 2 (FAILED) | D3 (Art. 25) CRITICAL | +| Cardinality | new metric labeled with free-form string | 1 (WARNING) | D2 + D9 WARNING | +| Orphan alert | new `severity: critical` alert without receiver route | 1 (WARNING) | D9 WARNING | +| Exclusion-list miss | new high-volume `event_type` not added to `dbFrontendRouter.js` `NOT IN (...)` | 1 (WARNING) | D10 WARNING | +| Tabular API | BLS-style API addition (no PII, structured response) | 0 (PASSED) | none — D5 verified via downstream `report_embeddings` path | + +## Implementation kickoff checklist + +When starting Phase A, walk this checklist: + +- [ ] Branch from current `main` (post-PR #98 merge): `git checkout -b chore/v7.0.x-build-feature-compliance-scaffold` +- [ ] Read `super-legal-mcp-refactored/src/utils/sdkTracing.js`, `sdkHooks.js`, `sdkMetrics.js` end-to-end (~30 min) +- [ ] Read `.claude/skills/schema-doc-validator/scripts/extract-truth.py` to understand the truth-extraction pattern being extended +- [ ] Confirm `git log` for any new schema/metric additions since plan was written (2026-05-07) that should land in the dimension catalog +- [ ] 
Validate the test fixtures matrix above is still complete after any v7.0.x additions +- [ ] Walk Phase A → F sequentially, verifying each phase before moving on + +When complete, this plan moves to `archive/feature-compliance-scaffold-plan.md` and the SKILL.md becomes the source of truth. + +## What this does NOT cover + +- **Code generation**: skill never writes feature code. Generation belongs in `api-integration` / `code-execution-models`. +- **Cross-deploy regression**: only validates current state vs. expected. Cross-deploy drift is `infrastructure-health`'s job. +- **Runtime probing**: skill is static-analysis only. It does not query the live container — that's `post-deploy-verify` Tier 2/3. +- **Auto-remediation**: defensive, not corrective. Reports what's missing; operator (or PR author) fixes it. +- **CI integration**: skill is operator-invoked. A `.github/workflows/feature-compliance.yml` is a future enhancement. + +## Sequencing with other skills + +Feature build pipeline (Aperture's preferred order): + +1. `/api-integration` or `/code-execution-models` — build the feature. +2. `/schema-doc-validator` — verify operator docs are accurate. +3. **`/feature-compliance-scaffold`** ← new gate, runs before PR merge. +4. `/deploy` — push to staging. +5. `/post-deploy-verify` — confirm deploy carried the feature correctly. +6. `/infrastructure-health` — ongoing monitoring. + +This positions the new skill as the **last gate before merge** — defensive against silent compliance gaps that PR review and unit tests miss. + +## Design decisions (resolved 2026-05-07 by Edwin) + +1. **Manifest NOT essential**. Diff mode is the default — skill auto-detects new symbols from `git diff main..HEAD`. Manifest YAML block is optional and used to suppress false positives or document opt-outs (`noqa` with justification). The skill must work end-to-end on a branch without any manifest. 
Symbol-extraction false positives from rename/refactor are accepted as the cost of frictionless invocation; operator silences via post-hoc `noqa`. + +2. **D8 storage policy is "match the deployment, not enforce WORM"**. If the client's existing artifacts use WORM (per `sessions.retention_class` / `reports.retention_class` — `source_writes` inherits via `session_id` FK, no per-row retention column), new artifacts must also use WORM (consistency). If the deployment uses standard GCS, new artifacts inherit standard. The check reads the dominant retention class on existing rows for the deployment, then flags drift. WORM is not universally required — only consistency is. + +3. **D5 applies to all feature types — tabular APIs are not exempt**. Earlier framing conflated raw API response shape with embedding target. Correct framing: when a feature's data lands in a `reports` row (via memo synthesis or section writer), the existing `report_embeddings` path covers it via `embeddingService.chunkByHeaders()` when `EMBEDDING_PERSISTENCE=true`. D5 verifies the path is wired, not whether the API response itself is "embeddable text". No default opt-out for any feature type. + +4. **Skill is pre-PR-merge only, NOT pre-deploy**. `post-deploy-verify` is the pre-deploy gate. Double-gating creates redundant friction with no compliance benefit (the artifacts being checked don't change between PR-merge and deploy). + +--- + +## Gap audit (2026-05-07) + +Three parallel exploration agents probed the 10 dimensions for gaps across observability, schema/storage, and compliance. Findings split into two buckets: **plan amendments** (extend the dimensions) and **production bugs surfaced** (real defects discovered during the audit, fix separately from the skill). 
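The D8 consistency rule from design decision 2 above (read the dominant retention class on existing rows, then flag drift) can be sketched as follows. This is only an illustration under stated assumptions: the class labels `"worm"` and `"standard"`, the `DRIFT` status string, and both function names are hypothetical, and the plan does not say which severity a drift finding should carry.

```python
# Hypothetical sketch of the D8 "match the deployment" check.
# Labels, status strings, and function names are assumptions for illustration.
from collections import Counter


def dominant_retention_class(existing_classes):
    """Return the most common retention class among existing rows, or None."""
    counts = Counter(c for c in existing_classes if c is not None)
    if not counts:
        return None
    # most_common breaks ties by first-seen order; a real check may want
    # to treat a tie as ambiguous instead of picking a winner.
    return counts.most_common(1)[0][0]


def check_storage_consistency(existing_classes, new_artifact_class):
    """Flag drift when a new artifact does not match the dominant class."""
    dominant = dominant_retention_class(existing_classes)
    if dominant is None:
        return {"status": "PASSED", "reason": "no existing rows to compare against"}
    if new_artifact_class == dominant:
        return {"status": "PASSED", "reason": f"matches dominant class '{dominant}'"}
    return {
        "status": "DRIFT",
        "reason": f"new artifact uses '{new_artifact_class}', deployment uses '{dominant}'",
    }


if __name__ == "__main__":
    # In a real run the list would come from retention_class values
    # on existing sessions / reports rows for the deployment.
    print(check_storage_consistency(["worm", "worm", "standard"], "standard"))
```

The check deliberately never hardcodes WORM as the required value; it only compares against what the deployment already does, which is the point of the decision.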
+ +### Plan amendments (extend D1-D10 before implementation) + +**D2 (Traceability) — span attribute cardinality budget** +- New spans must use bounded enum labels (e.g., output of `classifyToolName`, `classifyAgent`) — never free-form strings (user emails, query text, client IDs as labels). OTel attributes accept any string; without enforcement, a feature can silently explode trace cardinality. Add sub-check: grep new `span.setAttributes()` calls for unbounded interpolation. + +**D3 (Regulator) — expand to Art. 13, 25, 33 + erasure audit trail** +- **EU AI Act Art. 13 (transparency export)**: new `agent_type` must be queryable via `/api/admin/audit-export` (`dbFrontendRouter.js`) so regulators can enumerate the agent population. +- **GDPR Art. 25 (privacy by design)**: new columns matching PII heuristics (`email`, `phone`, `name`, `ip_address`, `ssn`, `dob`) must flow through `pseudonymize()` from `piiManager.js` before INSERT — verify call site, not just schema. +- **GDPR Art. 33 (breach detection)**: new feature that reads `access_log` or `pii_mappings` must have a corresponding alert rule covering anomalous read patterns. +- **GDPR Art. 17 cascade audit**: when `erasePII()` or tombstone operations fire, a `human_interventions` row with `intervention_type='gdpr_erasure'` must be written. Cascade DELETE without intervention row = silent erasure (regulator can't reconstruct who triggered it). + +**D4 (Permissions) — multi-role + OAuth/SSO** +- Currently the codebase has only `admin` and `user` roles. If a feature needs intermediate roles (operator, auditor, read-only), the role enum must be extended in `authMiddleware.js` AND seeded via `client-provisioner` AND covered by a new middleware analogous to `requireAdmin`. Default check: flag any new endpoint description containing "operator-only" / "auditor-only" / "read-only" without corresponding middleware. 
+- If a new endpoint description says "SSO-only" or "OAuth-only", verify integration through `OAuthTokenManager.js` exists. + +**D7 (DB structure) — backup-restore + multi-tenant FK** +- New tables must be added to `client-backup-restore` skill's export list. Otherwise client-offboarding loses the data. +- Tenant-scoped tables (anything per-client) should have `client_id` FK, not just `session_id`. Today `source_writes` ties only via `session_id`; new sensitive tables should be explicit. + +**D8 (Storage) — per-client bucket pattern** +- `gcsTieringDaemon.js` currently hardcodes `super-legal-worm-us-east1`. Per-client provisioning (per `client-provisioner` skill) creates `super-legal-worm-{client-id}-us-east1`. D8 must validate that new artifacts use the per-client bucket, not the shared one (multi-tenant isolation). + +**D9 (Metrics) — alert receiver routing** +- D9 currently checks alert exists in `alerts.yml` / `alertingRules.js`. Must additionally verify a receiver route exists in alertmanager config — alerts with `severity: critical` that route to `/dev/null` fire silently. + +**D10 (Hooks) — exclusion lists, full hook catalog, synthetic events** +- New high-volume `event_type` must be added to the `event_type NOT IN (...)` exclusion clauses in `dbFrontendRouter.js` analytics queries (~3 sites, per v4.13.0 `AgentProgress` lesson). Without this, new events bloat per-tool aggregates. +- Hook catalog in D10 reference must enumerate the full surface beyond SubagentStart/Stop/PreToolUse/PostToolUse: `SessionStart`, `SessionEnd`, `PreCompact`, `Notification`, `PermissionRequest`. Features that mutate session state must integrate session-level hooks. +- Synthetic event types (e.g., `PromptInjectionDetected`) injected during `PostToolUse` handler — not native SDK hooks — must be registered in a `SYNTHETIC_EVENT_TYPES` enum to prevent cardinality explosion. + +### Production bugs surfaced by the audit + +The audit incidentally found three real defects in v7.0.x. 
These are **separate from the skill plan** — file follow-up issues and fix independently. + +**PB-1: Missing migrations for `citation_source_links` and `code_execution_inputs`** +- `src/db/postgres.js:392-429` defines both tables in `ensureHookSchema()` (`CREATE TABLE IF NOT EXISTS`), but `migrations/` has 001-010 with no entry covering either. This is a dual-path violation per `feedback_dual_schema_paths.md`. +- **Impact**: `client-backup-restore` and any operator running migrations against an empty replica will not get these tables. Reconciliation behavior on a fresh replica is undefined. +- **Fix**: add `011_citation-source-links.up.sql` + `.down.sql` and `012_code-execution-inputs.up.sql` + `.down.sql`. + + +**PB-2: `access_log` has no `actor_user_id` — only `requester VARCHAR(200) DEFAULT 'anonymous'`** +- `src/db/postgres.js:256-269` defines `access_log` with a free-form `requester` string. +- **Impact**: GDPR Art. 30 records-of-processing requires correlating reads to verified user identity. Today's audit can only show that *someone* (string-matched) read a session — not which authenticated user. Compliance gap. +- **Fix**: add `ALTER TABLE access_log ADD COLUMN IF NOT EXISTS actor_user_id INTEGER REFERENCES users(id)` in `ensureAccessLogSchema()` and a matching migration. Backfill `actor_user_id` from `requester` where parseable (email lookup against `users.email`). + + +**PB-3: `code_executions.timeout` BOOLEAN column missing despite documented audit semantic** +- `src/db/postgres.js:374-382` (comments) document the audit disambiguation: `stop_reason='refusal'` vs. `timeout=true, stop_reason=null`. The `stop_reason` ALTER landed; the `timeout` ALTER did not. +- **Impact**: regulator audit query "which executions timed out?" requires fallback heuristics (`execution_time_ms > 120_000`) instead of a direct lookup. EU AI Act Art. 15 replay envelope is degraded. 
+- **Fix**: `ALTER TABLE code_executions ADD COLUMN IF NOT EXISTS timeout BOOLEAN DEFAULT FALSE` + matching migration. Backfill from `execution_time_ms >= 120000 AND stop_reason IS NULL`. + +--- + +## Status + +**As of 2026-05-07**: design plan complete, implementation NOT started. Sister skills `schema-doc-validator` and `post-deploy-verify` are merged on `main` (PR #97, commit `57cf70b9`). Interim Phase 7 manual checklists are merged into `api-integration` and `code-execution-models` (PR #98). When implementation begins, work from the "Implementation phases" section above; this plan moves to `archive/` and the SKILL.md becomes source of truth.