71 changes: 71 additions & 0 deletions .claude/skills/api-integration/SKILL.md
@@ -551,6 +551,77 @@ No code changes needed — the observability pipeline is fully generic. But veri

---

## Phase 7: Compliance Final Check (DEFENSIVE GATE)

After all integration work is done, run a final cross-cutting compliance check before opening the PR. New API integrations touch 8+ subsystems; each subsystem has audit/observability/regulator obligations that PR review can miss. This phase enforces them.

**Once `feature-compliance-scaffold` ships** (plan: `docs/pending-updates/feature-compliance-scaffold-plan.md`), run:

```bash
/feature-compliance-scaffold --feature-type api --name {client-slot}
```

Until then, walk this checklist manually. Each item maps to a dimension the scaffold skill will eventually automate. Skip a check only with a written justification — never silently.

### 7.1 Auditability (D1) — does every read/write path log?

- [ ] New tool invocations land in `hook_audit_log` via `PreToolUse` / `PostToolUse` hooks (default-on for any tool registered through `toolImplementations.js`)
- [ ] Hybrid client emits `_hybrid_metadata.source` on every result so observability can attribute native vs. websearch fallback
- [ ] If the client wraps a paid/audited data source (FMP, EDGAR, etc.), confirm that response volume in the audit log is reasonable; accidental request loops surface at this layer
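
The `_hybrid_metadata.source` check above can be spot-checked against a batch of sample results. This is a minimal sketch, not the project's own tooling: `findUnattributedResults` is a hypothetical helper, and the two allowed values (`native`, `websearch`) are an assumption drawn from the "native vs. websearch fallback" wording.

```javascript
// Hypothetical helper: not part of the codebase described above.
// Assumes `source` is either "native" or "websearch".
const ALLOWED_SOURCES = new Set(["native", "websearch"]);

function findUnattributedResults(results) {
  // Return the indexes of results that observability could not attribute.
  const missing = [];
  results.forEach((result, index) => {
    const source = result && result._hybrid_metadata && result._hybrid_metadata.source;
    if (!ALLOWED_SOURCES.has(source)) missing.push(index);
  });
  return missing;
}

const sampleResults = [
  { data: [1, 2], _hybrid_metadata: { source: "native" } },
  { data: [], _hybrid_metadata: { source: "websearch", fallback_reason: "timeout" } },
  { data: [3] }, // no metadata: should be flagged
];
console.log(findUnattributedResults(sampleResults)); // [ 2 ]
```

An empty array means every sampled result can be attributed to one of the two paths.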

### 7.2 Traceability / OTel (D2) — are tool spans wrapped?

- [ ] Tool registration goes through the standard path that auto-wraps with `withToolSpan` (verified by checking `toolImplementations.js` registers via the canonical helper, not a bypass)
- [ ] No long-running orchestration added inside the client itself; if the hybrid client adds multi-turn behavior, wrap with `withSpan(name, attributes, fn)` from `src/utils/sdkTracing.js`
- [ ] `_hybrid_metadata.fallback_reason` populated when websearch is used (so OTel attributes carry the fallback signal)
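
To illustrate the last two items, here is a sketch of a multi-turn lookup wrapped in a span that also carries the fallback reason. The `withSpan` below is a local stand-in with the `(name, attributes, fn)` shape quoted above; whether the real helper in `src/utils/sdkTracing.js` passes a span object to `fn` is an assumption, and `lookupWithFallback` and the span name are hypothetical.

```javascript
// Stand-in for withSpan(name, attributes, fn). The real helper's behavior is
// not shown in this document; this version only records the span in memory.
const recordedSpans = [];

async function withSpan(name, attributes, fn) {
  const span = { name, attributes: { ...attributes }, status: "ok" };
  try {
    return await fn(span);
  } catch (err) {
    span.status = "error";
    throw err;
  } finally {
    recordedSpans.push(span);
  }
}

// Hypothetical multi-turn lookup: try the native API, fall back to websearch.
async function lookupWithFallback(fetchNative, fetchWebsearch) {
  return withSpan("hybrid_client.lookup", { client: "example-slot" }, async (span) => {
    try {
      const data = await fetchNative();
      return { data, _hybrid_metadata: { source: "native" } };
    } catch (err) {
      const fallback_reason = err.message;
      // Carry the fallback signal on the span as well as on the result.
      span.attributes.fallback_reason = fallback_reason;
      const data = await fetchWebsearch();
      return { data, _hybrid_metadata: { source: "websearch", fallback_reason } };
    }
  });
}

lookupWithFallback(
  async () => { throw new Error("quota_exhausted"); },
  async () => ["result from websearch"]
).then((result) => console.log(result._hybrid_metadata));
```

Setting the attribute inside the `catch` keeps the span and the result metadata in agreement, since both are written from the same error.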

### 7.3 Regulator tracking (D3) — EU AI Act Art. 12 / 14 / 15, GDPR Art. 17 / 30

- [ ] Tool invocation persisted to `hook_audit_log` (Art. 12 6-month retention covers it automatically)
- [ ] No PII in API responses; if PII is unavoidable (e.g., person-name search APIs), responses MUST flow through `pseudonymize()` from `src/utils/piiManager.js` before they reach `transcript_events` or `reports`
- [ ] If new API content lands in `reports`, `bridge_metadata.git_sha` must be carried through (Art. 15 replay envelope) — verify by checking a test memo's `bridge_metadata` JSONB column post-deploy
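
The PII item above amounts to scrubbing name-bearing fields before anything is persisted. The sketch below shows the shape of that step under stated assumptions: this `pseudonymize` is a stand-in built on a keyed hash (the real function's signature and algorithm in `src/utils/piiManager.js` are not shown here), and `PII_FIELDS` and `scrubResponse` are hypothetical.

```javascript
const crypto = require("crypto");

// Stand-in for pseudonymize(); assumed here to map a value to a stable token.
function pseudonymize(value, secret) {
  const digest = crypto.createHmac("sha256", secret).update(value).digest("hex");
  return `person_${digest.slice(0, 12)}`;
}

// Hypothetical list of response fields that hold personal names.
const PII_FIELDS = ["name", "attorney"];

function scrubResponse(record, secret) {
  // Copy first so the caller's object is never mutated.
  const clean = { ...record };
  for (const field of PII_FIELDS) {
    if (typeof clean[field] === "string") {
      clean[field] = pseudonymize(clean[field], secret);
    }
  }
  return clean;
}

const scrubbed = scrubResponse({ name: "Jane Roe", docket: "22-101" }, "demo-secret");
console.log(scrubbed.docket, scrubbed.name.startsWith("person_"));
```

A keyed hash gives the same token for the same name within one deployment, so joins across rows still work without storing the name itself.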

### 7.4 Permissions / RBAC (D4) — endpoint gating

- [ ] Tool invocation routes through standard subagent dispatch (which inherits session-level auth) — no new HTTP endpoint exposed by this client
- [ ] If you DID add a new HTTP endpoint (rare for API clients), it must use the `requireAdmin` middleware AND be listed in `infrastructure-health/references/admin-endpoints.md`

### 7.5 Embeddings (D5) — n/a for most APIs

- [ ] Tabular/structured APIs (BLS, FRED, ECB) opt out — embeddings don't apply to numeric series
- [ ] Text-producing APIs (CourtListener opinions, Federal Register notices, SEC filings text) → confirm any report-style synthesis lands in `report_embeddings` via the standard `embeddingService.chunkByHeaders()` path (default-on when `EMBEDDING_PERSISTENCE=true`)

### 7.6 Provenance (D6) — `reports.agent_type` + `source_writes`

- [ ] When this client's data drives a memo section, the producing subagent's `agent_type` is set on the resulting `reports` row (via the standard hook bridge — should be automatic if subagent name is in `legalSubagents/agents/`)
- [ ] If client INGESTS raw documents (not just metadata), confirm `source_writes` row created with `source_uri`, `content_hash`, `retention_class` — most tabular APIs skip this; document-fetching APIs must satisfy it
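
For document-fetching clients, the `source_writes` item above can be pictured as follows. This is a sketch only: the three column names come from the checklist, while the SHA-256 hex format for `content_hash`, the `"standard"` retention value, and the `buildSourceWriteRow` helper are assumptions.

```javascript
const crypto = require("crypto");

// Hypothetical helper: builds the fields the checklist expects on a source_writes row.
function buildSourceWriteRow(sourceUri, content, retentionClass) {
  // Hash the raw bytes so the same document always yields the same content_hash.
  const content_hash = crypto.createHash("sha256").update(content).digest("hex");
  return { source_uri: sourceUri, content_hash, retention_class: retentionClass };
}

const exampleRow = buildSourceWriteRow(
  "https://example.com/opinions/123.pdf", // placeholder URI
  Buffer.from("raw document bytes"),
  "standard" // assumed value; use whatever classes the schema defines
);
console.log(Object.keys(exampleRow)); // [ 'source_uri', 'content_hash', 'retention_class' ]
```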

### 7.7 Database structure (D7) — almost always n/a

- [ ] No new tables or columns expected for an API client integration. If you added any (e.g., a per-client cache table): BOTH a `migrations/NNNN_*.sql` file AND matching `ensure*Schema()` DDL with `CREATE TABLE IF NOT EXISTS` / `ALTER TABLE ADD COLUMN IF NOT EXISTS` are required (see `feedback_dual_schema_paths.md`, `feedback_column_evolution_ddl.md`)

### 7.8 Storage (D8) — n/a unless ingesting raw artifacts

- [ ] No raw-document ingestion → standard tier, no WORM
- [ ] Raw-document ingestion → artifact lands in client's WORM bucket; `retention_class` set on the `source_writes` row

### 7.9 Observability metrics (D9)

- [ ] Existing metric `claude_api_client_results_total{client="{slot}"}` will fire automatically — confirm post-deploy via `/metrics` scrape that the new label value appears with non-zero count
- [ ] If client has unusual error modes (auth failure, quota exhaustion), check whether existing alert rules cover them; otherwise add to `prometheus/alerts.yml`
- [ ] Cardinality: keep label values bounded. `client="{slot}"` adds exactly one value per client; never embed dynamic IDs (request IDs, session IDs) in metric labels
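
The post-deploy `/metrics` check in the first item can be scripted against the scraped text. A minimal sketch, assuming the endpoint serves the Prometheus text exposition format; `sumCounterForClient` is a hypothetical helper and its label parsing is deliberately naive (it would mis-split label values that contain commas).

```javascript
// Hypothetical helper: sums every series of `metricName` carrying client="<client>".
// Returns null when no such series exists in the scrape.
function sumCounterForClient(metricsText, metricName, client) {
  let total = 0;
  let seen = false;
  for (const line of metricsText.split("\n")) {
    // Skip comments (# HELP / # TYPE) and other metrics.
    if (line.startsWith("#") || !line.startsWith(metricName + "{")) continue;
    const close = line.indexOf("}");
    if (close === -1) continue;
    const labels = line.slice(metricName.length + 1, close).split(",");
    if (!labels.includes(`client="${client}"`)) continue;
    const value = Number(line.slice(close + 1).trim().split(/\s+/)[0]);
    if (!Number.isNaN(value)) {
      total += value;
      seen = true;
    }
  }
  return seen ? total : null;
}

const scrape = [
  "# TYPE claude_api_client_results_total counter",
  'claude_api_client_results_total{client="fred"} 7',
  'claude_api_client_results_total{client="bls"} 4',
].join("\n");
console.log(sumCounterForClient(scrape, "claude_api_client_results_total", "fred")); // 7
console.log(sumCounterForClient(scrape, "claude_api_client_results_total", "ecb")); // null
```

A `null` result means the label value never appeared; `0` means it appeared but has not counted anything yet. The checklist treats both as failures.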

### 7.10 Hooks (D10)

- [ ] Tool registration emits standard PreToolUse / PostToolUse — no new `event_type` introduced (API clients should never need one; if you find yourself adding one, reconsider)
- [ ] Persistence is fire-and-forget through `hookDBBridge.js` — never blocks the request
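
The fire-and-forget requirement above is easy to break by adding a stray `await`. The following sketch shows the intended pattern in isolation; it does not reproduce the `hookDBBridge.js` API, and `persistFireAndForget`, `handleToolCall`, and `droppedEvents` are hypothetical names.

```javascript
// Failed writes are recorded here instead of surfacing to the caller.
const droppedEvents = [];

function persistFireAndForget(writeRow, event) {
  // Deliberately not awaited: the caller returns before the write settles.
  // Promise.resolve().then(...) also captures synchronous throws from writeRow.
  Promise.resolve()
    .then(() => writeRow(event))
    .catch((err) => droppedEvents.push({ event, error: err.message }));
}

async function handleToolCall(writeRow, input) {
  persistFireAndForget(writeRow, { event_type: "PostToolUse", input });
  return `handled:${input}`; // never waits on persistence
}

// A slow writer does not delay the response.
const slowWrite = () => new Promise((resolve) => setTimeout(resolve, 50));
handleToolCall(slowWrite, "query").then((out) => console.log(out)); // handled:query
```

The `.catch` matters as much as the missing `await`: without it, a failed write becomes an unhandled rejection, which can terminate a Node.js process.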

### When this check fails

**Do not open the PR.** Fix the gap, re-run the checklist, then proceed. The cost of catching a compliance gap pre-merge is one extra fix; post-merge it can be a v6.2.3-style production hotfix.

---

## Reference: Current Integration Points (April 2026)

| File | What to add | Location |
68 changes: 68 additions & 0 deletions .claude/skills/code-execution-models/SKILL.md
@@ -313,6 +313,74 @@ Inputs: {key fields}. Outputs: {key metrics + chart types}.

---

## Phase 7: Compliance Final Check (DEFENSIVE GATE)

Code-execution models touch the regulator-replay envelope (`bridge_metadata.git_sha`), the `code_executions` audit table, the `claude_code_execution_*` metrics, and chart persistence under `{sessionDir}/charts/`. A new model that's silently missing one of these breaks Aperture's EU AI Act Art. 15 audit trail or the V2 verification probe in `post-deploy-verify`.

**Once `feature-compliance-scaffold` ships** (plan: `docs/pending-updates/feature-compliance-scaffold-plan.md`), run:

```bash
/feature-compliance-scaffold --feature-type model --name M{N}
```

Until then, walk this checklist manually before opening the PR:

### 7.1 Auditability (D1) — `code_executions` row written?

- [ ] Standard execution path through `runPythonAnalysis()` in `codeExecutionBridge.js` writes a `code_executions` row with `model_id`, `success`, `duration_ms`, truncated `stderr`, and `bridge_metadata` JSONB
- [ ] Tool invocation also lands in `hook_audit_log` via `PreToolUse` / `PostToolUse` for `run_python_analysis` (default-on)
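
As a reading aid for the row described above, this sketch builds an object with the five listed columns and reports any that are absent. It is illustrative only: the 2000-character `stderr` limit and both helper names are assumptions, and the real bridge may build the row differently.

```javascript
const MAX_STDERR_CHARS = 2000; // hypothetical limit; the real truncation length is not stated

function buildCodeExecutionRow({ modelId, success, durationMs, stderr, bridgeMetadata }) {
  return {
    model_id: modelId,
    success,
    duration_ms: durationMs,
    stderr: (stderr || "").slice(0, MAX_STDERR_CHARS), // keep the row size bounded
    bridge_metadata: bridgeMetadata,
  };
}

// Lists the checklist columns that a row leaves undefined or null.
function missingColumns(row) {
  const required = ["model_id", "success", "duration_ms", "stderr", "bridge_metadata"];
  return required.filter((column) => row[column] === undefined || row[column] === null);
}

const row = buildCodeExecutionRow({
  modelId: "M7", // placeholder model id
  success: false,
  durationMs: 1840,
  stderr: "Traceback (most recent call last): ...",
  bridgeMetadata: { git_sha: "abc1234", model_id: "M7" },
});
console.log(missingColumns(row)); // []
```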

### 7.2 Traceability / OTel (D2)

- [ ] `withSpan` wraps the model invocation (already done at the `runPythonAnalysis` level — verify your new model doesn't bypass the helper)
- [ ] Span attributes carry `model_id` so the existing `claude_tool_duration_ms` histogram lands per-model when filtered to `tool_name="run_python_analysis"`

### 7.3 Regulator tracking (D3) — Art. 15 replay envelope

- [ ] `bridge_metadata.git_sha` is set by the bridge (NOT by your model definition) — confirm production rows are non-`'unknown'` after deploy via `post-deploy-verify` Tier 2
- [ ] `bridge_metadata.model_id` matches the catalog entry's `id`
- [ ] No PII in input examples (`codeExecutionBridge.js` ~line 566) — examples are committed to git and become public via the `/api/catalog` endpoint
- [ ] If model produces text artifacts that may be examined by a regulator (rare for quant models), confirm they flow into a retained table — usually code-exec output is ephemeral chart + JSON, no retention concern
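
The first two items reduce to two comparisons on a fetched `code_executions` row. A minimal sketch, assuming the row and catalog entry have already been loaded as plain objects; `checkReplayEnvelope` is a hypothetical helper, not part of `post-deploy-verify`.

```javascript
// Returns a list of human-readable problems; an empty list means the envelope looks sound.
function checkReplayEnvelope(row, catalogEntry) {
  const problems = [];
  const meta = row.bridge_metadata || {};
  if (!meta.git_sha || meta.git_sha === "unknown") {
    problems.push("git_sha is missing or 'unknown'");
  }
  if (meta.model_id !== catalogEntry.id) {
    problems.push(`model_id ${meta.model_id} does not match catalog id ${catalogEntry.id}`);
  }
  return problems;
}

// Placeholder values for illustration.
const badRow = { bridge_metadata: { git_sha: "unknown", model_id: "M7" } };
console.log(checkReplayEnvelope(badRow, { id: "M8" }));
// [ "git_sha is missing or 'unknown'", 'model_id M7 does not match catalog id M8' ]
```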

### 7.4 Permissions / RBAC (D4) — n/a

- [ ] No new endpoint introduced. Model is dispatched through standard subagent path (financial-analyst / data-analyst), inheriting session-level auth

### 7.5 Embeddings (D5) — n/a

- [ ] Model output is structured JSON + PNG charts. Embeddings don't apply. If your model produces a `## Findings` markdown block that lands in `reports`, that flows through the existing `report_embeddings` path — no model-specific work

### 7.6 Provenance (D6)

- [ ] Chart PNG saved to `{sessionDir}/charts/` so chart paths recorded in `code_executions` resolve to real files on disk (not orphaned base64)
- [ ] `code_executions.session_id` populated so chart → session → memo provenance chain is traceable
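
The "resolve to real files on disk" item can be verified with a short filesystem check. This sketch assumes the recorded chart paths end in the PNG file name and that files sit directly under `{sessionDir}/charts/`; `findOrphanedCharts` is a hypothetical helper.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Returns the recorded chart paths that have no file under <sessionDir>/charts/.
function findOrphanedCharts(sessionDir, chartPaths) {
  const chartsDir = path.join(sessionDir, "charts");
  return chartPaths.filter((chartPath) => {
    const resolved = path.join(chartsDir, path.basename(chartPath));
    return !fs.existsSync(resolved);
  });
}

// Demo against a throwaway session directory.
const demoSession = fs.mkdtempSync(path.join(os.tmpdir(), "session-"));
fs.mkdirSync(path.join(demoSession, "charts"));
fs.writeFileSync(path.join(demoSession, "charts", "npv.png"), "placeholder");
console.log(findOrphanedCharts(demoSession, ["charts/npv.png", "charts/irr.png"]));
// [ 'charts/irr.png' ]
```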

### 7.7 Database structure (D7) — n/a unless schema change

- [ ] Adding a new model to the catalog is a config-only change. No `migrations/`, no `ensure*Schema()` work needed
- [ ] If your model requires a new persistent state table (e.g., calibration parameters): both `migrations/NNNN_*.sql` AND `ensure*Schema()` DDL are required (see `feedback_dual_schema_paths.md`)

### 7.8 Storage (D8) — n/a

- [ ] Charts persist locally under `{sessionDir}/charts/`. They're not regulator-replay artifacts in the WORM sense (the deterministic input → output mapping IS the audit trail, captured via `bridge_metadata`). No GCS WORM tier required for chart PNGs

### 7.9 Observability metrics (D9)

- [ ] `claude_tool_invocations_v2_total{tool_name="run_python_analysis"}` will fire on dispatch. Confirm post-deploy via `/metrics` scrape that the new model produces invocations (the counter is shared across all models; per-model breakdown lives in the `code_executions` table, not in metric labels)
- [ ] `claude_tool_duration_ms` histogram covers latency — no per-model bucket; if M{N} has unusual runtime, document expected p95 in the catalog entry
- [ ] No new metric needed unless your model has a unique failure mode (e.g., convergence failure for iterative solvers — would warrant a new counter via `sdkMetrics.js`)

### 7.10 Hooks (D10) — n/a

- [ ] No new `event_type` introduced. Model dispatch fires standard `PreToolUse` / `PostToolUse` for `run_python_analysis`
- [ ] If you find yourself adding a new `event_type` (e.g., `code_execution_iterative_step`), the new value must be added to the analytics-exclusion lists in `dbFrontendRouter.js` (the same pattern as `AgentProgress` exclusion in v4.13.0) — otherwise it pollutes per-tool aggregates
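
To see why the exclusion list matters, consider a per-tool aggregate that counts every event row. The sketch below is hypothetical throughout: the real lists and queries live in `dbFrontendRouter.js` and are not shown here, and the event shape (`event_type`, `tool_name`) is assumed.

```javascript
// Hypothetical exclusion list, including the example event_type from the item above.
const ANALYTICS_EXCLUDED_EVENT_TYPES = new Set(["AgentProgress", "code_execution_iterative_step"]);

function countPerTool(events) {
  const counts = {};
  for (const event of events) {
    // Progress-style events are not tool invocations; skip them.
    if (ANALYTICS_EXCLUDED_EVENT_TYPES.has(event.event_type)) continue;
    counts[event.tool_name] = (counts[event.tool_name] || 0) + 1;
  }
  return counts;
}

const events = [
  { event_type: "PostToolUse", tool_name: "run_python_analysis" },
  { event_type: "code_execution_iterative_step", tool_name: "run_python_analysis" },
  { event_type: "code_execution_iterative_step", tool_name: "run_python_analysis" },
];
console.log(countPerTool(events)); // { run_python_analysis: 1 }
```

Without the exclusion, the same three rows would report three invocations for a single model run.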

### When this check fails

**Do not open the PR.** Add the missing wiring or document a `noqa` opt-out with explicit justification (e.g., "M{N} is purely deterministic — no provenance variance to track"). The post-merge cost of a missed compliance gap is much higher than the pre-merge fix.

---

## Reference: Architecture

| Component | File | Role |