30 changes: 30 additions & 0 deletions .claude/skills/api-integration/SKILL.md
@@ -333,9 +333,16 @@ Append tool reference block after the FRED block (before `## ENRICHMENT PROTOCOL
- tool_name_1, tool_name_2, tool_name_3
```

**Also**: append a per-domain block to the `MCP_FALLBACK_INSTRUCTIONS` template literal at `_promptConstants.js:721`. This teaches subagents which tool prefix maps to the new domain when MCP routing falls back. Format:

```
**{domain}** — `mcp__{domain}__*` (e.g., `mcp__{domain}__search_items`)
```

- [ ] Tool count matches actual number of schemas
- [ ] Domain name matches DOMAIN_GROUPS key
- [ ] Tool names match schema names exactly
- [ ] `MCP_FALLBACK_INSTRUCTIONS` block present at `_promptConstants.js:721`
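The first three checklist items can be checked mechanically once the names are in hand. This is a minimal sketch, assuming you have already pulled the listed tool names, the schema names, and the `DOMAIN_GROUPS` keys into plain Python lists. `check_tool_reference` and `mcp_tool_id` are hypothetical helpers, not part of the skill.

```python
from typing import Iterable


def mcp_tool_id(domain: str, tool: str) -> str:
    """Build a fallback tool id following the mcp__{domain}__{tool} prefix convention."""
    return f"mcp__{domain}__{tool}"


def check_tool_reference(
    domain: str,
    listed_tools: list[str],
    schema_names: Iterable[str],
    domain_group_keys: Iterable[str],
) -> list[str]:
    """Return human-readable problems; an empty list means the block is consistent."""
    problems: list[str] = []
    schemas = list(schema_names)
    if len(listed_tools) != len(schemas):
        problems.append(f"tool count {len(listed_tools)} != schema count {len(schemas)}")
    if domain not in set(domain_group_keys):
        problems.append(f"domain '{domain}' is not a DOMAIN_GROUPS key")
    # Exact, case-sensitive comparison of names in both directions.
    for name in sorted(set(listed_tools) - set(schemas)):
        problems.append(f"listed tool '{name}' has no matching schema")
    for name in sorted(set(schemas) - set(listed_tools)):
        problems.append(f"schema '{name}' is missing from the tool list")
    return problems
```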

### 3.6 Client Registry — `src/server/clientRegistry.js`

@@ -368,6 +375,29 @@ Three updates:
- [ ] All `expectedCount` sites updated
- [ ] All subagent assertion arrays updated

### 3.8 Feature Flag Registration (when API is gated)

When the new API requires runtime gating (following the `FMP_ENABLED` / `EMBEDDING_PERSISTENCE` pattern), register the flag in two places:

1. **`super-legal-mcp-refactored/flags.env`** — append the flag near related flags (e.g., near `FMP_ENABLED`):
   ```bash
   {SERVICE}_ENABLED=false
   ```
2. **`super-legal-mcp-refactored/src/config/featureFlags.js`** — export from the canonical block. Pattern (matching FMP):
   ```javascript
   {SERVICE}_ENABLED: process.env.{SERVICE}_ENABLED === 'true',
   ```

When gated:
- `domainMcpServers.js` `DOMAIN_GROUPS` — wrap the domain key in a conditional spread: `...(featureFlags.{SERVICE}_ENABLED ? { '{domain}': {Name}Tools } : {})`
- `toolDefinitions.js` `allTools` — same conditional spread pattern (`...(featureFlags.{SERVICE}_ENABLED ? {Name}Tools : [])`)
- `clientRegistry.js` slot — conditional construction: `...(featureFlags.{SERVICE}_ENABLED ? { {slotName}: new {Name}HybridClient(...) } : {})`

- [ ] Flag declared in `flags.env`
- [ ] Flag exported in `featureFlags.js`
- [ ] All conditional spreads wired in `domainMcpServers.js`, `toolDefinitions.js`, `clientRegistry.js`
- [ ] Default value is `false` (opt-in) unless the flag is universal
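Because the JavaScript check is a strict `=== 'true'` comparison, only the exact lowercase string `true` enables a service; `TRUE`, `1`, or an unset variable all leave it off. The sketch below mirrors that rule, a simple `flags.env` parser, and the conditional-spread wiring in Python so the behavior can be checked in isolation. It is illustrative only: `parse_flags_env`, `flag_enabled`, `build_domain_groups`, and the `newsvc` domain are hypothetical, and the parser assumes `flags.env` has no quoting or `export` prefixes.

```python
from collections.abc import Mapping


def parse_flags_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def flag_enabled(env: Mapping[str, str], name: str) -> bool:
    """Mirror `process.env.NAME === 'true'`: only the exact string 'true' enables."""
    return env.get(name) == "true"


def build_domain_groups(
    env: Mapping[str, str],
    base: dict[str, list[str]],
    gated: dict[str, tuple[str, list[str]]],
) -> dict[str, list[str]]:
    """Mirror `...(flag ? { domain: tools } : {})` for each gated domain.

    `gated` maps a flag name to a (domain, tools) pair; `base` is not mutated.
    """
    groups = dict(base)
    for flag, (domain, tools) in gated.items():
        groups.update({domain: tools} if flag_enabled(env, flag) else {})
    return groups
```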

---

## Phase 4: Testing (MUST PASS before merge)
4 changes: 3 additions & 1 deletion .claude/skills/client-offboarding/SKILL.md
@@ -94,13 +94,15 @@ All exported via `psql COPY TO STDOUT` + gzip to `gs://super-legal-worm-{client_

### Phase 4: Final Report

**Step 15**: Generate offboarding report (markdown emitted to stdout + saved to `~/.aperture/offboarding-{client_id}-{date}.md`)
- Client ID, offboarding date, operator
- Resources deleted (with timestamps)
- Archives created (with GCS paths + checksums)
- Remaining resources (WORM bucket — retained for compliance)
- Final cost estimate (last 30 days of billing for this client's resources)

**Future enhancement (deferred)**: PDF rendering via pandoc (already used elsewhere in the repo for `aperture-demo-preview.pdf`). The markdown output is the canonical artifact today; PDF output needs only one extra command, which the operator can run manually after offboarding: `pandoc ~/.aperture/offboarding-{client_id}-{date}.md -o ~/.aperture/offboarding-{client_id}-{date}.pdf --pdf-engine=xelatex`.
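The report path and the optional pandoc step can be sketched as below. This assumes ISO `YYYY-MM-DD` for `{date}` (the skill does not pin the format) and only builds the pandoc argument list without running it; `report_path` and `pandoc_argv` are hypothetical helper names.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def report_path(client_id: str, on: date, home: Optional[Path] = None) -> Path:
    """Canonical markdown report: ~/.aperture/offboarding-{client_id}-{date}.md."""
    base = home if home is not None else Path.home()
    return base / ".aperture" / f"offboarding-{client_id}-{on.isoformat()}.md"


def pandoc_argv(md_path: Path) -> list[str]:
    """Argument list for the manual PDF rendering step (same stem, .pdf suffix)."""
    return [
        "pandoc", str(md_path),
        "-o", str(md_path.with_suffix(".pdf")),
        "--pdf-engine=xelatex",
    ]
```

An operator could pass the list to `subprocess.run(pandoc_argv(path), check=True)`; building an argv list rather than a shell string avoids quoting problems if a client ID ever contains spaces.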

## Resource Naming Convention (matches provisioner)

| Resource | Pattern | Action |
@@ -107,6 +107,61 @@ def main():
"remediation": "",
})

    # D10.4 — new SSE event_types that drive transcript_events / frontend
    # rendering must appear in (a) transcriptDBBridge.js allowlist (or
    # equivalent persistence path) AND (b) frontend handleStreamEvent switch.
    import os as _os
    repo_root_p = _os.environ.get("REPO_ROOT") or ""
    if not repo_root_p:
        cur_p = _os.path.abspath(_os.getcwd())
        for _ in range(10):
            if _os.path.isdir(_os.path.join(cur_p, "super-legal-mcp-refactored")):
                repo_root_p = cur_p
                break
            cur_p = _os.path.dirname(cur_p)
    transcript_paths = [
        _os.path.join(repo_root_p, "super-legal-mcp-refactored", "src", "utils", "transcriptDBBridge.js"),
        _os.path.join(repo_root_p, "super-legal-mcp-refactored", "src", "utils", "transcriptPersistence.js"),
    ]
    frontend_path = _os.path.join(repo_root_p, "super-legal-mcp-refactored", "test", "react-frontend", "app.js")
    transcript_text = ""
    for p in transcript_paths:
        if _os.path.isfile(p):
            try:
                with open(p, errors="replace") as fh:
                    transcript_text += "\n" + fh.read()
            except OSError:
                pass
    frontend_text = ""
    if _os.path.isfile(frontend_path):
        try:
            with open(frontend_path, errors="replace") as fh:
                frontend_text = fh.read()
        except OSError:
            frontend_text = ""

    for event_type in s.get("event_types") or []:
        # Skip the synthetic types (already handled above)
        if event_type in SYNTHETIC_TYPES:
            continue
        if not transcript_text and not frontend_text:
            # Neither file accessible; skip silently (covered by D10.1 above)
            continue
        in_transcript = f"'{event_type}'" in transcript_text or f'"{event_type}"' in transcript_text
        in_frontend = f"case '{event_type}'" in frontend_text or f'case "{event_type}"' in frontend_text
        # Only report a gap for a file that was actually read; otherwise an
        # unreadable file would produce a false "not found" warning.
        if transcript_text and not in_transcript:
            findings.append({
                "dimension": "D10", "status": "WARNING",
                "check": f"D10.4 event_type '{event_type}' transcript persistence",
                "message": f"event_type '{event_type}' not found as a string literal in transcriptDBBridge.js / transcriptPersistence.js — may be silently dropped from transcript_events",
                "remediation": f"Add '{event_type}' to the transcript-event allowlist in transcriptDBBridge.js (or equivalent persistence file).",
            })
        if frontend_text and not in_frontend:
            findings.append({
                "dimension": "D10", "status": "WARNING",
                "check": f"D10.4 event_type '{event_type}' frontend handler",
                "message": f"event_type '{event_type}' not handled in test/react-frontend/app.js handleStreamEvent switch — frontend will ignore it",
                "remediation": f"Add `case '{event_type}': ...` branch in handleStreamEvent (app.js).",
            })

    emit_findings("D10", findings)
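The D10.4 heuristic is plain substring matching, so its core can be pulled into pure functions and unit-tested without a repo checkout. A minimal sketch with hypothetical function names and made-up event types; like the check above, it only recognizes single- or double-quoted literals and `case` labels, so an event type built from a template literal or a computed key would still be reported as missing.

```python
def has_quoted_literal(event_type: str, text: str) -> bool:
    """True if event_type appears as a '...' or "..." string literal in text."""
    return f"'{event_type}'" in text or f'"{event_type}"' in text


def has_case_branch(event_type: str, text: str) -> bool:
    """True if text contains a switch `case` label for event_type."""
    return f"case '{event_type}'" in text or f'case "{event_type}"' in text


def d10_4_gaps(event_type: str, transcript_text: str, frontend_text: str) -> list[str]:
    """Return which sides are missing; unreadable (empty) sources are not reported."""
    gaps = []
    if transcript_text and not has_quoted_literal(event_type, transcript_text):
        gaps.append("transcript")
    if frontend_text and not has_case_branch(event_type, frontend_text):
        gaps.append("frontend")
    return gaps
```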


@@ -83,6 +83,48 @@ def main():
"remediation": "Confirm the producing path triggers embeddingService.chunkByHeaders() with EMBEDDING_PERSISTENCE=true.",
})

    # D5.4 — new tables that look like report-derived content should have an
    # embedding write path wired in hookDBBridge.js (chunkByHeaders call).
    # Without this check, an embedding-friendly table can ship with no producer.
    # Aliased so a local import cannot shadow a module-level `os` used earlier in main().
    import os as _os, re as _re
    repo_root_p = _os.environ.get("REPO_ROOT", "")
    if not repo_root_p:
        # Best-effort: walk up from CWD
        cur_p = _os.path.abspath(_os.getcwd())
        for _ in range(10):
            if _os.path.isdir(_os.path.join(cur_p, "super-legal-mcp-refactored")):
                repo_root_p = cur_p
                break
            cur_p = _os.path.dirname(cur_p)
    bridge_path = _os.path.join(repo_root_p, "super-legal-mcp-refactored", "src", "utils", "hookDBBridge.js")
    bridge_text = ""
    if _os.path.isfile(bridge_path):
        try:
            with open(bridge_path, errors="replace") as fh:
                bridge_text = fh.read()
        except OSError:
            bridge_text = ""
    for table in s.get("tables") or []:
        # Heuristic: tables suffixed _embeddings / _chunks always need a
        # producer; tables ending in _reports / _content / _memos are
        # candidates for the report_embeddings flow
        report_like = _re.search(r"(_embeddings|_chunks|_reports|_content|_memos|_artifacts)$", table)
        if not report_like:
            continue
        if bridge_text and table in bridge_text and "chunkByHeaders" in bridge_text:
            findings.append({
                "dimension": "D5", "status": "WARNING",
                "check": f"D5.4 embedding write-path for '{table}'",
                "message": f"table '{table}' looks embeddable; hookDBBridge.js mentions both — verify chunkByHeaders() runs against rows from this table",
                "remediation": f"Manually trace the call site in hookDBBridge.js. Confirm INSERT to {table} triggers an embeddingService.chunkByHeaders() call.",
            })
        else:
            findings.append({
                "dimension": "D5", "status": "FAILED",
                "check": f"D5.4 embedding write-path for '{table}'",
                "message": f"new table '{table}' looks embeddable (suffix matches /_(embeddings|chunks|reports|content|memos|artifacts)$/) but no chunkByHeaders() call references it in hookDBBridge.js",
                "remediation": f"Add an embedding write path in src/utils/hookDBBridge.js: when INSERT to {table} happens, call embeddingService.chunkByHeaders(content, {{table:'{table}'}}). Gated behind EMBEDDING_PERSISTENCE=true.",
            })

    emit_findings("D5", findings)
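The D5.4 decision reduces to a suffix regex plus two substring tests. The sketch below isolates that decision so the heuristic can be exercised against sample inputs; `d5_4_status` is a hypothetical name and the sample table names are made up.

```python
import re
from typing import Optional

# Same suffix heuristic as the D5.4 check above.
REPORT_LIKE = re.compile(r"(_embeddings|_chunks|_reports|_content|_memos|_artifacts)$")


def d5_4_status(table: str, bridge_text: str) -> Optional[str]:
    """None = not report-like; 'WARNING' = needs a manual trace; 'FAILED' = no write path seen."""
    if not REPORT_LIKE.search(table):
        return None
    if bridge_text and table in bridge_text and "chunkByHeaders" in bridge_text:
        return "WARNING"
    return "FAILED"
```

Because `table in bridge_text` is a substring test, a table whose name appears inside a longer identifier also matches, which is one reason the positive case is only a WARNING rather than a PASS.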


@@ -70,6 +70,50 @@ def main():
"remediation": "Run post-deploy-verify Tier 2 t3-bridge-metadata-git-sha.sql + t3-code-execution-models.sql after deploy. If git_sha='unknown', deploy.sh missed --build-arg COMMIT_SHA=$(git rev-parse HEAD).",
})

    # D6.5 — KG-relevant subagents must populate kg_provenance.source_hash
    # from upstream source_writes (Wave 2 provenance bridge architecture).
    import os as _os, re as _re
    repo_root_p = _os.environ.get("REPO_ROOT") or ""
    if not repo_root_p:
        cur_p = _os.path.abspath(_os.getcwd())
        for _ in range(10):
            if _os.path.isdir(_os.path.join(cur_p, "super-legal-mcp-refactored")):
                repo_root_p = cur_p
                break
            cur_p = _os.path.dirname(cur_p)
    kg_paths = [
        _os.path.join(repo_root_p, "super-legal-mcp-refactored", "src", "utils", "knowledgeGraphExtractor.js"),
        _os.path.join(repo_root_p, "super-legal-mcp-refactored", "src", "utils", "kgService.js"),
        _os.path.join(repo_root_p, "super-legal-mcp-refactored", "src", "utils", "kgWrite.js"),
    ]
    kg_text = ""
    for p in kg_paths:
        if _os.path.isfile(p):
            try:
                with open(p, errors="replace") as fh:
                    kg_text += "\n" + fh.read()
            except OSError:
                pass
    for agent in s.get("agent_types") or []:
        if not kg_text:
            break
        # If the agent is referenced by KG-write paths, source_hash must be populated
        agent_in_kg = agent in kg_text or _re.search(rf"agent[_\s]?type\s*[:=]\s*['\"]?{_re.escape(agent)}", kg_text)
        if agent_in_kg:
            if "source_hash" in kg_text and "kg_provenance" in kg_text:
                findings.append({
                    "dimension": "D6", "status": "PASSED",
                    "check": f"D6.5 KG provenance for agent '{agent}'",
                    "message": f"agent '{agent}' appears in KG-write path; kg_provenance.source_hash populated",
                    "remediation": "",
                })
            else:
                findings.append({
                    "dimension": "D6", "status": "WARNING",
                    "check": f"D6.5 KG provenance for agent '{agent}'",
                    "message": f"agent '{agent}' is KG-relevant but kg_provenance.source_hash population is not visible in knowledgeGraphExtractor.js / kgService.js / kgWrite.js",
                    "remediation": "Confirm KG INSERT path writes source_hash from upstream source_writes (Wave 2 provenance bridge — KG operates on reports, not raw sources).",
                })

    emit_findings("D6", findings)
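The D5.4, D6.5, and D10.4 checks each repeat the same walk-up-from-CWD loop to locate the repo root. If that duplication becomes a maintenance burden, it could be factored into one helper. The sketch below is a hypothetical refactor (`find_repo_root` and `max_levels` are not in the source) that keeps the same behavior: prefer `REPO_ROOT`, otherwise climb up to 10 levels looking for a `super-legal-mcp-refactored` directory, and return an empty string if nothing is found.

```python
import os
from collections.abc import Mapping
from typing import Optional


def find_repo_root(
    marker: str = "super-legal-mcp-refactored",
    start: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    max_levels: int = 10,
) -> str:
    """Return REPO_ROOT if set, else the nearest ancestor containing `marker`, else ''."""
    env = os.environ if env is None else env
    explicit = env.get("REPO_ROOT") or ""
    if explicit:
        return explicit
    cur = os.path.abspath(start if start is not None else os.getcwd())
    for _ in range(max_levels):
        if os.path.isdir(os.path.join(cur, marker)):
            return cur
        cur = os.path.dirname(cur)
    # Callers join paths onto '' and fall back to CWD-relative lookups, as before.
    return ""
```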


120 changes: 120 additions & 0 deletions .claude/skills/retention-lifecycle/SKILL.md
@@ -0,0 +1,120 @@
---
name: retention-lifecycle
description: Automate retention-class transitions (standard → worm → tombstone) and GDPR Art. 17 erasure orchestration. Wraps the 7 admin governance endpoints (legal-hold, retention-class, tombstone, pii/erase) shipped in Wave 3 (v6.2.0). Replaces ad-hoc operator SQL. Triggers — retention enforce, retention scan, tombstone session, gdpr erase, art 17 erase, /retention-lifecycle. Supports flags — --scan (dry-run, list expired), --enforce (promote/tombstone/erase), --erase-subject <pseudonym_id>, --client <id>.
---

# Retention Lifecycle — Automated Wave 3 governance enforcement

## What this does

The Aperture platform stores client legal-research outputs with explicit retention classes (`standard`, `worm`, `tombstoned`) and per-row `retention_expires_at`. Wave 3 (v6.2.0) shipped the schema + 7 admin endpoints, but **no enforcement workflow**. This skill closes that gap.

Three operations:

1. **Scan** — query `sessions` and `reports` for rows past `retention_expires_at`, grouped by current `retention_class`. Read-only; emits a markdown summary.
2. **Enforce** — promote/tombstone expired rows by calling the per-session admin endpoints. Refuses to act on rows with `legal_hold = TRUE`.
3. **Erase subject** — full GDPR Art. 17 orchestration for a pseudonymized data subject. Triggers `pii/erase` plus downstream cascade verification.
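The read-only scan amounts to one grouped query per table. The sketch below runs it against an in-memory SQLite stand-in so it is self-contained; the production database is Postgres, where you would typically compare a `timestamptz` column against `NOW()` instead of passing ISO-8601 strings. Only the column names (`retention_class`, `retention_expires_at`, `legal_hold`) come from the skill; the schema, the `scan` helper, and the sample rows are hypothetical.

```python
import sqlite3

SCAN_SQL = """
SELECT retention_class,
       COUNT(*)                  AS n,
       MIN(retention_expires_at) AS oldest,
       MAX(retention_expires_at) AS newest,
       SUM(CASE WHEN legal_hold THEN 1 ELSE 0 END) AS held
FROM {table}
WHERE retention_expires_at < ?
GROUP BY retention_class
ORDER BY retention_class
"""


def scan(conn: sqlite3.Connection, table: str, now_iso: str) -> list[tuple]:
    # The table name is interpolated, so only the two known tables are allowed.
    if table not in ("sessions", "reports"):
        raise ValueError(f"unexpected table {table!r}")
    return conn.execute(SCAN_SQL.format(table=table), (now_iso,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions (id TEXT, retention_class TEXT, "
    "retention_expires_at TEXT, legal_hold INTEGER)"
)
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?, ?)",
    [
        ("s1", "standard", "2026-04-12T00:00:00Z", 0),
        ("s2", "standard", "2026-05-07T11:23:14Z", 1),  # expired but on legal hold
        ("s3", "worm", "2026-02-08T00:00:00Z", 0),
        ("s4", "standard", "2027-01-01T00:00:00Z", 0),  # not yet expired
    ],
)
rows = scan(conn, "sessions", "2026-05-08T00:00:00Z")
```

Counting held rows in the same query lets the report list legal-hold blockers without a second pass.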

## Workflow

```bash
# Read-only scan against the deployed instance
/retention-lifecycle --scan

# Apply transitions (requires confirmation per row class)
/retention-lifecycle --enforce

# GDPR Art. 17 erasure for a single subject
/retention-lifecycle --erase-subject pseudonym-abc123

# Per-client (multi-tenant) — defaults to current Aperture deployment
/retention-lifecycle --scan --client aperture
```

## Retention class state machine

```
standard ──[expires_at < NOW & no legal_hold]──> worm ──[+90d in worm]──> tombstoned
   │                                                                          │
   └──> [legal_hold=TRUE blocks all transitions]
                                                                pii_mappings.encrypted_value
                                                                redacted; row stays for audit
```
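The transitions in the diagram can be expressed as a pure function, which makes the legal-hold precedence explicit. This is a sketch under assumptions: the skill does not name the column that records when a row entered `worm`, so `worm_since` below is hypothetical, and the 90-day boundary is treated as inclusive.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

WORM_TO_TOMBSTONE = timedelta(days=90)


@dataclass
class Row:
    retention_class: str                   # 'standard' | 'worm' | 'tombstoned'
    retention_expires_at: datetime
    legal_hold: bool
    worm_since: Optional[datetime] = None  # hypothetical: when the row entered 'worm'


def next_class(row: Row, now: datetime) -> Optional[str]:
    """Return the class the row should move to, or None if it stays put."""
    if row.legal_hold:
        return None  # legal_hold=TRUE blocks all transitions
    if row.retention_class == "standard" and row.retention_expires_at < now:
        return "worm"
    if (
        row.retention_class == "worm"
        and row.worm_since is not None
        and now - row.worm_since >= WORM_TO_TOMBSTONE
    ):
        return "tombstoned"
    return None  # 'tombstoned' is terminal
```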

See `references/retention-classes.md` for class semantics and `references/art-17-flow.md` for the GDPR cascade graph.

## Pre-flight

Required:
- `psql` (Cloud SQL Auth Proxy or direct via static-IP whitelist `34.26.70.60`)
- `gcloud` (authenticated; project `gen-lang-client-0797903624`)
- `curl` + admin bearer token (Aperture admin user; obtain via `user-management`)
- `jq`

Required env:
- `SUPER_LEGAL_BASE_URL` (default `http://34.26.70.60:3001`)
- `ADMIN_BEARER_TOKEN` (operator-supplied, never logged)
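A pre-flight step for the required environment might look like the sketch below. The skill does not specify its implementation language or helpers, so `Config` and `load_config` are hypothetical; the default URL and exit code 2 for missing env are taken from this skill.

```python
import os
import sys
from collections.abc import Mapping
from dataclasses import dataclass, field
from typing import Optional

DEFAULT_BASE_URL = "http://34.26.70.60:3001"
EXIT_FATAL = 2  # the skill's documented exit code for auth/network/missing-env failures


@dataclass(frozen=True)
class Config:
    base_url: str
    # repr=False keeps the bearer token out of repr() and therefore out of
    # any log line that formats the config object.
    token: str = field(repr=False)


def load_config(env: Optional[Mapping[str, str]] = None) -> Config:
    """Read required settings, exiting with code 2 if the token is missing."""
    env = os.environ if env is None else env
    token = env.get("ADMIN_BEARER_TOKEN", "")
    if not token:
        print("ADMIN_BEARER_TOKEN is not set", file=sys.stderr)
        raise SystemExit(EXIT_FATAL)
    base_url = (env.get("SUPER_LEGAL_BASE_URL") or DEFAULT_BASE_URL).rstrip("/")
    return Config(base_url=base_url, token=token)
```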

## Drift safeguards

- **legal_hold check** — admin endpoints already refuse to tombstone rows with `legal_hold=TRUE`. The skill double-checks before issuing the call so it fails fast with a clearer message.
- **WORM bucket Object Lock** — files promoted to the `worm` class are uploaded to `gs://super-legal-worm-{client}-us-east1` with `per_object_retention.mode=Enabled`. Once promoted, files cannot be deleted until the retention period elapses, even by the project owner.
- **human_interventions row** — every transition writes a row with `intervention_type` in {`retention_enforced`, `gdpr_erasure`, `tombstoned`}, providing the audit trail for Art. 30 records of processing.
- **Idempotency** — re-running `--enforce` on already-promoted rows is safe (the underlying admin endpoints are idempotent).
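The legal-hold safeguard and the per-session admin calls can be sketched as a plan-then-request step. This is illustrative only: `split_by_legal_hold` and `tombstone_request` are hypothetical, the endpoint path comes from the truth sources listed in this skill, and no request body is sent because the endpoint's payload is not documented here.

```python
import urllib.parse
import urllib.request
from typing import Iterable


def split_by_legal_hold(rows: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Return (actionable, blocked); blocked rows are reported, never sent."""
    actionable: list[dict] = []
    blocked: list[dict] = []
    for row in rows:
        (blocked if row.get("legal_hold") else actionable).append(row)
    return actionable, blocked


def tombstone_request(base_url: str, token: str, session_id: str) -> urllib.request.Request:
    """Build, but do not send, POST /api/admin/sessions/:sessionId/tombstone."""
    sid = urllib.parse.quote(session_id, safe="")
    url = f"{base_url.rstrip('/')}/api/admin/sessions/{sid}/tombstone"
    return urllib.request.Request(
        url, method="POST", headers={"Authorization": f"Bearer {token}"}
    )
```

Sending is then `urllib.request.urlopen(req, timeout=30)`; because the endpoints are idempotent, a failed call can be retried on the next `--enforce` run.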

## Output format

```
## Retention Lifecycle Report
Client: aperture
Mode: --scan
Timestamp: 2026-05-08T...

### Expired rows by class
| Table | retention_class | count | oldest expires_at | newest expires_at |
|-----------|----------------|-------|--------------------------|--------------------------|
| sessions | standard | 12 | 2026-04-12T00:00:00Z | 2026-05-07T11:23:14Z |
| sessions | worm | 3 | 2026-02-08T00:00:00Z | 2026-02-08T00:00:00Z |
| reports | standard | 47 | ... | ... |

### Legal hold blockers
- 2 rows with legal_hold=TRUE and retention_expires_at < NOW. NOT enforced. Operator must clear hold first via /api/admin/sessions/:id/legal-hold.

### Operator next steps
- [ ] Review per-class counts above
- [ ] Run /retention-lifecycle --enforce to apply transitions
- [ ] Validate human_interventions rows after enforcement
```
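The "Expired rows by class" table can be rendered directly from per-class scan results. A minimal sketch: `render_expired_table` is a hypothetical helper, it takes `(table, retention_class, count, oldest, newest)` tuples, and it does not pad columns, since alignment is cosmetic in markdown.

```python
def render_expired_table(rows: list[tuple[str, str, int, str, str]]) -> str:
    """Render scan results as the report's markdown table."""
    lines = [
        "| Table | retention_class | count | oldest expires_at | newest expires_at |",
        "|---|---|---|---|---|",
    ]
    for table, cls, count, oldest, newest in rows:
        lines.append(f"| {table} | {cls} | {count} | {oldest} | {newest} |")
    return "\n".join(lines)
```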

## Truth sources (do not modify)

- Retention columns: `super-legal-mcp-refactored/src/db/postgres.js:272-279` (sessions + reports `legal_hold`, `retention_class`, `retention_expires_at` + indexes)
- Wave 3 tables: `postgres.js:240-252` (source_writes), `:256-269` (access_log), `:283-296` (human_interventions), `:299-310` (pii_mappings)
- Admin endpoints: `super-legal-mcp-refactored/src/server/adminRouter.js`
  - `POST /api/admin/sessions/:sessionId/legal-hold` (L130)
  - `POST /api/admin/sessions/:sessionId/retention-class` (L148)
  - `POST /api/admin/sessions/:sessionId/tombstone` (L170)
  - `POST /api/admin/pii/erase/:sessionId` (L187)
  - `POST /api/admin/sessions/:sessionKey/rebuild-kg` (L200)
  - `POST /api/admin/sessions/:sessionKey/rebuild-artifacts` (L227)
- Retention manager: `super-legal-mcp-refactored/src/utils/retentionManager.js` — `applyRetentionClass`, `setLegalHold`, `tombstoneSession`
- PII manager: `super-legal-mcp-refactored/src/utils/piiManager.js` — `pseudonymize` (L25), `dePseudonymize` (L51), `erasePII` (L75)
- Tiering daemon: `super-legal-mcp-refactored/src/utils/gcsTieringDaemon.js:54` — `tierOldFiles()` already runs in production for raw-source GCS tiering. This skill is the per-row complement.

## intervention_type values (must use exactly)

- `retention_enforced` — automated standard → worm or worm → tombstone transition
- `gdpr_erasure` — Art. 17 PII erasure complete
- `tombstoned` — manual tombstone via admin endpoint

These strings are matched case-sensitively by audit-export queries. New values must be added to `references/human-interventions-types.md` and migrated via `/schema-evolve`.
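Since audit exports match these values exactly, a writer can reject anything else before the INSERT. A minimal sketch; `validate_intervention_type` is a hypothetical guard, and the allowed set is exactly the three values listed above.

```python
INTERVENTION_TYPES = frozenset({"retention_enforced", "gdpr_erasure", "tombstoned"})


def validate_intervention_type(value: str) -> str:
    """Exact, case-sensitive membership check; raises ValueError on anything else."""
    if value not in INTERVENTION_TYPES:
        raise ValueError(
            f"unknown intervention_type {value!r}; allowed: {sorted(INTERVENTION_TYPES)}"
        )
    return value
```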

## What this does NOT cover

- **Bulk multi-client orchestration** — defer to `client-fleet-health` (Phase C) once shipped. This skill operates on one client at a time.
- **Backup-then-tombstone** — use `client-backup-restore` first if the client wants a final archive before tombstone.
- **Cross-client erasure** — Art. 17 is per-deployment by design (single-tenant); a single subject pseudonym is unique to one client deployment.

Exit codes: `0` clean (or only WARN-level rows skipped), `1` partial (some endpoints failed), `2` fatal (auth, network, missing env).
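The exit-code contract can be captured in one function. A sketch with hypothetical names; it assumes a fatal error takes precedence over partial endpoint failures, which the exit-code list implies but does not state.

```python
from dataclasses import dataclass

EXIT_CLEAN, EXIT_PARTIAL, EXIT_FATAL = 0, 1, 2


@dataclass
class RunResult:
    fatal: bool = False         # auth, network, or missing env
    endpoint_failures: int = 0  # admin calls that returned an error
    warn_skipped: int = 0       # e.g. legal_hold rows deliberately skipped


def exit_code(result: RunResult) -> int:
    if result.fatal:
        return EXIT_FATAL
    if result.endpoint_failures:
        return EXIT_PARTIAL
    return EXIT_CLEAN  # WARN-level skips alone still count as clean
```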