Merged
19 commits
92f6429
chore(super-legal-mcp): remove src/server/*.backup-* (Wave 1.2)
Number531 May 9, 2026
6670221
chore(super-legal-mcp): remove src/api-clients/*.backup-* (Wave 1.3)
Number531 May 9, 2026
d3263a0
chore(super-legal-mcp): remove reports/**/*.backup*|*.bak (Wave 1.4)
Number531 May 9, 2026
5422492
chore(super-legal-mcp): remove dead top-level test/* files (Wave 1.5)
Number531 May 9, 2026
bc3c602
chore(super-legal-mcp): remove dead jest tests (Wave 1.6)
Number531 May 9, 2026
46629bd
chore(super-legal-mcp): remove dead prompts (Wave 1.7)
Number531 May 9, 2026
8761be3
chore(super-legal-mcp): remove migration-tools/ (Wave 1.8)
Number531 May 9, 2026
2092fe5
chore(super-legal-mcp): remove src/orchestrator/ (Wave 1.9)
Number531 May 9, 2026
4fa2d21
chore(super-legal-mcp): remove src/filters/ (Wave 1.10)
Number531 May 9, 2026
356bda6
chore(super-legal-mcp): remove src/modules/conversation-bridge/ (Wave…
Number531 May 9, 2026
6fd06e7
chore(super-legal-mcp): remove src/mcp/ (Wave 1.12)
Number531 May 9, 2026
7e51df9
chore(super-legal-mcp): remove src/utils/ orphans (Wave 1.13)
Number531 May 9, 2026
abfbc77
chore(super-legal-mcp): remove src/config/ orphans (Wave 1.14)
Number531 May 9, 2026
fe99ff7
chore(super-legal-mcp): remove src/server/ legacy stack (Wave 1.15)
Number531 May 10, 2026
ce99d37
chore(super-legal-mcp): drop dead npm scripts (Wave 1.16)
Number531 May 10, 2026
ce3b15f
chore(super-legal-mcp): uninstall dead npm dependencies (Wave 1.17)
Number531 May 10, 2026
8144b16
chore(super-legal-mcp): harden .dockerignore (Wave 1.18)
Number531 May 10, 2026
69426db
docs(metrics-catalog): add 6 Exa A3 metrics + bump to v7.6.0
Number531 May 10, 2026
6940828
chore(super-legal-mcp): remove 3 stale tests for deleted legacy modul…
Number531 May 10, 2026
15 changes: 15 additions & 0 deletions super-legal-mcp-refactored/.dockerignore
@@ -10,9 +10,24 @@ test/sdk/
test/unit/
test/integration/
test/parity/
test/fixtures/
test/smoke/
test/chaos/
test/isolated/
docs/
scripts/
docker-compose.yml
*.md
!prompts/**/*.md
.claude/

# Defense-in-depth (Wave 1.18): block dead-code recurrences
**/__tests__/**
**/*.backup*
**/*.bak
**/*.orig
kg_audit.*
staging-health-*.ndjson
WTF-IS-THIS-*.md
toPdfViaTempFile*
*.tsbuildinfo
61 changes: 59 additions & 2 deletions super-legal-mcp-refactored/docs/metrics-catalog.md
@@ -1,6 +1,6 @@
# Metrics Catalog

**Version**: v6.8.5 | **Date**: 2026-05-06 | **Source**: `src/utils/sdkMetrics.js`, `src/config/alertingRules.js`, `prometheus/alerts.yml`
**Version**: v7.6.0 | **Date**: 2026-05-10 | **Source**: `src/utils/sdkMetrics.js`, `src/config/alertingRules.js`, `prometheus/alerts.yml`

This catalog enumerates every Prometheus metric, OTel span, and alert rule emitted by the Super Legal MCP server. Operators reference it when building Grafana dashboards, debugging production incidents, or extending the observability surface.

@@ -17,6 +17,7 @@ This catalog enumerates every Prometheus metric, OTel span, and alert rule emitt
7. [Auto-Reconciliation Metrics (v6.7.0)](#7-auto-reconciliation-metrics-v670)
8. [Knowledge Graph & Embedding Metrics](#8-knowledge-graph--embedding-metrics)
9. [Subagent & API Client Metrics](#9-subagent--api-client-metrics)
- 9.1. [Exa A3 — additionalQueries forwarding & A/B sampling (v7.1.0–v7.6.0)](#91-exa-a3--additionalqueries-forwarding--ab-sampling-v710v760)
10. [Document Conversion Metrics](#10-document-conversion-metrics)
11. [Token Usage Counters](#11-token-usage-counters)
12. [Wave 3 Observability Errors](#12-wave-3-observability-errors)
@@ -144,6 +145,57 @@ Surfaces failures that the `hookDBBridge` wrapper try/catch silently swallowed p

---

### 9.1 Exa A3 — additionalQueries forwarding & A/B sampling (v7.1.0–v7.6.0)

Six metrics covering the [Exa April 2026 plan §4.3 Avenue A3](pending-updates/Exa-April-2026-updates.md) — orchestrator-authored query variations forwarded to Exa's `/search` Deep API. All metrics emit only when `EXA_ADDITIONAL_QUERIES=true`; A/B-arm metrics additionally require `EXA_ADDITIONAL_QUERIES_AB_SAMPLE > 0`. Both flags default off — zero-traffic baseline is expected and correct.

**Forwarding metric** (v7.1.0, [PR #106](https://github.com/Number531/Legal-API/pull/106)):

| Metric | Type | Labels | Buckets | Purpose |
|---|---|---|---|---|
| `claude_exa_additional_queries_count` | Histogram | `domain` | `[1, 2, 3, 4, 5]` | Variation count per Exa `/search` call when A3 forwarding fires. Bucket cap matches Exa's hard limit (5 entries). Zero observations means no adopter is passing the param yet — base plumbing inert by design. Used to track per-domain Layer 3 adopter rollout. |

**A/B sampling metrics** (v7.6.0, [PR #110](https://github.com/Number531/Legal-API/pull/110)):

Each eligible call is randomly assigned to `treatment` (additionalQueries forwarded as authored) or `control` (additionalQueries withheld; Exa auto-expansion fires). Operators compare arms in Grafana to validate quality lift before flipping `EXA_ADDITIONAL_QUERIES` to all-treatment.

| Metric | Type | Labels | Buckets | Purpose |
|---|---|---|---|---|
| `claude_exa_ab_sample_assignments_total` | Counter | `arm`, `domain` | — | Total A/B sample assignments. Used to verify traffic balance — should converge to the configured `EXA_ADDITIONAL_QUERIES_AB_SAMPLE` ratio over time. |
| `claude_exa_ab_result_count` | Histogram | `arm`, `domain` | `[1, 5, 10, 20, 50, 100]` | Result count per Exa call by A/B arm. Primary quality-lift signal — treatment arm should show ≥15% higher P50 than control for rollout. |
| `claude_exa_ab_unique_urls` | Histogram | `arm`, `domain` | `[1, 5, 10, 20, 50, 100]` | Deduplicated URL count per call. Breadth signal — treatment arm collapses redundant variations into one Deep call, so ratio of `unique_urls / result_count` is the diversity proxy. |
| `claude_exa_ab_summary_chars` | Histogram | `arm`, `domain` | `[100, 500, 1000, 5000, 10000, 50000]` | Total summary text characters returned per call. Content-depth signal — high values with few unique URLs imply Exa returned more content per source (Deep fan-out working). |
| `claude_exa_ab_latency_ms` | Histogram | `arm`, `domain` | `[100, 500, 1000, 2500, 5000, 10000, 30000, 60000]` | Wall-clock per Exa call by arm. Cost dimension: the treatment latency penalty relative to control must stay ≤20% for rollout, per the runbook decision tree. |
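
The random arm assignment described above can be sketched as follows. This is a minimal illustration, not the code in `BaseWebSearchClient.js`: the names `assignArm` and `tallyArms` are hypothetical, and it assumes `EXA_ADDITIONAL_QUERIES_AB_SAMPLE` is the probability that a call lands in `control`. That reading is consistent with `AB_SAMPLE=0.0` yielding 100% treatment, but the source does not confirm it.

```javascript
// Hypothetical helper; the real sampling logic may differ.
// ASSUMPTION: sampleRate is the probability of landing in the control arm.
// The random source is injectable so the split can be checked deterministically.
function assignArm(sampleRate, rng = Math.random) {
  // A rate of 0 (or an invalid value) means A/B sampling is off: no arm is recorded.
  if (!(sampleRate > 0)) return null;
  return rng() < sampleRate ? 'control' : 'treatment';
}

// Tally assignments so the observed split can be compared with the configured ratio,
// which is what claude_exa_ab_sample_assignments_total is meant to support.
function tallyArms(sampleRate, calls, rng = Math.random) {
  const counts = { treatment: 0, control: 0 };
  for (let i = 0; i < calls; i++) {
    const arm = assignArm(sampleRate, rng);
    if (arm) counts[arm] += 1;
  }
  return counts;
}
```

Injecting `rng` keeps the sketch testable; with the default `Math.random`, the tallies should drift toward the configured ratio as the number of calls grows.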

**Operator decision tree** (per [`docs/runbooks/exa-a3-ab-staging.md`](runbooks/exa-a3-ab-staging.md)):

```
SET EXA_ADDITIONAL_QUERIES=true, EXA_ADDITIONAL_QUERIES_AB_SAMPLE=0.5
WAIT 24-48h for natural traffic
COMPARE arms:
  IF treatment lift ≥15% on result_count, unique_urls, summary_chars
     AND treatment latency penalty ≤20%
    → roll out (set AB_SAMPLE=0.0; treatment becomes 100%)
  ELIF lift ≤5% across all 3 outcome dims
    → refine variations, re-run
  ELIF treatment regresses on any outcome dim
    → rollback (set EXA_ADDITIONAL_QUERIES=false)
```
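
The decision tree above could be encoded roughly as below. This is a sketch under stated assumptions: `decideRollout` and its field names are hypothetical, lifts are expressed as fractions of the control value (0.18 means +18%), and the ≥15% condition is read as applying to all three outcome dimensions. The `'undecided'` result covers inputs the pseudocode does not address (for example, lifts between 5% and 15%) and is not part of the runbook.

```javascript
// Hypothetical encoding of the runbook decision tree. Thresholds come from the
// pseudocode; the function name, field names and 'undecided' branch are assumptions.
function decideRollout({ resultCount, uniqueUrls, summaryChars, latencyPenalty }) {
  const lifts = [resultCount, uniqueUrls, summaryChars];

  // Branch 1: clear lift on every outcome dimension and an acceptable latency cost.
  if (lifts.every((lift) => lift >= 0.15) && latencyPenalty <= 0.20) {
    return 'rollout';
  }
  // Branch 2: no meaningful lift anywhere. Checked before the regression branch,
  // mirroring the order of the ELIF clauses in the pseudocode.
  if (lifts.every((lift) => lift <= 0.05)) {
    return 'refine';
  }
  // Branch 3: treatment is worse than control on at least one dimension.
  if (lifts.some((lift) => lift < 0)) {
    return 'rollback';
  }
  // Not covered by the source pseudocode.
  return 'undecided';
}
```

Keeping the branch order identical to the pseudocode matters: a run where every dimension is flat or slightly negative resolves to `'refine'` rather than `'rollback'`.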

**Distinctness signal (not a metric)**: `src/utils/exaQueryValidator.js:78-103` runs a Jaccard-similarity check between caller-supplied variations and the primary query at validation time. Variations that paraphrase the primary emit a structured warning log (`exa_query_distinctness_jaccard`). This is a log event, not a Prometheus counter, so operators inspect the logs when investigating low quality lift.
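
For readers unfamiliar with the check, a token-level Jaccard similarity can be sketched as below. This is an illustration only, not the logic in `exaQueryValidator.js`: the tokenizer, the function names, and the `0.8` threshold are all assumptions, since the source does not state how queries are tokenized or what similarity counts as a paraphrase.

```javascript
// Split a query into a set of lowercase alphanumeric tokens (assumed tokenizer).
function tokenize(text) {
  return new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B| over the two token sets.
function jaccard(a, b) {
  const setA = tokenize(a);
  const setB = tokenize(b);
  let intersection = 0;
  for (const token of setA) {
    if (setB.has(token)) intersection += 1;
  }
  const union = setA.size + setB.size - intersection;
  // Two empty queries share no tokens; treat them as having zero similarity.
  return union === 0 ? 0 : intersection / union;
}

// HYPOTHETICAL threshold: the real cutoff is not given in the source.
const HYPOTHETICAL_THRESHOLD = 0.8;

// Return the variations that look like paraphrases of the primary query.
function findParaphrases(primary, variations, threshold = HYPOTHETICAL_THRESHOLD) {
  return variations.filter((variation) => jaccard(primary, variation) >= threshold);
}
```

A variation flagged by `findParaphrases` adds little search breadth, which is one plausible reason a treatment arm would show low lift on `unique_urls`.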

**Cardinality budget**: 6 metrics × ~8 domains × 2 arms (where applicable) ≈ 96 series. Negligible vs. the ~900-series cap of `claude_tool_duration_ms`.

**Files**:
- `src/utils/sdkMetrics.js:197-249` — metric definitions
- `src/utils/sdkMetrics.js:537-565` — `recordExaAdditionalQueriesCount`, `recordExaAbAssignment`, `recordExaAbOutcome` helpers
- `src/api-clients/BaseWebSearchClient.js` — sampling + emit sites (v7.6.0 added `_ab_arm` tag on results)
- `src/utils/exaQueryValidator.js` — Jaccard distinctness logic
- `docs/runbooks/exa-a3-ab-staging.md` — operator runbook (440 lines, decision tree, 4 failure modes)
- `docs/feature-flags.md` §39, §40 — flag definitions

---

## 10. Document Conversion Metrics

| Metric | Type | Labels | Buckets | Purpose |
@@ -295,11 +347,16 @@ If any series count exceeds 5000 in production, audit label sources before the P
| `recordPersistenceFailure(hook, reason)` | `claude_hook_persistence_failures_total` |
| `setCircuitBreakerState(hook, state)` | `claude_hook_circuit_breaker_state` |
| `classifyPersistenceFailure(err)` | (helper — returns bounded `reason` enum) |
| `recordExaAdditionalQueriesCount(count, domain)` | `claude_exa_additional_queries_count` (v7.1.0) |
| `recordExaAbAssignment(arm, domain)` | `claude_exa_ab_sample_assignments_total` (v7.6.0) |
| `recordExaAbOutcome({arm, domain, resultCount, uniqueUrls, summaryChars, latencyMs})` | 4 outcome histograms — `claude_exa_ab_{result_count,unique_urls,summary_chars,latency_ms}` (v7.6.0) |

---

**Reference docs**:
- Operator audit-export runbook: `docs/runbooks/v6.8.5-audit-export.md`
- Reconciliation runbook: `docs/runbooks/v6.7.0-session-reconciliation.md`
- Feature flag registry: `docs/feature-flags.md` §31a (OTel sampler)
- Exa A3 A/B staging runbook: `docs/runbooks/exa-a3-ab-staging.md` (v7.6.0)
- Feature flag registry: `docs/feature-flags.md` §31a (OTel sampler), §39 (`EXA_ADDITIONAL_QUERIES`), §40 (`EXA_ADDITIONAL_QUERIES_AB_SAMPLE`)
- System design v6.8.5: `company-strategy/system-design.md` §14c
- Exa A3 plan: `docs/pending-updates/Exa-April-2026-updates.md` §4.3
227 changes: 0 additions & 227 deletions super-legal-mcp-refactored/migration-tools/shadow-mode-proxy.js

This file was deleted.
