feat(exa): A3 coverage expansion to 5 high-value tools (v7.5.1) by Number531 · Pull Request #109 · Number531/Legal-API

Number531 · 2026-05-09T20:53:08Z

Summary

First use of the augmentor pipeline (PR #108) for coverage extension. Adds A3 additionalQueries plumbing to 5 high-traffic legal-research tools, demonstrating the per-tool cost has dropped from ~80 LoC to ~19 LoC.

Tools added (15 → 20)

| Tool | Domain | Use case |
| --- | --- | --- |
| `lookup_citation` | `case-law` | Cite-checking; present in nearly every memo |
| `search_judges` | `judges` (new axis menu) | Judicial conflict-of-interest analysis |
| `search_sec_filings_fulltext` | `securities` | EDGAR EFTS Boolean search |
| `search_federal_register_notices` | `federal-register` | Sunshine Act / agency announcement tracking |
| `search_fda_warning_letters` | `fda-warning-letter` (new axis menu) | Pharma diligence regulatory compliance |

Augmentor pattern in action

The per-tool changes are identical across all 5 tools (a uniformity made possible by PR #108):

traits: ["exa-routable", "domain:X"] declaration in toolDefinitions.js — 1 line per tool
WebSearchClient method update: destructure + spread to executeExaSearch options — ~3 lines per method
Existing domains (case-law, securities, federal-register) reuse the augmentor's DOMAIN_DESCRIPTIONS from PR #108: 0 new axis menus
New domains (judges, fda-warning-letter) add fresh axis-menu entries — ~10 lines each

Total: ~97 LoC for 5 tools = ~19 LoC per tool (vs. ~80 LoC pre-refactor, roughly a 4× reduction).
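
The pattern above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the `augmentTool` helper, the `inputSchema` layout, the placeholder `DOMAIN_DESCRIPTIONS` text, and the `executeExaSearch(query, options)` signature are all assumptions made for the example. Only the `traits` declaration, the `additionalQueries` property, and the destructure-and-spread idea come from the description.

```javascript
// Placeholder axis-menu text; the real DOMAIN_DESCRIPTIONS content is not shown in this PR.
const DOMAIN_DESCRIPTIONS = {
  "case-law": "Vary by court level, jurisdiction, or citation form.",
  judges: "Vary by court, appointment era, or recusal history.",
};

// 1. Trait declaration: one line per tool (the `traits` array).
const lookupCitationTool = {
  name: "lookup_citation",
  traits: ["exa-routable", "domain:case-law"],
  inputSchema: {
    type: "object",
    properties: { citation: { type: "string" } },
    required: ["citation"],
  },
};

// 2. Hypothetical augmentor: strips `traits` and appends `additionalQueries`
//    as the last schema property, leaving `required` untouched.
function augmentTool(tool) {
  const { traits = [], ...wire } = tool;
  if (!traits.includes("exa-routable")) return wire;
  const domainTrait = traits.find((t) => t.startsWith("domain:"));
  const domain = domainTrait ? domainTrait.slice("domain:".length) : undefined;
  const description =
    DOMAIN_DESCRIPTIONS[domain] ?? "Alternative phrasings of the query.";
  return {
    ...wire,
    inputSchema: {
      ...wire.inputSchema,
      properties: {
        ...wire.inputSchema.properties,
        additionalQueries: { type: "array", items: { type: "string" }, description },
      },
    },
  };
}

// 3. Client method: destructure the new argument and spread it into the
//    search options (assumed signature for executeExaSearch).
async function lookupCitation(args, executeExaSearch) {
  const { citation, additionalQueries, ...rest } = args;
  return executeExaSearch(citation, { ...rest, additionalQueries });
}
```

Because the augmentor derives the schema from the trait, a tool in an existing domain needs only the one-line declaration plus the small client change.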

A3 coverage impact

Before this PR: 15 tools (~65–70% of memo tool calls)
After this PR: 20 tools (~75–80% of memo tool calls)
Increases A/B test population by ~30% for upcoming staging memo runs

Test results

✅ 64/64 augmentor snapshot tests (was 49, +15 new — byte-equivalence + ordering for new tools)
✅ 214/214 cumulative Exa-suite tests (was 199, +15)
✅ 20/20 live API verification shapes accepted by Exa (was 15)
✅ Property ordering invariant preserved (additionalQueries last)
✅ required array order preserved
✅ Augmentor strips traits from output (no MCP wire-format leak)
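
The last three invariants could be checked with a helper along these lines. It is a sketch under assumed tool shapes (`inputSchema.properties`, `inputSchema.required`), and `checkWireInvariants` is a hypothetical name, not a function from the test suite.

```javascript
// Returns a list of violated invariants for one tool; an empty list means all hold.
// `source` is the tool as declared, `augmented` is what would go on the wire.
function checkWireInvariants(source, augmented) {
  const problems = [];

  // Traits are internal metadata and must not reach the MCP wire format.
  if ("traits" in augmented) problems.push("traits leaked into wire format");

  // additionalQueries must be the last property in insertion order.
  const keys = Object.keys(augmented.inputSchema.properties);
  if (keys[keys.length - 1] !== "additionalQueries") {
    problems.push("additionalQueries is not the last property");
  }

  // The required array must be unchanged, including its order.
  const before = JSON.stringify(source.inputSchema.required ?? []);
  const after = JSON.stringify(augmented.inputSchema.required ?? []);
  if (before !== after) problems.push("required array changed");

  return problems;
}
```

Returning a list instead of throwing on the first failure lets a single test report every broken invariant for a tool at once.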

Test plan

Snapshot equivalence for all 20 A3 tools
Live Exa API accepts request body shapes for all 5 new tools
All existing tests pass without modification
Coverage extension test pattern (from PR #111) covers new methods

Out of scope

LLM adoption testing on new domains (judges, fda-warning-letter axis menus). The schema descriptions are byte-correct and structurally identical to the tools validated in PR #108; quality measurement is deferred to the staging A/B run (PR #110).
Skill template updates (separate PR).
A/B sampling logic (PR #110, separate).

Predecessors

PR #108 (v7.3.0–v7.5.0): augmentor pipeline foundation; this PR is its first reuse

🤖 Generated with Claude Code

First use of the augmentor pipeline (PR #108) for coverage extension. Adds A3 additionalQueries plumbing to 5 high-traffic legal-research tools.

Tools added:
- lookup_citation (domain:case-law)
- search_judges (domain:judges, NEW axis menu)
- search_sec_filings_fulltext (domain:securities)
- search_federal_register_notices (domain:federal-register)
- search_fda_warning_letters (domain:fda-warning-letter, NEW axis menu)

Per-tool effort dropped from ~80 LoC pre-refactor to ~19 LoC per tool (trait declaration + WebSearchClient destructure + spread). Existing domains reused from augmentor's DOMAIN_DESCRIPTIONS; only 2 new axis menus authored (judges, fda-warning-letter).

A3 coverage: 15 tools → 20 tools (~30% population increase). Per-memo coverage estimated ~75-80% (was ~65-70%).

Tests:
- 64/64 augmentor snapshot tests (was 49, +15 new)
- 214/214 cumulative Exa-suite tests
- 20/20 live API verification shapes accepted (was 15)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…112)

Top-level CHANGELOG was missing the Exa A3 follow-up wave shipped between 2026-05-08 and 2026-05-09. Adds a comprehensive [Unreleased] section above the v7.1.0 entry covering:
- v7.2.0 (PR #107): orchestrator-authored variations through exa_web_search; shared validator extracted; Track A audit reversal documented
- v7.3.0 → v7.5.0 (PR #108): per-domain plumbing for 4 tools, schema rewrite with Jaccard distinctness telemetry, LLM adoption test harness (44% real vs. 100% isolated finding), augmentor refactor (~80 LoC → ~19 LoC per tool)
- v7.5.1 (PR #109): coverage expansion 15 → 20 tools using the augmentor
- v7.6.0 (PR #110): EXA_ADDITIONAL_QUERIES_AB_SAMPLE flag for quality-lift measurement, 4 outcome metrics per arm, 7 unit tests
- PR #111: api-integration + subagent-scaffold templates auto-inherit A3
- PR #112: exa-a3-ab-staging.md runbook (440 lines, decision tree, failure modes, metrics reference)

Cumulative: A3-enabled tools 0 → 20, Exa-suite tests 130 → 221, live API shapes 5 → 20, +2 flags, +6 metrics, +3 shared modules. All flag-gated; zero production behavior change until the operator opts in via staging A/B. Per-project CHANGELOG already documents each version individually.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…xBareChartRefs + promptEnhancer defensive (Task #107)

Three coupled source fixes that complete the chart-path rendering pipeline introduced by Phase 4.13 v1.6-polish Task #99 (chart path guidance in _promptConstants.js). Without these, the canonical ../charts/<name>.png reference convention would break in PDF/DOCX generation.

EMPIRICALLY VALIDATED by 2026-05-25 PLTR canary (session 2026-05-25-1779733982) which generated a 77,596-byte research-plan.pdf with 8/8 chart references rendered correctly.

FIX 1: src/utils/documentConverter.js (+19 LOC)
Add cwd: options.resourcePath to pandoc execFileAsync calls in convertToDocx + convertToPdf. Pandoc's --resource-path flag IS honored by the native pandoc writers (incl. DOCX) but NOT by the typst PDF backend; typst resolves image paths relative to its own working directory, not pandoc's. Without cwd override, every chart-bearing PDF fails with "file not found (searched at /app/charts/chart_xxx.png)". The cwd override is conditional (only when options.resourcePath is provided), preserving backward compatibility with callers that don't pass it. The DOCX path mirrors the PDF fix defensively (DOCX writer honors --resource-path natively but cwd override keeps both paths consistent and survives writer changes).

FIX 2: src/utils/markdownNormalizer.js (+48 LOC)
Add fixBareChartRefs self-healing transform that detects bare references like ![chart](pltr.png) and rewrites to ![chart](charts/pltr.png) when the file exists in a sibling charts/ directory. Defensive against subagent prompt-adherence drift: Task #99 prompt guidance instructs canonical ../charts/<name>.png but if a subagent drifts to bare filename, this catches and fixes the reference before downstream conversion fails. Implementation walks up the directory tree 4 levels max looking for a sibling charts/ directory. Only rewrites when existsSync() confirms the file is actually present in charts/. Multiple image formats supported (png, jpg, jpeg, gif, webp, svg). Runs FIRST in the normalizeForPandoc transformation pipeline (before stripVerificationTags, footnote conversion, etc.) so downstream transforms see the corrected paths.

FIX 3: src/server/promptEnhancer.js (+2 LOC defensive)
The promptEnhancer.js calls at L360, L364 invoke convertToPdf/convertToDocx WITHOUT resourcePath. All other production callers (convertSession, convertSessionToDocuments, /api/convert/* routes) pass it correctly. This caller is the ONE unsafe site; enhancement-generated markdown typically doesn't contain chart references but defensive fix ensures we never silently break chart rendering in any conversion path. Both Promise.all calls now pass { resourcePath: fullSessionPath }, which matches the documentConverter.js:751-752 batch flow pattern.

ARCHITECTURAL COMPLETENESS QUARTET
With these three fixes shipped, the chart-path convention is end-to-end:
1. Bridge writes to canonical <session>/charts/ (codeExecutionBridge.js:342, pre-existing)
2. Prompt tells subagent to reference as ../charts/<name>.png (_promptConstants.js Step 2.1, shipped in Task #99)
3. PDF/DOCX rendering honors the reference (documentConverter.js cwd fix, THIS COMMIT)
4. Self-healing rewrite if subagent drifts to bare filename (markdownNormalizer.js fixBareChartRefs, THIS COMMIT)
Layers 1+2 without 3+4 would mean reports render correctly in raw markdown viewers but break in PDF/DOCX generation. Layers 3+4 close the rendering loop.

VERIFICATION
- node --check clean on all 3 modified files
- Empirical baseline: 2026-05-25 canary PDF (77,596 bytes) with 8 chart refs
- Layer 1 unit tests + Layer 2 integration tests + Layer 3 smoke shipping in companion commits (B + C)

OUT OF SCOPE (deferred to companion commits)
- Observability metrics (Prometheus counters): Commit B (Task #108)
- Test coverage (Layer 1/2/3 pyramid): Commit C (Task #109)
- CI pandoc/typst install: separate infrastructure PR

Plan: docs/pending-updates/Chart-Path-Rendering-Completeness-Plan.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
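
The self-healing transform described in FIX 2 might look roughly like the sketch below. It is an illustration under assumptions, not the code in markdownNormalizer.js: the regular expression, the `findChartsDir` helper, and the `{ markdown, fixed }` return shape are invented for the example. The commit message does not say how the prefix is chosen when charts/ is found in a parent directory, so the sketch always emits `charts/<file>` as in the message's own example.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Bare reference: ![alt](name.ext) whose target has no slash, colon, or
// whitespace, so URLs, absolute paths, and already-prefixed paths never match.
const BARE_REF = /!\[([^\]]*)\]\(([^\/\\():\s]+\.(?:png|jpe?g|gif|webp|svg))\)/gi;

// Checks the starting directory, then up to `maxLevels` parents, for charts/.
function findChartsDir(startDir, maxLevels = 4) {
  let dir = path.resolve(startDir);
  for (let i = 0; i <= maxLevels; i++) {
    const candidate = path.join(dir, "charts");
    if (fs.existsSync(candidate) && fs.statSync(candidate).isDirectory()) {
      return candidate;
    }
    const parent = path.dirname(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }
  return null;
}

// Rewrites a bare ref only when the file really exists in charts/.
function fixBareChartRefs(markdown, markdownDir) {
  const chartsDir = findChartsDir(markdownDir);
  if (!chartsDir) return { markdown, fixed: 0 };
  let fixed = 0;
  const out = markdown.replace(BARE_REF, (match, alt, file) => {
    if (!fs.existsSync(path.join(chartsDir, file))) return match;
    fixed += 1;
    return `![${alt}](charts/${file})`;
  });
  return { markdown: out, fixed };
}
```

Gating the rewrite on `existsSync` keeps the transform conservative: a bare reference to an image that lives elsewhere is left untouched instead of being pointed at a path that would also fail.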

…malization + chart_conversion_duration metrics (Task #108)

Adds two Prometheus metrics to monitor the chart-rendering pipeline post-merge, closing the observability gap identified by the chart-path-rendering-completeness plan.

METRIC 1: chart_path_normalization_total (Counter)
Labels: { status: 'rewritten' | 'no_op', reason: 'bare_ref_self_healed' | 'no_bare_refs' }
Fires from: markdownNormalizer.js normalizeForPandoc after fixBareChartRefs
Bounded cardinality: ~5 series total
Purpose: detect subagent drift from canonical ../charts/<name>.png pattern despite Task #99 prompt guidance. Non-zero `rewritten` count indicates subagents are writing bare filenames and the self-healing transform is catching/fixing them. Production canary baseline (2026-05-25-1779733982) showed count: 0, meaning subagents follow the canonical convention correctly. Sustained non-zero in production = signal to investigate prompt drift.

METRIC 2: chart_conversion_duration_ms (Histogram)
Labels: { format: 'pdf' | 'docx', status: 'ok' | 'error' }
Buckets: [100, 500, 1000, 2000, 5000, 10000, 30000] ms
Fires from: documentConverter.js convertToDocx + convertToPdf finish path
Bounded cardinality: 2 formats × ~3 statuses = 6 series
Purpose: track pandoc PDF/DOCX latency. Baseline expectation: 1-5s for small docs, 5-30s for chart-heavy reports. Tail-latency alerts surface infrastructure issues (typst container slowdown, large fixture growth, etc.).

EMIT SITES
- markdownNormalizer.js:619-629: after fixBareChartRefs; also logs `[normalizer] fixBareChartRefs rewrote N bare chart ref(s)` to console when count > 0 (operator visibility for drift)
- documentConverter.js:522 (convertToDocx finish): emits both recordDocumentConversion (pre-existing) and recordChartConversion (new)
- documentConverter.js:593 (convertToPdf finish): same dual emit pattern

DEFENSIVE
All emit sites wrapped in try/catch so metrics failures never break conversion. Module-load smoke verified all 4 new exports resolve correctly.

OUT OF SCOPE
- Tests for the metrics themselves: Commit C (Task #109) tests the underlying functions; metric emission is fire-and-forget side effect
- Prometheus dashboard panel: separate infrastructure task

Plan: docs/pending-updates/Chart-Path-Rendering-Completeness-Plan.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
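
To make the histogram and the fire-and-forget emit pattern concrete, here is a small stand-in that uses an in-memory map instead of a real Prometheus client. The bucket boundaries and the name `recordChartConversion` come from the commit message; its signature, the `series` map, and the injectable `sink` parameter are assumptions for the sketch.

```javascript
// Upper bounds in milliseconds, as listed for chart_conversion_duration_ms.
const BUCKETS_MS = [100, 500, 1000, 2000, 5000, 10000, 30000];

// In-memory stand-in for a histogram: one entry per label combination.
const series = new Map();

function observe(labels, valueMs) {
  const key = `${labels.format}|${labels.status}`;
  let s = series.get(key);
  if (!s) {
    s = { counts: new Array(BUCKETS_MS.length + 1).fill(0), sum: 0, count: 0 };
    series.set(key, s);
  }
  // Prometheus-style cumulative buckets: a value counts toward every bucket
  // whose upper bound is >= the value; the final slot is the +Inf bucket.
  BUCKETS_MS.forEach((le, i) => {
    if (valueMs <= le) s.counts[i] += 1;
  });
  s.counts[BUCKETS_MS.length] += 1;
  s.sum += valueMs;
  s.count += 1;
}

// Assumed signature. The try/catch mirrors the "metrics failures never break
// conversion" rule: any error from the sink is swallowed.
function recordChartConversion(format, status, durationMs, sink = observe) {
  try {
    sink({ format, status }, durationMs);
  } catch (err) {
    // Intentionally ignored: emission is a fire-and-forget side effect.
  }
}
```

For example, `recordChartConversion("pdf", "ok", 750)` increments every bucket from 1000 ms upward, plus the +Inf bucket, for the `pdf|ok` series.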

…G (Task #109)

Completes the chart-path-rendering completeness bundle with comprehensive test coverage at three layers (Layer 4 canary already empirically validated).

EXPORT fixBareChartRefs from src/utils/markdownNormalizer.js
Function was previously module-internal. Now exported for direct Layer 1 unit testing. Consumer (normalizeForPandoc at L616) unchanged.

LAYER 1: Unit tests (test/sdk/utils/markdownNormalizer-fixBareChartRefs.test.js)
12 tests covering:
- No-op (zero bare refs)
- Preserve (bare ref → non-existent file)
- Rewrite (bare ref → existing file)
- Walk-up (deeply nested markdown finds sibling charts/)
- Mixed (canonical + bare in same doc, only bare rewritten)
- Already-prefixed (charts/Z.png → no double-prefix)
- Absolute path no-op (/tmp/chart.png)
- HTTPS URL no-op
- Special chars in alt text [Title with ! and ()]
- Multi-format (png/jpg/jpeg/gif/webp/svg)
- Empty markdown sanity
- No charts/ directory anywhere → no-op
Runtime: <130ms total. Isolated tmpdir per test; full cleanup in afterEach.

LAYER 2: Integration tests (test/sdk/utils/documentConverter-resourcePath.test.js)
4 tests covering:
- PDF with canonical ../charts/<name>.png + resourcePath → PDF embeds charts
- DOCX with canonical refs + resourcePath → DOCX embeds charts
- PDF with bare ref through normalize+resourcePath → fixBareChartRefs rewrites, then resourcePath cwd resolves; PDF embeds chart
- PDF without resourcePath option → cwd override skipped, baseline preserved
Skip-gated on pandoc+typst availability (matches existing document-conversion.test.js skip pattern). Runtime: ~1s when deps present.

LAYER 3: Smoke test (test/sdk/wrappedSubagents/_smoke-chart-conversion-fixture.mjs)
End-to-end fixture conversion validates the full chart-rendering pipeline. 10 assertions covering: fixture setup, PDF generation, PDF size > 5KB (charts embedded), DOCX generation, DOCX size > 3KB, bareChartRefsFixed count = 1, normalized .pandoc.md has correct rewrites + preserves canonical refs. $0 cost (no API calls). Runtime: ~320ms. Pass 10/10 with valid PNG fixtures.

FIXTURES
- test/sdk/fixtures/chart-bearing-sample.md (smoke fixture: 2 canonical + 1 bare chart ref + metadata + section structure)
- test/sdk/fixtures/charts/chart_0[1-3].png (3 minimal valid 16×16 PNGs, 82 bytes each, generated via Node + zlib with proper CRC32)

LAYER 4: Canary (operator-driven)
Empirical baseline already validates: reports/2026-05-25-1779733982/ generated 77,596-byte PDF + 16,497-byte DOCX with 8/8 chart references rendered correctly. Post-merge canary re-runs canonical PLTR + verifies chart_path_normalization + chart_conversion_duration metrics populate.

CHANGELOG ENTRY
Comprehensive [Unreleased] entry under "Phase 4.13 chart-path-rendering-completeness (Tasks #107-#109)" documents the 3-commit bundle, architectural completeness quartet, verification metrics (972 tests pass), and risk register.

FULL REGRESSION
- 972 wrapped-subagents + hooks + config + server + utils tests pass (956 baseline + 12 Layer 1 + 4 Layer 2 = +16 new)
- 2 pre-existing Task #67 failures unchanged
- Smoke test passes 10/10 in <500ms

Plan: docs/pending-updates/Chart-Path-Rendering-Completeness-Plan.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
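
The fixture note about generating minimal valid PNGs "via Node + zlib with proper CRC32" can be illustrated with the sketch below. It is one way to do it, not the script used for the fixtures: `makeSolidPng`, the truecolor RGB pixel format, and the hand-rolled CRC-32 table are choices made for the example, so the byte count of its output need not match the 82-byte fixtures.

```javascript
const zlib = require("node:zlib");

// Standard CRC-32 table (reflected polynomial 0xEDB88320), as PNG chunks require.
const CRC_TABLE = Array.from({ length: 256 }, (_, n) => {
  let c = n;
  for (let k = 0; k < 8; k++) c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  return c >>> 0;
});

function crc32(buf) {
  let c = 0xffffffff;
  for (const byte of buf) c = CRC_TABLE[(c ^ byte) & 0xff] ^ (c >>> 8);
  return (c ^ 0xffffffff) >>> 0;
}

// A PNG chunk is: 4-byte length, 4-byte type, data, CRC-32 over type + data.
function chunk(type, data) {
  const typeBuf = Buffer.from(type, "ascii");
  const length = Buffer.alloc(4);
  length.writeUInt32BE(data.length);
  const crc = Buffer.alloc(4);
  crc.writeUInt32BE(crc32(Buffer.concat([typeBuf, data])));
  return Buffer.concat([length, typeBuf, data, crc]);
}

// Builds a size × size single-colour PNG (8-bit truecolor, no interlace).
function makeSolidPng(size = 16, [r, g, b] = [0, 0, 0]) {
  const ihdr = Buffer.alloc(13); // compression, filter, interlace stay 0
  ihdr.writeUInt32BE(size, 0); // width
  ihdr.writeUInt32BE(size, 4); // height
  ihdr[8] = 8; // bit depth
  ihdr[9] = 2; // colour type 2 = truecolor RGB
  // Each scanline is a filter byte (0 = none) followed by the RGB pixels.
  const row = Buffer.from([0, ...Array(size).fill([r, g, b]).flat()]);
  const raw = Buffer.concat(Array(size).fill(row));
  return Buffer.concat([
    Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]), // signature
    chunk("IHDR", ihdr),
    chunk("IDAT", zlib.deflateSync(raw)),
    chunk("IEND", Buffer.alloc(0)),
  ]);
}
```

Writing the result with `fs.writeFileSync("chart_01.png", makeSolidPng())` yields a file that image decoders accept, which matters here because pandoc and typst reject fixtures with invalid chunk checksums.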

Number531 merged commit 8b0debc into main May 9, 2026

This was referenced May 9, 2026

feat(exa): A/B sampling logic — quality-lift measurement (v7.6.0) #110

Merged

docs(skills): A3 inheritance in api-integration + subagent-scaffold templates #111

Merged

docs(runbook): Exa A3 staging A/B run protocol #112

Merged

Number531 merged 1 commit into main from claude/exa-a3-coverage-expansion-pr112
