feat(exa): A3 coverage expansion to 5 high-value tools (v7.5.1)#109
Merged
First use of the augmentor pipeline (PR #108) for coverage extension. Adds A3 additionalQueries plumbing to 5 high-traffic legal-research tools.

Tools added:
- lookup_citation (domain:case-law)
- search_judges (domain:judges, NEW axis menu)
- search_sec_filings_fulltext (domain:securities)
- search_federal_register_notices (domain:federal-register)
- search_fda_warning_letters (domain:fda-warning-letter, NEW axis menu)

Per-tool effort dropped from ~80 LoC pre-refactor to ~19 LoC per tool (trait declaration + WebSearchClient destructure + spread). Existing domains are reused from the augmentor's DOMAIN_DESCRIPTIONS; only 2 new axis menus were authored (judges, fda-warning-letter).

A3 coverage: 15 tools → 20 tools (~30% population increase). Per-memo coverage estimated at ~75-80% (was ~65-70%).

Tests:
- 64/64 augmentor snapshot tests (was 49, +15 new)
- 214/214 cumulative Exa-suite tests
- 20/20 live API verification shapes accepted (was 15)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 9, 2026
Number531 added a commit that referenced this pull request on May 10, 2026
…112)

Top-level CHANGELOG was missing the Exa A3 follow-up wave shipped between 2026-05-08 and 2026-05-09. Adds a comprehensive [Unreleased] section above the v7.1.0 entry covering:
- v7.2.0 (PR #107): orchestrator-authored variations through exa_web_search; shared validator extracted; Track A audit reversal documented
- v7.3.0 → v7.5.0 (PR #108): per-domain plumbing for 4 tools, schema rewrite with Jaccard distinctness telemetry, LLM adoption test harness (44% real vs. 100% isolated finding), augmentor refactor (~80 LoC → ~19 LoC per tool)
- v7.5.1 (PR #109): coverage expansion 15 → 20 tools using the augmentor
- v7.6.0 (PR #110): EXA_ADDITIONAL_QUERIES_AB_SAMPLE flag for quality-lift measurement, 4 outcome metrics per arm, 7 unit tests
- PR #111: api-integration + subagent-scaffold templates auto-inherit A3
- PR #112: exa-a3-ab-staging.md runbook (440 lines, decision tree, failure modes, metrics reference)

Cumulative: A3-enabled tools 0 → 20, Exa-suite tests 130 → 221, live API shapes 5 → 20, +2 flags, +6 metrics, +3 shared modules. All flag-gated; zero production behavior change until the operator opts in via staging A/B.

The per-project CHANGELOG already documents each version individually.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Number531 added a commit that referenced this pull request on May 25, 2026
…xBareChartRefs + promptEnhancer defensive (Task #107)

Three coupled source fixes that complete the chart-path rendering pipeline introduced by Phase 4.13 v1.6-polish Task #99 (chart path guidance in _promptConstants.js). Without these, the canonical ../charts/<name>.png reference convention would break in PDF/DOCX generation.

EMPIRICALLY VALIDATED by the 2026-05-25 PLTR canary (session 2026-05-25-1779733982), which generated a 77,596-byte research-plan.pdf with 8/8 chart references rendered correctly.

FIX 1: src/utils/documentConverter.js (+19 LOC)
Add cwd: options.resourcePath to the pandoc execFileAsync calls in convertToDocx + convertToPdf. Pandoc's --resource-path flag IS honored by the native pandoc writers (incl. DOCX) but NOT by the typst PDF backend: typst resolves image paths relative to its own working directory, not pandoc's. Without the cwd override, every chart-bearing PDF fails with "file not found (searched at /app/charts/chart_xxx.png)". The cwd override is conditional (applied only when options.resourcePath is provided), preserving backward compatibility with callers that don't pass it. The DOCX path mirrors the PDF fix defensively (the DOCX writer honors --resource-path natively, but the cwd override keeps both paths consistent and survives writer changes).

FIX 2: src/utils/markdownNormalizer.js (+48 LOC)
Add a fixBareChartRefs self-healing transform that detects bare chart references (an image target that is only a filename, with no directory prefix) and rewrites them to point at the charts/ copy when the file exists in a sibling charts/ directory. Defensive against subagent prompt-adherence drift: the Task #99 prompt guidance instructs the canonical ../charts/<name>.png form, but if a subagent drifts to a bare filename, this catches and fixes the reference before downstream conversion fails.
- Implementation walks up the directory tree 4 levels max looking for a sibling charts/ directory.
- Only rewrites when existsSync() confirms the file is actually present in charts/.
- Multiple image formats supported (png, jpg, jpeg, gif, webp, svg).
- Runs FIRST in the normalizeForPandoc transformation pipeline (before stripVerificationTags, footnote conversion, etc.) so downstream transforms see the corrected paths.

FIX 3: src/server/promptEnhancer.js (+2 LOC defensive)
The promptEnhancer.js calls at L360 and L364 invoke convertToPdf/convertToDocx WITHOUT resourcePath. All other production callers (convertSession, convertSessionToDocuments, /api/convert/* routes) pass it correctly. This caller is the ONE unsafe site. Enhancement-generated markdown typically doesn't contain chart references, but the defensive fix ensures we never silently break chart rendering in any conversion path. Both Promise.all calls now pass { resourcePath: fullSessionPath }, matching the documentConverter.js:751-752 batch flow pattern.

ARCHITECTURAL COMPLETENESS QUARTET
With these three fixes shipped, the chart-path convention is end-to-end:
1. Bridge writes to canonical <session>/charts/ (codeExecutionBridge.js:342, pre-existing)
2. Prompt tells the subagent to reference charts as ../charts/<name>.png (_promptConstants.js Step 2.1, shipped in Task #99)
3. PDF/DOCX rendering honors the reference (documentConverter.js cwd fix, THIS COMMIT)
4. Self-healing rewrite if the subagent drifts to a bare filename (markdownNormalizer.js fixBareChartRefs, THIS COMMIT)
Layers 1+2 without 3+4 would mean reports render correctly in raw markdown viewers but break in PDF/DOCX generation. Layers 3+4 close the rendering loop.

VERIFICATION
- node --check clean on all 3 modified files
- Empirical baseline: 2026-05-25 canary PDF (77,596 bytes) with 8 chart refs
- Layer 1 unit tests + Layer 2 integration tests + Layer 3 smoke shipping in companion commits (B + C)

OUT OF SCOPE (deferred to companion commits)
- Observability metrics (Prometheus counters): Commit B (Task #108)
- Test coverage (Layer 1/2/3 pyramid): Commit C (Task #109)
- CI pandoc/typst install: separate infrastructure PR

Plan: docs/pending-updates/Chart-Path-Rendering-Completeness-Plan.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
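
The self-healing transform described in FIX 2 could look roughly like the sketch below. This is not the code in markdownNormalizer.js: the function names, the regular expression, and the choice to emit a path relative to the markdown file's directory are all assumptions made for illustration. Only the behavior it imitates (bounded walk-up, existsSync() gate, the six image extensions) comes from the commit message.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Extensions listed in the commit message.
const IMAGE_EXT = /\.(png|jpe?g|gif|webp|svg)$/i;
// Assumption: "4 levels max" means the starting directory plus up to 4 parents.
const MAX_LEVELS = 4;

// Walk up from startDir looking for charts/<fileName>.
// Returns the absolute path of the first match, or null if none exists.
function findInCharts(startDir, fileName) {
  let dir = startDir;
  for (let level = 0; level <= MAX_LEVELS; level++) {
    const candidate = path.join(dir, "charts", fileName);
    if (fs.existsSync(candidate)) return candidate;
    const parent = path.dirname(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }
  return null;
}

// Hypothetical stand-in for fixBareChartRefs: rewrites only bare image
// targets (no directory separator) whose file really exists under charts/.
function fixBareChartRefsSketch(markdown, markdownDir) {
  let fixed = 0;
  const text = markdown.replace(
    /!\[([^\]]*)\]\(([^)\s]+)\)/g,
    (match, alt, target) => {
      const isBare =
        !target.includes("/") && !target.includes("\\") && IMAGE_EXT.test(target);
      if (!isBare) return match; // URLs, absolute and prefixed paths are left alone
      const found = findInCharts(markdownDir, target);
      if (!found) return match; // never rewrite to a file that is not there
      fixed += 1;
      // Emit a forward-slash path relative to the markdown file's directory.
      const rel = path.relative(markdownDir, found).split(path.sep).join("/");
      return `![${alt}](${rel})`;
    }
  );
  return { text, fixed };
}
```

Gating the rewrite on existsSync() keeps the transform conservative: a reference it cannot resolve is passed through unchanged, so the sketch never turns one broken path into a different broken path.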
Number531 added a commit that referenced this pull request on May 25, 2026
…malization + chart_conversion_duration metrics (Task #108)

Adds two Prometheus metrics to monitor the chart-rendering pipeline post-merge, closing the observability gap identified by the chart-path-rendering-completeness plan.

METRIC 1: chart_path_normalization_total (Counter)
- Labels: { status: 'rewritten' | 'no_op', reason: 'bare_ref_self_healed' | 'no_bare_refs' }
- Fires from: markdownNormalizer.js normalizeForPandoc, after fixBareChartRefs
- Bounded cardinality: ~5 series total
- Purpose: detect subagent drift from the canonical ../charts/<name>.png pattern despite Task #99 prompt guidance. A non-zero `rewritten` count indicates subagents are writing bare filenames and the self-healing transform is catching and fixing them. The production canary baseline (2026-05-25-1779733982) showed count: 0, meaning subagents follow the canonical convention correctly. A sustained non-zero count in production is a signal to investigate prompt drift.

METRIC 2: chart_conversion_duration_ms (Histogram)
- Labels: { format: 'pdf' | 'docx', status: 'ok' | 'error' }
- Buckets: [100, 500, 1000, 2000, 5000, 10000, 30000] ms
- Fires from: documentConverter.js convertToDocx + convertToPdf finish path
- Bounded cardinality: 2 formats × ~3 statuses = 6 series
- Purpose: track pandoc PDF/DOCX latency. Baseline expectation: 1-5s for small docs, 5-30s for chart-heavy reports. Tail-latency alerts surface infrastructure issues (typst container slowdown, large fixture growth, etc.).

EMIT SITES
- markdownNormalizer.js:619-629: after fixBareChartRefs; also logs `[normalizer] fixBareChartRefs rewrote N bare chart ref(s)` to console when count > 0 (operator visibility for drift)
- documentConverter.js:522 (convertToDocx finish): emits both recordDocumentConversion (pre-existing) and recordChartConversion (new)
- documentConverter.js:593 (convertToPdf finish): same dual emit pattern

DEFENSIVE
All emit sites are wrapped in try/catch so metrics failures never break conversion. Module-load smoke verified that all 4 new exports resolve correctly.

OUT OF SCOPE
- Tests for the metrics themselves: Commit C (Task #109) tests the underlying functions; metric emission is a fire-and-forget side effect
- Prometheus dashboard panel: separate infrastructure task

Plan: docs/pending-updates/Chart-Path-Rendering-Completeness-Plan.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
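
To make the histogram buckets and the try/catch-wrapped emit pattern concrete, here is a library-free sketch. SketchHistogram is a hypothetical in-memory stand-in, not the Prometheus client the project uses, and recordChartConversionSketch is an illustrative name, not the recordChartConversion export mentioned above. Only the bucket boundaries and label names are taken from the commit message.

```javascript
// Bucket upper bounds in milliseconds, as listed for chart_conversion_duration_ms.
const BUCKETS_MS = [100, 500, 1000, 2000, 5000, 10000, 30000];

// Hypothetical in-memory histogram; a real deployment would use a Prometheus client.
class SketchHistogram {
  constructor(buckets) {
    this.buckets = buckets;
    this.series = new Map(); // label-set key -> { counts, sum, count }
  }

  observe(labels, value) {
    const key = JSON.stringify(labels);
    if (!this.series.has(key)) {
      this.series.set(key, {
        counts: this.buckets.map(() => 0),
        sum: 0,
        count: 0,
      });
    }
    const s = this.series.get(key);
    // Prometheus-style buckets are cumulative: an observation counts toward
    // every bucket whose upper bound is >= the observed value.
    this.buckets.forEach((upperBound, i) => {
      if (value <= upperBound) s.counts[i] += 1;
    });
    s.sum += value;
    s.count += 1;
  }
}

const chartConversionDuration = new SketchHistogram(BUCKETS_MS);

// Fire-and-forget emit: any failure inside the metrics layer is swallowed so
// that document conversion itself is never affected.
function recordChartConversionSketch(
  format,
  status,
  durationMs,
  histogram = chartConversionDuration
) {
  try {
    histogram.observe({ format, status }, durationMs);
  } catch (err) {
    // Intentionally ignored: metrics must not break the conversion path.
  }
}
```

A 750 ms PDF conversion recorded this way lands in the 1000 ms bucket and every larger one, which is what lets a tail-latency alert be written against the upper buckets.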
Number531 added a commit that referenced this pull request on May 25, 2026
…G (Task #109)

Completes the chart-path-rendering completeness bundle with comprehensive test coverage at three layers (Layer 4 canary already empirically validated).

EXPORT fixBareChartRefs from src/utils/markdownNormalizer.js
The function was previously module-internal. It is now exported for direct Layer 1 unit testing. The consumer (normalizeForPandoc at L616) is unchanged.

LAYER 1: Unit tests (test/sdk/utils/markdownNormalizer-fixBareChartRefs.test.js)
12 tests covering:
- No-op (zero bare refs)
- Preserve (bare ref → non-existent file)
- Rewrite (bare ref → existing file)
- Walk-up (deeply nested markdown finds sibling charts/)
- Mixed (canonical + bare in same doc, only bare rewritten)
- Already-prefixed (charts/Z.png → no double-prefix)
- Absolute path no-op (/tmp/chart.png)
- HTTPS URL no-op
- Special chars in alt text [Title with ! and ()]
- Multi-format (png/jpg/jpeg/gif/webp/svg)
- Empty markdown sanity
- No charts/ directory anywhere → no-op
Runtime: <130ms total. Isolated tmpdir per test; full cleanup in afterEach.

LAYER 2: Integration tests (test/sdk/utils/documentConverter-resourcePath.test.js)
4 tests covering:
- PDF with canonical ../charts/<name>.png + resourcePath → PDF embeds charts
- DOCX with canonical refs + resourcePath → DOCX embeds charts
- PDF with bare ref through normalize + resourcePath → fixBareChartRefs rewrites, then resourcePath cwd resolves; PDF embeds chart
- PDF without resourcePath option → cwd override skipped, baseline preserved
Skip-gated on pandoc + typst availability (matches the existing document-conversion.test.js skip pattern). Runtime: ~1s when deps are present.

LAYER 3: Smoke test (test/sdk/wrappedSubagents/_smoke-chart-conversion-fixture.mjs)
End-to-end fixture conversion validates the full chart-rendering pipeline. 10 assertions covering: fixture setup, PDF generation, PDF size > 5KB (charts embedded), DOCX generation, DOCX size > 3KB, bareChartRefsFixed count = 1, normalized .pandoc.md has correct rewrites + preserves canonical refs. $0 cost (no API calls). Runtime: ~320ms. Passes 10/10 with valid PNG fixtures.

FIXTURES
- test/sdk/fixtures/chart-bearing-sample.md (smoke fixture: 2 canonical + 1 bare chart ref + metadata + section structure)
- test/sdk/fixtures/charts/chart_0[1-3].png (3 minimal valid 16×16 PNGs, 82 bytes each, generated via Node + zlib with proper CRC32)

LAYER 4: Canary (operator-driven)
The empirical baseline already validates: reports/2026-05-25-1779733982/ generated a 77,596-byte PDF + 16,497-byte DOCX with 8/8 chart references rendered correctly. The post-merge canary re-runs the canonical PLTR session and verifies that the chart_path_normalization + chart_conversion_duration metrics populate.

CHANGELOG ENTRY
A comprehensive [Unreleased] entry under "Phase 4.13 chart-path-rendering-completeness (Tasks #107-#109)" documents the 3-commit bundle, architectural completeness quartet, verification metrics (972 tests pass), and risk register.

FULL REGRESSION
- 972 wrapped-subagents + hooks + config + server + utils tests pass (956 baseline + 12 Layer 1 + 4 Layer 2 = +16 new)
- 2 pre-existing Task #67 failures unchanged
- Smoke test passes 10/10 in <500ms

Plan: docs/pending-updates/Chart-Path-Rendering-Completeness-Plan.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
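
The FIXTURES entry mentions PNGs "generated via Node + zlib with proper CRC32". A minimal sketch of how such a fixture can be produced is below. It is an illustration under assumptions, not the generator used in the commit: the function names, the 8-bit grayscale color type, and the solid fill are choices made here, and the resulting byte count is not claimed to match the 82-byte fixtures.

```javascript
const zlib = require("node:zlib");

// CRC-32 lookup table (reflected polynomial 0xEDB88320), as used by PNG chunks.
const CRC_TABLE = Array.from({ length: 256 }, (_, n) => {
  let c = n;
  for (let k = 0; k < 8; k++) {
    c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  }
  return c >>> 0;
});

function crc32(buf) {
  let c = 0xffffffff;
  for (const byte of buf) {
    c = CRC_TABLE[(c ^ byte) & 0xff] ^ (c >>> 8);
  }
  return (c ^ 0xffffffff) >>> 0;
}

// A PNG chunk is: 4-byte length, 4-byte type, data, 4-byte CRC over type + data.
function pngChunk(type, data) {
  const typeBuf = Buffer.from(type, "ascii");
  const length = Buffer.alloc(4);
  length.writeUInt32BE(data.length);
  const crc = Buffer.alloc(4);
  crc.writeUInt32BE(crc32(Buffer.concat([typeBuf, data])));
  return Buffer.concat([length, typeBuf, data, crc]);
}

// Build a solid-gray square PNG (8-bit grayscale, no interlacing).
function makeSolidPng(size = 16, gray = 0x80) {
  const ihdr = Buffer.alloc(13);
  ihdr.writeUInt32BE(size, 0); // width
  ihdr.writeUInt32BE(size, 4); // height
  ihdr[8] = 8; // bit depth
  ihdr[9] = 0; // color type 0 = grayscale
  // bytes 10-12 (compression, filter, interlace) stay 0

  // Each scanline is one filter byte (0 = none) followed by `size` pixel bytes.
  const raw = Buffer.alloc(size * (size + 1), gray);
  for (let y = 0; y < size; y++) raw[y * (size + 1)] = 0;

  const signature = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
  return Buffer.concat([
    signature,
    pngChunk("IHDR", ihdr),
    pngChunk("IDAT", zlib.deflateSync(raw)),
    pngChunk("IEND", Buffer.alloc(0)),
  ]);
}
```

Writing the result with fs.writeFileSync("chart_01.png", makeSolidPng()) would give a fixture that image decoders accept, which matters because the smoke test's size assertions depend on the charts actually being embedded.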
Summary
First use of the augmentor pipeline (PR #108) for coverage extension. Adds A3 additionalQueries plumbing to 5 high-traffic legal-research tools, demonstrating that the per-tool cost has dropped from ~80 LoC to ~19 LoC.

Tools added (15 → 20)
- lookup_citation: case-law
- search_judges: judges (new axis menu)
- search_sec_filings_fulltext: securities
- search_federal_register_notices: federal-register
- search_fda_warning_letters: fda-warning-letter (new axis menu)

Augmentor pattern in action
Per-tool changes are uniform across all 5 tools (pattern enabled by PR #108):
- traits: ["exa-routable", "domain:X"] declaration in toolDefinitions.js (1 line per tool)
- WebSearchClient destructure + spread into executeExaSearch options (~3 lines per method)
- Existing domains (case-law, securities, federal-register) reuse the augmentor's DOMAIN_DESCRIPTIONS from PR #108 (feat(exa): A3 Phase A, full pipeline, v7.3.0 → v7.5.0 augmentor refactor), so 0 new axis menus
- New domains (judges, fda-warning-letter) add fresh axis-menu entries (~10 lines each)

Total: ~97 LoC for 5 tools = ~19 LoC per tool (vs ~80 LoC pre-refactor, roughly a 4× reduction).
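
The per-tool pattern above can be sketched as follows. Everything here is a simplified illustration under assumptions: the tool definition shape, the searchJudges method, the executeExaSearch stand-in, and the toWireFormat helper are hypothetical, and only the traits declaration, the additionalQueries option name, and the executeExaSearch identifier come from this PR description.

```javascript
// 1. Trait declaration: one line per tool in the definitions list.
//    The surrounding object shape is an assumption for illustration.
const toolDefinitions = [
  {
    name: "search_judges",
    traits: ["exa-routable", "domain:judges"],
    inputSchema: {
      type: "object",
      properties: { query: { type: "string" } },
      required: ["query"],
    },
  },
];

// Stand-in for the real search call; it simply echoes what it receives.
async function executeExaSearch(query, options = {}) {
  return { query, options };
}

// 2. Destructure additionalQueries from the tool arguments and spread it
//    into the search options only when the caller supplied it.
async function searchJudges(args) {
  const { query, additionalQueries, ...rest } = args;
  return executeExaSearch(query, {
    ...rest,
    ...(additionalQueries ? { additionalQueries } : {}),
  });
}

// 3. Strip the internal traits field before a definition is published,
//    so it does not leak into the wire format.
function toWireFormat({ traits, ...publicFields }) {
  return publicFields;
}
```

Because the trait is plain data on the definition, adding a tool to A3 coverage stays a declaration plus a small destructure-and-spread in the client method, which is consistent with the ~19 LoC per-tool figure.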
A3 coverage impact
Test results
- … required array order preserved
- … traits from output (no MCP wire-format leak)

Test plan
Out of scope
Predecessors
🤖 Generated with Claude Code