feat(exa): A3 coverage expansion to 5 high-value tools (v7.5.1)#109

Merged
Number531 merged 1 commit into main from claude/exa-a3-coverage-expansion-pr112 on May 9, 2026

Conversation

@Number531

Owner

Summary

First use of the augmentor pipeline (PR #108) for coverage extension. Adds A3 additionalQueries plumbing to 5 high-traffic legal-research tools, demonstrating the per-tool cost has dropped from ~80 LoC to ~19 LoC.

Tools added (15 → 20)

| Tool | Domain | Use case |
|---|---|---|
| lookup_citation | case-law | Cite-checking (present in nearly every memo) |
| search_judges | judges (new axis menu) | Judicial conflict-of-interest analysis |
| search_sec_filings_fulltext | securities | EDGAR EFTS Boolean search |
| search_federal_register_notices | federal-register | Sunshine Act / agency announcement tracking |
| search_fda_warning_letters | fda-warning-letter (new axis menu) | Pharma diligence regulatory compliance |

Augmentor pattern in action

Per-tool changes are uniform across all 5 tools (a pattern enabled by PR #108):

  1. traits: ["exa-routable", "domain:X"] declaration in toolDefinitions.js (1 line per tool)
  2. WebSearchClient method update: destructure + spread to executeExaSearch options (~3 lines per method)
  3. Existing domains (case-law, securities, federal-register) reuse augmentor's DOMAIN_DESCRIPTIONS from PR #108 (0 new axis menus)
  4. New domains (judges, fda-warning-letter) add fresh axis-menu entries (~10 lines each)

Total: ~97 LoC for 5 tools = ~19 LoC per tool (vs ~80 LoC pre-refactor — 4× reduction).
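
The per-tool pattern listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the tool schema, the `lookupCitation` method body, the `domain` option, and the stubbed `executeExaSearch` are all assumptions made for the example. Only the `traits` values and the destructure + spread idea come from the description.

```javascript
// Hypothetical tool definition; the `traits` line is the only per-tool addition.
const lookupCitationTool = {
  name: "lookup_citation",
  traits: ["exa-routable", "domain:case-law"],
  inputSchema: {
    type: "object",
    properties: { citation: { type: "string" } },
    required: ["citation"],
  },
};

// Stand-in for the real executeExaSearch (assumption): echoes what it received.
async function executeExaSearch(query, options = {}) {
  return { query, options };
}

// Hypothetical WebSearchClient method: destructure additionalQueries, then
// spread it into the search options only when the caller supplied it.
async function lookupCitation({ citation, additionalQueries }) {
  return executeExaSearch(citation, {
    domain: "case-law", // assumed option name, for illustration only
    ...(additionalQueries ? { additionalQueries } : {}),
  });
}
```

The conditional spread keeps the options object unchanged for callers that never pass `additionalQueries`, which is one way to keep the feature opt-in.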

A3 coverage impact

  • Before this PR: 15 tools (~65–70% of memo tool calls)
  • After this PR: 20 tools (~75–80% of memo tool calls)
  • Increases A/B test population by ~30% for upcoming staging memo runs

Test results

  • 64/64 augmentor snapshot tests (was 49, +15 new — byte-equivalence + ordering for new tools)
  • 214/214 cumulative Exa-suite tests (was 199, +15)
  • 20/20 live API verification shapes accepted by Exa (was 15)
  • ✅ Property ordering invariant preserved (additionalQueries last)
  • ✅ required array order preserved
  • ✅ Augmentor strips traits from output (no MCP wire-format leak)
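
The three invariants above (additionalQueries last, required order preserved, traits stripped) could be enforced by an augmentor shaped roughly like this. It is a sketch under stated assumptions: `augmentToolSketch`, the schema shape, and the description text are hypothetical, and the real augmentor from PR #108 may work differently.

```javascript
// Hypothetical augmentor: returns the wire-format tool definition.
function augmentToolSketch(tool) {
  // Pull traits out so they never reach the MCP wire format.
  const { traits = [], ...wire } = tool;
  if (!traits.includes("exa-routable")) return wire;

  const domainTrait = traits.find((t) => t.startsWith("domain:"));
  const domain = domainTrait ? domainTrait.slice("domain:".length) : "general";

  return {
    ...wire,
    inputSchema: {
      ...wire.inputSchema, // copies `required` untouched, so its order is preserved
      properties: {
        ...wire.inputSchema.properties,
        // Added after the spread so it is the last property key
        // (assuming the tool did not already declare it).
        additionalQueries: {
          type: "array",
          items: { type: "string" },
          description: `Optional query variations for the ${domain} domain`,
        },
      },
    },
  };
}
```

JavaScript preserves insertion order for string keys, so placing `additionalQueries` after the spread is enough to keep it last in a serialized schema.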

Test plan

Out of scope

Predecessors

🤖 Generated with Claude Code

First-use of the augmentor pipeline (PR #108) for coverage extension.
Adds A3 additionalQueries plumbing to 5 high-traffic legal-research
tools.

Tools added:
- lookup_citation (domain:case-law)
- search_judges (domain:judges — NEW axis menu)
- search_sec_filings_fulltext (domain:securities)
- search_federal_register_notices (domain:federal-register)
- search_fda_warning_letters (domain:fda-warning-letter — NEW axis menu)

Per-tool effort dropped from ~80 LoC pre-refactor to ~19 LoC per tool
(trait declaration + WebSearchClient destructure + spread). Existing
domains reused from augmentor's DOMAIN_DESCRIPTIONS; only 2 new axis
menus authored (judges, fda-warning-letter).

A3 coverage: 15 tools → 20 tools (~30% population increase).
Per-memo coverage estimated ~75-80% (was ~65-70%).

Tests:
- 64/64 augmentor snapshot tests (was 49, +15 new)
- 214/214 cumulative Exa-suite tests
- 20/20 live API verification shapes accepted (was 15)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Number531 Number531 merged commit 8b0debc into main May 9, 2026
Number531 added a commit that referenced this pull request May 10, 2026
…112)

Top-level CHANGELOG was missing the Exa A3 follow-up wave shipped between
2026-05-08 and 2026-05-09. Adds a comprehensive [Unreleased] section above
the v7.1.0 entry covering:

- v7.2.0 (PR #107) — orchestrator-authored variations through exa_web_search;
  shared validator extracted; Track A audit reversal documented
- v7.3.0 → v7.5.0 (PR #108) — per-domain plumbing for 4 tools, schema rewrite
  with Jaccard distinctness telemetry, LLM adoption test harness (44% real
  vs. 100% isolated finding), augmentor refactor (~80 LoC → ~19 LoC per tool)
- v7.5.1 (PR #109) — coverage expansion 15 → 20 tools using the augmentor
- v7.6.0 (PR #110) — EXA_ADDITIONAL_QUERIES_AB_SAMPLE flag for quality-lift
  measurement, 4 outcome metrics per arm, 7 unit tests
- PR #111 — api-integration + subagent-scaffold templates auto-inherit A3
- PR #112 — exa-a3-ab-staging.md runbook (440 lines, decision tree,
  failure modes, metrics reference)
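
For readers unfamiliar with the Jaccard distinctness telemetry mentioned under PR #108, a minimal sketch follows. It assumes distinctness is computed as one minus the Jaccard similarity of lowercase word sets; the tokenization and the function names are assumptions, not taken from the repository.

```javascript
// Split a query into a set of lowercase word tokens (assumed tokenization).
function tokenSet(text) {
  return new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|. Two empty sets count as identical.
function jaccardSimilarity(a, b) {
  const setA = tokenSet(a);
  const setB = tokenSet(b);
  if (setA.size === 0 && setB.size === 0) return 1;
  let intersection = 0;
  for (const token of setA) if (setB.has(token)) intersection++;
  return intersection / (setA.size + setB.size - intersection);
}

// Distinctness of a variation relative to the base query: 0 = same words, 1 = no overlap.
function distinctness(baseQuery, variation) {
  return 1 - jaccardSimilarity(baseQuery, variation);
}
```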

Cumulative: A3-enabled tools 0 → 20, Exa-suite tests 130 → 221, live API
shapes 5 → 20, +2 flags, +6 metrics, +3 shared modules. All flag-gated;
zero production behavior change until an operator opts in via staging A/B.

Per-project CHANGELOG already documents each version individually.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Number531 added a commit that referenced this pull request May 25, 2026
…xBareChartRefs + promptEnhancer defensive (Task #107)

Three coupled source fixes that complete the chart-path rendering pipeline
introduced by Phase 4.13 v1.6-polish Task #99 (chart path guidance in
_promptConstants.js). Without these, the canonical ../charts/<name>.png
reference convention would break in PDF/DOCX generation.

EMPIRICALLY VALIDATED by 2026-05-25 PLTR canary (session
2026-05-25-1779733982) which generated a 77,596-byte research-plan.pdf with
8/8 chart references rendered correctly.

FIX 1 — src/utils/documentConverter.js (+19 LOC)
Add cwd: options.resourcePath to pandoc execFileAsync calls in convertToDocx
+ convertToPdf. Pandoc's --resource-path flag IS honored by the native
pandoc writers (incl. DOCX) but NOT by the typst PDF backend — typst
resolves image paths relative to its own working directory, not pandoc's.
Without cwd override, every chart-bearing PDF fails with
"file not found (searched at /app/charts/chart_xxx.png)".

The cwd override is conditional (only when options.resourcePath is provided),
preserving backward compatibility with callers that don't pass it. The
DOCX path mirrors the PDF fix defensively (DOCX writer honors --resource-path
natively but cwd override keeps both paths consistent and survives writer
changes).
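
A rough sketch of the conditional cwd override described in FIX 1. The helper names, the `maxBuffer` value, and the exact pandoc argument list are assumptions for illustration; only the "set cwd when options.resourcePath is provided" behavior comes from the commit message.

```javascript
const { execFile } = require("node:child_process");
const { promisify } = require("node:util");
const execFileAsync = promisify(execFile);

// Build execFile options; cwd is added only when a resourcePath was passed,
// so callers that omit it keep the previous behavior.
function buildPandocExecOptions(options = {}) {
  return {
    maxBuffer: 10 * 1024 * 1024, // assumed value, not from the source
    ...(options.resourcePath ? { cwd: options.resourcePath } : {}),
  };
}

// Hypothetical converter: typst resolves image paths from its working
// directory, so the cwd override is what makes chart references resolve.
async function convertToPdfSketch(inputPath, outputPath, options = {}) {
  const args = [inputPath, "-o", outputPath, "--pdf-engine=typst"];
  if (options.resourcePath) args.push(`--resource-path=${options.resourcePath}`);
  return execFileAsync("pandoc", args, buildPandocExecOptions(options));
}
```

One consequence of changing cwd is that relative input and output paths would then resolve against the resource path, so a caller of a function like this would likely want to pass absolute paths.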

FIX 2 — src/utils/markdownNormalizer.js (+48 LOC)
Add fixBareChartRefs self-healing transform that detects bare references
like ![chart](pltr.png) and rewrites to ![chart](charts/pltr.png) when the
file exists in a sibling charts/ directory. Defensive against subagent
prompt-adherence drift — Task #99 prompt guidance instructs canonical
../charts/<name>.png but if a subagent drifts to bare filename, this
catches and fixes the reference before downstream conversion fails.

Implementation walks up the directory tree 4 levels max looking for a
sibling charts/ directory. Only rewrites when existsSync() confirms the
file is actually present in charts/. Multiple image formats supported
(png, jpg, jpeg, gif, webp, svg).

Runs FIRST in the normalizeForPandoc transformation pipeline (before
stripVerificationTags, footnote conversion, etc.) so downstream transforms
see the corrected paths.
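
The self-healing transform in FIX 2 might look something like the sketch below. The regex, the function names, the return shape, and the reading of "4 levels max" (the starting directory plus up to four parents) are assumptions; the actual fixBareChartRefs may differ.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Matches ![alt](file.ext) where the target has no slash, which excludes
// URLs, absolute paths, and references already prefixed with charts/.
const BARE_IMAGE_REF = /!\[([^\]]*)\]\(([^\/\s)]+\.(?:png|jpe?g|gif|webp|svg))\)/gi;

// Look for a charts/ directory in startDir, then in up to maxLevels parents.
function findChartsDir(startDir, maxLevels = 4) {
  let dir = startDir;
  for (let level = 0; level <= maxLevels; level++) {
    const candidate = path.join(dir, "charts");
    if (fs.existsSync(candidate) && fs.statSync(candidate).isDirectory()) return candidate;
    const parent = path.dirname(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }
  return null;
}

// Rewrite bare references only when the file really exists in charts/.
function fixBareChartRefsSketch(markdown, markdownDir) {
  const chartsDir = findChartsDir(markdownDir);
  if (!chartsDir) return { markdown, fixed: 0 };
  let fixed = 0;
  const rewritten = markdown.replace(BARE_IMAGE_REF, (match, alt, file) => {
    if (!fs.existsSync(path.join(chartsDir, file))) return match;
    fixed++;
    return `![${alt}](charts/${file})`;
  });
  return { markdown: rewritten, fixed };
}
```

Returning the count alongside the text makes it easy for a caller to log or meter how often the rewrite fires.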

FIX 3 — src/server/promptEnhancer.js (+2 LOC defensive)
The promptEnhancer.js calls at L360, L364 invoke convertToPdf/convertToDocx
WITHOUT resourcePath. All other production callers
(convertSession, convertSessionToDocuments, /api/convert/* routes) pass it
correctly. This caller is the ONE unsafe site — enhancement-generated
markdown typically doesn't contain chart references but defensive fix
ensures we never silently break chart rendering in any conversion path.

Both Promise.all calls now pass { resourcePath: fullSessionPath } — matches
the documentConverter.js:751-752 batch flow pattern.

ARCHITECTURAL COMPLETENESS QUARTET
With these three fixes shipped, the chart-path convention is end-to-end:
1. Bridge writes to canonical <session>/charts/ (codeExecutionBridge.js:342,
   pre-existing)
2. Prompt tells subagent to reference as ../charts/<name>.png
   (_promptConstants.js Step 2.1, shipped in Task #99)
3. PDF/DOCX rendering honors the reference (documentConverter.js cwd fix,
   THIS COMMIT)
4. Self-healing rewrite if subagent drifts to bare filename
   (markdownNormalizer.js fixBareChartRefs, THIS COMMIT)

Layers 1+2 without 3+4 would mean reports render correctly in raw markdown
viewers but break in PDF/DOCX generation. Layers 3+4 close the rendering
loop.

VERIFICATION
- node --check clean on all 3 modified files
- Empirical baseline: 2026-05-25 canary PDF (77,596 bytes) with 8 chart refs
- Layer 1 unit tests + Layer 2 integration tests + Layer 3 smoke shipping
  in companion commits (B + C)

OUT OF SCOPE (deferred to companion commits)
- Observability metrics (Prometheus counters) — Commit B (Task #108)
- Test coverage (Layer 1/2/3 pyramid) — Commit C (Task #109)
- CI pandoc/typst install — separate infrastructure PR

Plan: docs/pending-updates/Chart-Path-Rendering-Completeness-Plan.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Number531 added a commit that referenced this pull request May 25, 2026
…malization + chart_conversion_duration metrics (Task #108)

Adds two Prometheus metrics to monitor the chart-rendering pipeline
post-merge, closing the observability gap identified by the chart-path-
rendering-completeness plan.

METRIC 1 — chart_path_normalization_total (Counter)
Labels: { status: 'rewritten' | 'no_op', reason: 'bare_ref_self_healed' |
'no_bare_refs' }
Fires from: markdownNormalizer.js normalizeForPandoc after fixBareChartRefs
Bounded cardinality: ~5 series total

Purpose: detect subagent drift from canonical ../charts/<name>.png pattern
despite Task #99 prompt guidance. Non-zero `rewritten` count indicates
subagents are writing bare filenames and the self-healing transform is
catching/fixing them. Production canary baseline (2026-05-25-1779733982)
showed count: 0 — subagents follow the canonical convention correctly.
Sustained non-zero in production = signal to investigate prompt drift.

METRIC 2 — chart_conversion_duration_ms (Histogram)
Labels: { format: 'pdf' | 'docx', status: 'ok' | 'error' }
Buckets: [100, 500, 1000, 2000, 5000, 10000, 30000] ms
Fires from: documentConverter.js convertToDocx + convertToPdf finish path
Bounded cardinality: 2 formats × ~3 statuses = 6 series

Purpose: track pandoc PDF/DOCX latency. Baseline expectation: 1-5s for
small docs, 5-30s for chart-heavy reports. Tail-latency alerts surface
infrastructure issues (typst container slowdown, large fixture growth,
etc.).
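
To make the bucket list above concrete, here is a dependency-free sketch of how a Prometheus-style histogram counts one observation. It deliberately does not use the project's metrics module or any client library; `observeDuration` and the plain-object counter are hypothetical.

```javascript
// Bucket upper bounds from the commit message, in milliseconds.
const CHART_CONVERSION_BUCKETS_MS = [100, 500, 1000, 2000, 5000, 10000, 30000];

// Prometheus histogram buckets are cumulative: an observation increments
// every bucket whose upper bound is >= the value, plus the +Inf bucket.
function observeDuration(counts, durationMs) {
  for (const upperBound of CHART_CONVERSION_BUCKETS_MS) {
    if (durationMs <= upperBound) counts[upperBound] = (counts[upperBound] || 0) + 1;
  }
  counts["+Inf"] = (counts["+Inf"] || 0) + 1;
  return counts;
}
```

With these bounds, a conversion slower than 30 s is only visible in the +Inf bucket, which is worth keeping in mind when writing tail-latency alerts.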

EMIT SITES
- markdownNormalizer.js:619-629 — after fixBareChartRefs; also logs
  `[normalizer] fixBareChartRefs rewrote N bare chart ref(s)` to console
  when count > 0 (operator visibility for drift)
- documentConverter.js:522 (convertToDocx finish) — emits both
  recordDocumentConversion (pre-existing) and recordChartConversion (new)
- documentConverter.js:593 (convertToPdf finish) — same dual emit pattern

DEFENSIVE
All emit sites wrapped in try/catch so metrics failures never break
conversion. Module-load smoke verified all 4 new exports resolve correctly.
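
The defensive wrapping could be as small as the helper below. `safeEmit` is a hypothetical name and the warning format is an assumption; the commit only states that emit sites are wrapped in try/catch.

```javascript
// Call a metrics emitter without letting its failure propagate to the caller.
// Returns true when the emit succeeded, false when it threw.
function safeEmit(emitFn, ...args) {
  try {
    emitFn(...args);
    return true;
  } catch (err) {
    console.warn(`[metrics] emit failed: ${err.message}`);
    return false;
  }
}
```

A conversion path would then call something like `safeEmit(recordChartConversion, "pdf", "ok", elapsedMs)`, where the argument order is assumed here, so a broken metrics module degrades to a warning instead of a failed conversion.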

OUT OF SCOPE
- Tests for the metrics themselves — Commit C (Task #109) tests the
  underlying functions; metric emission is fire-and-forget side effect
- Prometheus dashboard panel — separate infrastructure task

Plan: docs/pending-updates/Chart-Path-Rendering-Completeness-Plan.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Number531 added a commit that referenced this pull request May 25, 2026
…G (Task #109)

Completes the chart-path-rendering completeness bundle with comprehensive
test coverage at three layers (Layer 4 canary already empirically validated).

EXPORT fixBareChartRefs from src/utils/markdownNormalizer.js
Function was previously module-internal. Now exported for direct Layer 1
unit testing. Consumer (normalizeForPandoc at L616) unchanged.

LAYER 1 — Unit tests (test/sdk/utils/markdownNormalizer-fixBareChartRefs.test.js)
12 tests covering:
- No-op (zero bare refs)
- Preserve (bare ref → non-existent file)
- Rewrite (bare ref → existing file)
- Walk-up (deeply nested markdown finds sibling charts/)
- Mixed (canonical + bare in same doc, only bare rewritten)
- Already-prefixed (charts/Z.png → no double-prefix)
- Absolute path no-op (/tmp/chart.png)
- HTTPS URL no-op
- Special chars in alt text [Title with ! and ()]
- Multi-format (png/jpg/jpeg/gif/webp/svg)
- Empty markdown sanity
- No charts/ directory anywhere → no-op
Runtime: <130ms total. Isolated tmpdir per test; full cleanup in afterEach.

LAYER 2 — Integration tests (test/sdk/utils/documentConverter-resourcePath.test.js)
4 tests covering:
- PDF with canonical ../charts/<name>.png + resourcePath → PDF embeds charts
- DOCX with canonical refs + resourcePath → DOCX embeds charts
- PDF with bare ref through normalize+resourcePath → fixBareChartRefs rewrites,
  then resourcePath cwd resolves; PDF embeds chart
- PDF without resourcePath option → cwd override skipped, baseline preserved
Skip-gated on pandoc+typst availability (matches existing
document-conversion.test.js skip pattern). Runtime: ~1s when deps present.

LAYER 3 — Smoke test (test/sdk/wrappedSubagents/_smoke-chart-conversion-fixture.mjs)
End-to-end fixture conversion validates the full chart-rendering pipeline.
10 assertions covering: fixture setup, PDF generation, PDF size > 5KB
(charts embedded), DOCX generation, DOCX size > 3KB, bareChartRefsFixed
count = 1, normalized .pandoc.md has correct rewrites + preserves canonical
refs.
$0 cost (no API calls). Runtime: ~320ms. Pass 10/10 with valid PNG fixtures.

FIXTURES
- test/sdk/fixtures/chart-bearing-sample.md (smoke fixture: 2 canonical +
  1 bare chart ref + metadata + section structure)
- test/sdk/fixtures/charts/chart_0[1-3].png (3 minimal valid 16×16 PNGs,
  82 bytes each, generated via Node + zlib with proper CRC32)
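
One way to generate a minimal valid PNG with Node + zlib and a hand-rolled CRC32 is sketched below. This is not the script used for the fixtures: the 8-bit grayscale format, the fill shade, and the function names are assumptions, and the resulting byte count is not claimed to match the 82-byte fixtures.

```javascript
const zlib = require("node:zlib");

// Standard CRC-32 lookup table (reflected polynomial 0xEDB88320), as used by PNG.
const CRC_TABLE = Array.from({ length: 256 }, (_, n) => {
  let c = n;
  for (let k = 0; k < 8; k++) c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  return c >>> 0;
});

function crc32(buf) {
  let c = 0xffffffff;
  for (const byte of buf) c = CRC_TABLE[(c ^ byte) & 0xff] ^ (c >>> 8);
  return (c ^ 0xffffffff) >>> 0;
}

// A PNG chunk is: 4-byte length, 4-byte type, data, CRC32 over type + data.
function pngChunk(type, data) {
  const typeAndData = Buffer.concat([Buffer.from(type, "ascii"), data]);
  const length = Buffer.alloc(4);
  length.writeUInt32BE(data.length);
  const crc = Buffer.alloc(4);
  crc.writeUInt32BE(crc32(typeAndData));
  return Buffer.concat([length, typeAndData, crc]);
}

// Build a size x size, 8-bit grayscale PNG filled with a single shade.
function makeGrayPng(size = 16, shade = 0x80) {
  const ihdr = Buffer.alloc(13); // compression, filter, interlace stay 0
  ihdr.writeUInt32BE(size, 0); // width
  ihdr.writeUInt32BE(size, 4); // height
  ihdr[8] = 8; // bit depth
  ihdr[9] = 0; // color type 0 = grayscale
  // Each scanline starts with filter byte 0 (no filtering).
  const row = Buffer.concat([Buffer.from([0]), Buffer.alloc(size, shade)]);
  const raw = Buffer.concat(Array.from({ length: size }, () => row));
  return Buffer.concat([
    Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]), // PNG signature
    pngChunk("IHDR", ihdr),
    pngChunk("IDAT", zlib.deflateSync(raw)),
    pngChunk("IEND", Buffer.alloc(0)),
  ]);
}
```

Writing the result with `fs.writeFileSync("chart_01.png", makeGrayPng())` would produce a file that image decoders accept, which matters because the smoke test's size thresholds assume the charts are actually embedded.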

LAYER 4 — Canary (operator-driven)
Empirical baseline already validates: reports/2026-05-25-1779733982/
generated 77,596-byte PDF + 16,497-byte DOCX with 8/8 chart references
rendered correctly. Post-merge canary re-runs canonical PLTR + verifies
chart_path_normalization + chart_conversion_duration metrics populate.

CHANGELOG ENTRY
Comprehensive [Unreleased] entry under
"Phase 4.13 chart-path-rendering-completeness (Tasks #107-#109)" documents
the 3-commit bundle, architectural completeness quartet, verification
metrics (972 tests pass), and risk register.

FULL REGRESSION
- 972 wrapped-subagents + hooks + config + server + utils tests pass
  (956 baseline + 12 Layer 1 + 4 Layer 2 = +16 new)
- 2 pre-existing Task #67 failures unchanged
- Smoke test passes 10/10 in <500ms

Plan: docs/pending-updates/Chart-Path-Rendering-Completeness-Plan.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>