Skip to content

Compile pipeline + phases/gates prompts + full eval framework (mcp 0.7.0)#34

Merged
milstan merged 1 commit into
mainfrom
milstan/compile-mcp-prompts-skills
May 15, 2026
Merged

Compile pipeline + phases/gates prompts + full eval framework (mcp 0.7.0)#34
milstan merged 1 commit into
mainfrom
milstan/compile-mcp-prompts-skills

Conversation

@milstan

@milstan milstan commented May 15, 2026

Copy link
Copy Markdown
Contributor

Summary

Introduces @leadbay/promptforge — a build-time workspace package that compiles .md.tmpl source files into prompts.generated.ts / tool-descriptions.generated.ts. Authors edit prose in one place; bundles inline the generated string constants. The MCP wire format is unchanged: every consumer (Claude Desktop, Cursor, Claude Code, OpenClaw) sees the same tools/list / prompts/list / prompts/get shapes.

Backwards-compat preserved: every tool name, every inputSchema, every annotation set, every outputSchema is byte-for-byte identical. Only the description prose changes — and it's tighter, more structured, and easier to evolve.

Highlights

  • All 6 MCP prompts restructured into PHASES + IRON LAWS + GATES with byproduct outputs (COLUMN PRESERVATION PLAN, DECISION LOG, FINAL REPORT, STOP — awaiting user decision).
  • All 60 tool descriptions migrated to .md.tmpl (21 composite + 37 granular + 2 file-import companions) using a shared snippet library of 11 partials.
  • Daily check-in overhaul: top-10 motivational triage; auto-top-up via leadbay_bulk_qualify_leads; research every promising lead; ask before consuming contact-enrichment quota.
  • Import-file overhaul: explicit GOAL ("maximize how many rows the Leadbay system can ingest and match"); two deliverables named (max-coverage column mapping + augmented file with LEADBAY_ID column); WHAT GOOD LOOKS LIKE section; per-CRM EXTERNAL_ID guidance extended from HubSpot to Salesforce / Pipedrive / Close / Attio.
  • Full eval framework (packages/mcp/test/eval/): in-process MCP server driver, Evidence-pyramid pass rule, mission-match judge with rubric-from-frontmatter, drift meta-judge, replay-by-default backend HTTP, worktree-based drift detector, 70-fixture tool-routing classifier. Gated by EVAL=1; default pnpm test ignores it.
  • Audit tests (packages/mcp/test/audit/): eval-coverage stubs, no-inline-descriptions-in-server.ts, per-tool ≤3500-char budget, naming convention, snippet orphans, freshness gate.
  • CI gate: root pnpm test runs pnpm prompts:check first; PR fails if .md.tmpl edits don't have matching regenerated .generated.ts updates.

Version bumps

  • @leadbay/mcp: 0.6.3 → 0.7.0 (publishable)
  • @leadbay/core: 0.4.0 → 0.5.0 (private)
  • @leadbay/promptforge: 0.1.0 (new, private)
  • dxt/manifest.template.json login-command string: @leadbay/mcp@0.6@leadbay/mcp@0.7

CHANGELOG: rolled the previous 0.6.4 UNRELEASED import-flow entry forward into 0.7.0 (the 0.6.4 work landed via PR #33 but never published).

Test plan

  • pnpm prompts:check — generated files in sync with .md.tmpl sources
  • pnpm test430/430 pass (240 core + 12 leadclaw + 169 mcp + 9 promptforge)
  • pnpm build — all packages build cleanly; published artifact is leadbay-0.7.0.mcpb (275.7 KB)
  • Live MCP server smoke: tools/list returns 58 tools at full exposure with migrated descriptions; prompts/get(leadbay_import_file) returns a 14k-char body with 5 PHASE blocks, 3 GATE blocks, the GOAL section, the WHAT GOOD LOOKS LIKE section, and the multi-CRM record-link guidance
  • Snapshot regression test: pre-migration body of leadbay_import_file (PR Add agentic Leadbay file import flow #33 prose) captured under packages/promptforge/test/snapshots/ so accidental drift surfaces as a test failure
  • Audit-compliance test feat(mcp): install/login subcommands + npm publish prep + LLM-hallucination-proof docs #6 confirms each tool name resolves to exactly one Tool object (cross-tier sharing for leadbay_list_mappable_fields / leadbay_add_note / leadbay_create_custom_field preserved as intended)

Post-merge release steps

Per RELEASE.md:

  1. git checkout main && git pull
  2. git tag mcp-v0.7.0
  3. git push origin mcp-v0.7.0
  4. Watch the release workflow in GitHub Actions. It runs preflight-npmpublish-mcp (build + test + tag/version check + npm publish --access public --provenance).

🤖 Generated with Claude Code

…7.0)

Introduces @leadbay/promptforge — a build-time workspace package that
compiles .md.tmpl source files into prompts.generated.ts and
tool-descriptions.generated.ts, consumed by @leadbay/mcp and @leadbay/core.
Authors edit prose in one place; the bundle inlines the generated string
constants. The MCP wire format is unchanged — tools/list, prompts/list,
and prompts/get serve the same shapes; only the description prose itself
is rewritten.

Backwards-compat: every tool name, every inputSchema, every annotation
set, and every outputSchema is preserved byte-for-byte. Live MCP clients
(Claude Desktop, Cursor, Claude Code, OpenClaw) see no protocol break.

What landed:

* New package @leadbay/promptforge: cli (build|check|approve-drift),
  assembler with arg + expected_calls + failure_modes validation,
  snippet resolver with cycle detection, gray-matter + zod frontmatter
  parser, diff-friendly emit (one alphabetized const per artifact in
  named regions). Exposes a library entry for the eval framework to
  read rubrics at runtime.

* All 6 MCP prompts restructured into PHASES + IRON LAWS + GATES with
  byproduct outputs the agent must emit before progressing (COLUMN
  PRESERVATION PLAN, DECISION LOG, FINAL REPORT for import_file;
  "STOP — awaiting user decision" for daily_check_in).

* All 60 tool descriptions migrated to .md.tmpl source files (21
  composite + 37 granular + 2 file-import companions). Shared snippet
  library (iron-laws, gates, heuristics, headers — 11 partials);
  per-CRM EXTERNAL_ID guidance extended from HubSpot to Salesforce,
  Pipedrive, Close, Attio.

* Daily-check-in overhaul: top-10 triage with motivational framing;
  auto-top-up via leadbay_bulk_qualify_leads when batch is short;
  research every promising lead; ask before consuming contact-enrichment
  quota.

* Import-file overhaul: explicit GOAL section; two deliverables named
  (max-coverage column mapping + augmented file with LEADBAY_ID column);
  WHAT GOOD LOOKS LIKE section teaches the disambiguation bar
  positively; domain-extraction emphasized as key match factor.

* Test framework spine (packages/mcp/test/eval/): in-process MCP server
  driver, Anthropic SDK tool-use bridge, L1+L2+L3 Evidence shape with
  pyramid-completeness rule, atomic partial saves to .context/evals,
  mission-match judge with rubric-from-frontmatter, drift meta-judge
  (3-label verdict), backend HTTP record/replay (replay-by-default,
  EVAL_RECORD=1 opt-in), worktree-based drift detector, tool-routing
  classifier eval with 70+ fixtures. Gated by EVAL=1; default pnpm test
  ignores it.

* Audit tests (packages/mcp/test/audit/): every prompt has eval
  coverage stub, no inline tool descriptions in server.ts, per-tool
  ≤3500-char budget, leadbay_<verb>_<noun> naming convention. Plus
  in-promptforge audits: freshness, snippet orphans + dead refs,
  assembler smoke + arg-validation.

* Root scripts: prompts:build, prompts:check (CI gate), test:eval,
  test:gate, test:periodic, eval:drift. Prebuild hooks in @leadbay/core
  and @leadbay/mcp ensure generated files are fresh before any build.

Verification (default pnpm test): 430 tests pass across the monorepo —
240 core / 12 leadclaw / 169 mcp / 9 promptforge. pnpm prompts:check
confirms generated files are in sync. pnpm build produces the same
published MCPB shape (now leadbay-0.7.0.mcpb). End-to-end runtime
sanity: a live tools/list returns 58 tools with their migrated
descriptions; a live prompts/get(leadbay_import_file) returns the
14k-char body with 5 PHASE blocks, 3 GATE blocks, GOAL section,
WHAT GOOD LOOKS LIKE section, and the CRM record-link guidance for
HubSpot/Salesforce/Pipedrive/etc.

Version bumps:
* @leadbay/mcp: 0.6.3 → 0.7.0
* @leadbay/core: 0.4.0 → 0.5.0 (private)
* @leadbay/promptforge: 0.1.0 (new, private)
* dxt manifest.template install-instruction string: 0.6 → 0.7

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@milstan milstan merged commit 5d0a13a into main May 15, 2026
1 check passed
milstan added a commit that referenced this pull request Jun 22, 2026
Resolved WORKFLOWS.md: kept main's workflows 30-33 (account-status hygiene,
artifact_kit, team_activity) and renumbered the campaign builder to #34.
Regenerated all promptforge generated files from the merged templates.
Bumped @leadbay/mcp 0.22.0 -> 0.23.0 (main already shipped 0.22.0) across
package.json, server.json, and both changelogs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant