Compile pipeline + phases/gates prompts + full eval framework (mcp 0.7.0) by milstan · Pull Request #34 · leadbay/leadclaw

milstan · 2026-05-15T23:58:13Z

Summary

Introduces @leadbay/promptforge — a build-time workspace package that compiles .md.tmpl source files into prompts.generated.ts / tool-descriptions.generated.ts. Authors edit prose in one place; bundles inline the generated string constants. The MCP wire format is unchanged: every consumer (Claude Desktop, Cursor, Claude Code, OpenClaw) sees the same tools/list / prompts/list / prompts/get shapes.

Backwards-compat preserved: every tool name, every inputSchema, every annotation set, every outputSchema is byte-for-byte identical. Only the description prose changes — and it's tighter, more structured, and easier to evolve.

Highlights

All 6 MCP prompts restructured into PHASES + IRON LAWS + GATES with byproduct outputs (COLUMN PRESERVATION PLAN, DECISION LOG, FINAL REPORT, STOP — awaiting user decision).
All 60 tool descriptions migrated to .md.tmpl (21 composite + 37 granular + 2 file-import companions) using a shared snippet library of 11 partials.
Daily check-in overhaul: top-10 motivational triage; auto-top-up via leadbay_bulk_qualify_leads; research every promising lead; ask before consuming contact-enrichment quota.
Import-file overhaul: explicit GOAL ("maximize how many rows the Leadbay system can ingest and match"); two deliverables named (max-coverage column mapping + augmented file with LEADBAY_ID column); WHAT GOOD LOOKS LIKE section; per-CRM EXTERNAL_ID guidance extended from HubSpot to Salesforce / Pipedrive / Close / Attio.
Full eval framework (packages/mcp/test/eval/): in-process MCP server driver, Evidence-pyramid pass rule, mission-match judge with rubric-from-frontmatter, drift meta-judge, replay-by-default backend HTTP, worktree-based drift detector, 70-fixture tool-routing classifier. Gated by EVAL=1; default pnpm test ignores it.
Audit tests (packages/mcp/test/audit/): eval-coverage stubs, no-inline-descriptions-in-server.ts, per-tool ≤3500-char budget, naming convention, snippet orphans, freshness gate.
CI gate: root pnpm test runs pnpm prompts:check first; PR fails if .md.tmpl edits don't have matching regenerated .generated.ts updates.

Version bumps

@leadbay/mcp: 0.6.3 → 0.7.0 (publishable)
@leadbay/core: 0.4.0 → 0.5.0 (private)
@leadbay/promptforge: 0.1.0 (new, private)
dxt/manifest.template.json login-command string: @leadbay/mcp@0.6 → @leadbay/mcp@0.7

CHANGELOG: rolled the previous 0.6.4 UNRELEASED import-flow entry forward into 0.7.0 (the 0.6.4 work landed via PR #33 but never published).

Test plan

pnpm prompts:check — generated files in sync with .md.tmpl sources
pnpm test — 430/430 pass (240 core + 12 leadclaw + 169 mcp + 9 promptforge)
pnpm build — all packages build cleanly; published artifact is leadbay-0.7.0.mcpb (275.7 KB)
Live MCP server smoke: tools/list returns 58 tools at full exposure with migrated descriptions; prompts/get(leadbay_import_file) returns a 14k-char body with 5 PHASE blocks, 3 GATE blocks, the GOAL section, the WHAT GOOD LOOKS LIKE section, and the multi-CRM record-link guidance
Snapshot regression test: pre-migration body of leadbay_import_file (PR Add agentic Leadbay file import flow #33 prose) captured under packages/promptforge/test/snapshots/ so accidental drift surfaces as a test failure
Audit-compliance test feat(mcp): install/login subcommands + npm publish prep + LLM-hallucination-proof docs #6 confirms each tool name resolves to exactly one Tool object (cross-tier sharing for leadbay_list_mappable_fields / leadbay_add_note / leadbay_create_custom_field preserved as intended)

Post-merge release steps

Per RELEASE.md:

git checkout main && git pull
git tag mcp-v0.7.0
git push origin mcp-v0.7.0
Watch the release workflow in GitHub Actions. It runs preflight-npm → publish-mcp (build + test + tag/version check + npm publish --access public --provenance).

🤖 Generated with Claude Code

…7.0) Introduces @leadbay/promptforge — a build-time workspace package that compiles .md.tmpl source files into prompts.generated.ts and tool-descriptions.generated.ts, consumed by @leadbay/mcp and @leadbay/core. Authors edit prose in one place; the bundle inlines the generated string constants. The MCP wire format is unchanged — tools/list, prompts/list, and prompts/get serve the same shapes; only the description prose itself is rewritten. Backwards-compat: every tool name, every inputSchema, every annotation set, and every outputSchema is preserved byte-for-byte. Live MCP clients (Claude Desktop, Cursor, Claude Code, OpenClaw) see no protocol break. What landed: * New package @leadbay/promptforge: cli (build|check|approve-drift), assembler with arg + expected_calls + failure_modes validation, snippet resolver with cycle detection, gray-matter + zod frontmatter parser, diff-friendly emit (one alphabetized const per artifact in named regions). Exposes a library entry for the eval framework to read rubrics at runtime. * All 6 MCP prompts restructured into PHASES + IRON LAWS + GATES with byproduct outputs the agent must emit before progressing (COLUMN PRESERVATION PLAN, DECISION LOG, FINAL REPORT for import_file; "STOP — awaiting user decision" for daily_check_in). * All 60 tool descriptions migrated to .md.tmpl source files (21 composite + 37 granular + 2 file-import companions). Shared snippet library (iron-laws, gates, heuristics, headers — 11 partials); per-CRM EXTERNAL_ID guidance extended from HubSpot to Salesforce, Pipedrive, Close, Attio. * Daily-check-in overhaul: top-10 triage with motivational framing; auto-top-up via leadbay_bulk_qualify_leads when batch is short; research every promising lead; ask before consuming contact-enrichment quota. * Import-file overhaul: explicit GOAL section; two deliverables named (max-coverage column mapping + augmented file with LEADBAY_ID column); WHAT GOOD LOOKS LIKE section teaches the disambiguation bar positively; domain-extraction emphasized as key match factor. * Test framework spine (packages/mcp/test/eval/): in-process MCP server driver, Anthropic SDK tool-use bridge, L1+L2+L3 Evidence shape with pyramid-completeness rule, atomic partial saves to .context/evals, mission-match judge with rubric-from-frontmatter, drift meta-judge (3-label verdict), backend HTTP record/replay (replay-by-default, EVAL_RECORD=1 opt-in), worktree-based drift detector, tool-routing classifier eval with 70+ fixtures. Gated by EVAL=1; default pnpm test ignores it. * Audit tests (packages/mcp/test/audit/): every prompt has eval coverage stub, no inline tool descriptions in server.ts, per-tool ≤3500-char budget, leadbay_<verb>_<noun> naming convention. Plus in-promptforge audits: freshness, snippet orphans + dead refs, assembler smoke + arg-validation. * Root scripts: prompts:build, prompts:check (CI gate), test:eval, test:gate, test:periodic, eval:drift. Prebuild hooks in @leadbay/core and @leadbay/mcp ensure generated files are fresh before any build. Verification (default pnpm test): 430 tests pass across the monorepo — 240 core / 12 leadclaw / 169 mcp / 9 promptforge. pnpm prompts:check confirms generated files are in sync. pnpm build produces the same published MCPB shape (now leadbay-0.7.0.mcpb). End-to-end runtime sanity: a live tools/list returns 58 tools with their migrated descriptions; a live prompts/get(leadbay_import_file) returns the 14k-char body with 5 PHASE blocks, 3 GATE blocks, GOAL section, WHAT GOOD LOOKS LIKE section, and the CRM record-link guidance for HubSpot/Salesforce/Pipedrive/etc. Version bumps: * @leadbay/mcp: 0.6.3 → 0.7.0 * @leadbay/core: 0.4.0 → 0.5.0 (private) * @leadbay/promptforge: 0.1.0 (new, private) * dxt manifest.template install-instruction string: 0.6 → 0.7 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Resolved WORKFLOWS.md: kept main's workflows 30-33 (account-status hygiene, artifact_kit, team_activity) and renumbered the campaign builder to #34. Regenerated all promptforge generated files from the merged templates. Bumped @leadbay/mcp 0.22.0 -> 0.23.0 (main already shipped 0.22.0) across package.json, server.json, and both changelogs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

milstan merged commit 5d0a13a into main May 15, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Compile pipeline + phases/gates prompts + full eval framework (mcp 0.7.0)#34

Compile pipeline + phases/gates prompts + full eval framework (mcp 0.7.0)#34
milstan merged 1 commit into
mainfrom
milstan/compile-mcp-prompts-skills

milstan commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

milstan commented May 15, 2026

Summary

Highlights

Version bumps

Test plan

Post-merge release steps

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant