Compile pipeline + phases/gates prompts + full eval framework (mcp 0.7.0)#34
Merged
Merged
Conversation
…7.0) Introduces @leadbay/promptforge — a build-time workspace package that compiles .md.tmpl source files into prompts.generated.ts and tool-descriptions.generated.ts, consumed by @leadbay/mcp and @leadbay/core. Authors edit prose in one place; the bundle inlines the generated string constants. The MCP wire format is unchanged — tools/list, prompts/list, and prompts/get serve the same shapes; only the description prose itself is rewritten. Backwards-compat: every tool name, every inputSchema, every annotation set, and every outputSchema is preserved byte-for-byte. Live MCP clients (Claude Desktop, Cursor, Claude Code, OpenClaw) see no protocol break. What landed: * New package @leadbay/promptforge: cli (build|check|approve-drift), assembler with arg + expected_calls + failure_modes validation, snippet resolver with cycle detection, gray-matter + zod frontmatter parser, diff-friendly emit (one alphabetized const per artifact in named regions). Exposes a library entry for the eval framework to read rubrics at runtime. * All 6 MCP prompts restructured into PHASES + IRON LAWS + GATES with byproduct outputs the agent must emit before progressing (COLUMN PRESERVATION PLAN, DECISION LOG, FINAL REPORT for import_file; "STOP — awaiting user decision" for daily_check_in). * All 60 tool descriptions migrated to .md.tmpl source files (21 composite + 37 granular + 2 file-import companions). Shared snippet library (iron-laws, gates, heuristics, headers — 11 partials); per-CRM EXTERNAL_ID guidance extended from HubSpot to Salesforce, Pipedrive, Close, Attio. * Daily-check-in overhaul: top-10 triage with motivational framing; auto-top-up via leadbay_bulk_qualify_leads when batch is short; research every promising lead; ask before consuming contact-enrichment quota. * Import-file overhaul: explicit GOAL section; two deliverables named (max-coverage column mapping + augmented file with LEADBAY_ID column); WHAT GOOD LOOKS LIKE section teaches the disambiguation bar positively; domain-extraction emphasized as key match factor. * Test framework spine (packages/mcp/test/eval/): in-process MCP server driver, Anthropic SDK tool-use bridge, L1+L2+L3 Evidence shape with pyramid-completeness rule, atomic partial saves to .context/evals, mission-match judge with rubric-from-frontmatter, drift meta-judge (3-label verdict), backend HTTP record/replay (replay-by-default, EVAL_RECORD=1 opt-in), worktree-based drift detector, tool-routing classifier eval with 70+ fixtures. Gated by EVAL=1; default pnpm test ignores it. * Audit tests (packages/mcp/test/audit/): every prompt has eval coverage stub, no inline tool descriptions in server.ts, per-tool ≤3500-char budget, leadbay_<verb>_<noun> naming convention. Plus in-promptforge audits: freshness, snippet orphans + dead refs, assembler smoke + arg-validation. * Root scripts: prompts:build, prompts:check (CI gate), test:eval, test:gate, test:periodic, eval:drift. Prebuild hooks in @leadbay/core and @leadbay/mcp ensure generated files are fresh before any build. Verification (default pnpm test): 430 tests pass across the monorepo — 240 core / 12 leadclaw / 169 mcp / 9 promptforge. pnpm prompts:check confirms generated files are in sync. pnpm build produces the same published MCPB shape (now leadbay-0.7.0.mcpb). End-to-end runtime sanity: a live tools/list returns 58 tools with their migrated descriptions; a live prompts/get(leadbay_import_file) returns the 14k-char body with 5 PHASE blocks, 3 GATE blocks, GOAL section, WHAT GOOD LOOKS LIKE section, and the CRM record-link guidance for HubSpot/Salesforce/Pipedrive/etc. Version bumps: * @leadbay/mcp: 0.6.3 → 0.7.0 * @leadbay/core: 0.4.0 → 0.5.0 (private) * @leadbay/promptforge: 0.1.0 (new, private) * dxt manifest.template install-instruction string: 0.6 → 0.7 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
milstan
added a commit
that referenced
this pull request
Jun 22, 2026
Resolved WORKFLOWS.md: kept main's workflows 30-33 (account-status hygiene, artifact_kit, team_activity) and renumbered the campaign builder to #34. Regenerated all promptforge generated files from the merged templates. Bumped @leadbay/mcp 0.22.0 -> 0.23.0 (main already shipped 0.22.0) across package.json, server.json, and both changelogs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Introduces
@leadbay/promptforge— a build-time workspace package that compiles.md.tmplsource files intoprompts.generated.ts/tool-descriptions.generated.ts. Authors edit prose in one place; bundles inline the generated string constants. The MCP wire format is unchanged: every consumer (Claude Desktop, Cursor, Claude Code, OpenClaw) sees the sametools/list/prompts/list/prompts/getshapes.Backwards-compat preserved: every tool name, every
inputSchema, every annotation set, everyoutputSchemais byte-for-byte identical. Only the description prose changes — and it's tighter, more structured, and easier to evolve.Highlights
COLUMN PRESERVATION PLAN,DECISION LOG,FINAL REPORT,STOP — awaiting user decision)..md.tmpl(21 composite + 37 granular + 2 file-import companions) using a shared snippet library of 11 partials.leadbay_bulk_qualify_leads; research every promising lead; ask before consuming contact-enrichment quota.LEADBAY_IDcolumn); WHAT GOOD LOOKS LIKE section; per-CRMEXTERNAL_IDguidance extended from HubSpot to Salesforce / Pipedrive / Close / Attio.packages/mcp/test/eval/): in-process MCP server driver, Evidence-pyramid pass rule, mission-match judge with rubric-from-frontmatter, drift meta-judge, replay-by-default backend HTTP, worktree-based drift detector, 70-fixture tool-routing classifier. Gated byEVAL=1; defaultpnpm testignores it.packages/mcp/test/audit/): eval-coverage stubs, no-inline-descriptions-in-server.ts, per-tool ≤3500-char budget, naming convention, snippet orphans, freshness gate.pnpm testrunspnpm prompts:checkfirst; PR fails if.md.tmpledits don't have matching regenerated.generated.tsupdates.Version bumps
@leadbay/mcp: 0.6.3 → 0.7.0 (publishable)@leadbay/core: 0.4.0 → 0.5.0 (private)@leadbay/promptforge: 0.1.0 (new, private)dxt/manifest.template.jsonlogin-command string:@leadbay/mcp@0.6→@leadbay/mcp@0.7CHANGELOG: rolled the previous
0.6.4 UNRELEASEDimport-flow entry forward into0.7.0(the 0.6.4 work landed via PR #33 but never published).Test plan
pnpm prompts:check— generated files in sync with.md.tmplsourcespnpm test— 430/430 pass (240 core + 12 leadclaw + 169 mcp + 9 promptforge)pnpm build— all packages build cleanly; published artifact isleadbay-0.7.0.mcpb(275.7 KB)tools/listreturns 58 tools at full exposure with migrated descriptions;prompts/get(leadbay_import_file)returns a 14k-char body with 5 PHASE blocks, 3 GATE blocks, the GOAL section, the WHAT GOOD LOOKS LIKE section, and the multi-CRM record-link guidanceleadbay_import_file(PR Add agentic Leadbay file import flow #33 prose) captured underpackages/promptforge/test/snapshots/so accidental drift surfaces as a test failureToolobject (cross-tier sharing forleadbay_list_mappable_fields/leadbay_add_note/leadbay_create_custom_fieldpreserved as intended)Post-merge release steps
Per
RELEASE.md:git checkout main && git pullgit tag mcp-v0.7.0git push origin mcp-v0.7.0releaseworkflow in GitHub Actions. It runspreflight-npm→publish-mcp(build + test + tag/version check +npm publish --access public --provenance).🤖 Generated with Claude Code