fix(prospecting-overview): propose next steps only when useful, memory-aware, via native widget by ArtyETH06 · Pull Request #105 · leadbay/leadclaw

ArtyETH06 · 2026-06-16T00:57:26Z

What

leadbay_prospecting_overview was changed (in the first cut of this PR) to always end on a concrete next step. Milan pushed back: forcing a next step on every turn is the ChatGPT "why not?" annoyance — sometimes the status read is the complete answer and a manufactured next step erodes trust.

This revision reworks the instruction to address all three of Milan's points.

Fix

Replaced the unconditional "always propose" rule with a three-part rule in the prompt template:

Decide when it helps. Propose a next step only on a real unfinished thread or blocker (fresh discovery batch, follow-ups due, quota/auth blocker). Skip it when there's no clear thread or the user only wanted status — "don't manufacture a next step just to have one."
Lean on memory. Check _meta.agent_memory.summary for how this user reacts to next-step offers; default to not proposing if they routinely dismiss them, and capture the dismiss/accept signal via leadbay_agent_memory_capture so the preference compounds across sessions.
When proposed, it's a native choice dialog — never prose. Route 2–4 mutually-exclusive moves through the host's next-step widget (ask_user_input_v0 / AskUserQuestion). Pulled in the next-steps/ask-user-input-routing snippet.

WORKFLOWS.md WF7 success criterion softened to match: proposing none is acceptable when the status read is complete; if proposed, it must be a concrete widget-routed action, not reflexive filler.

Files: leadbay_prospecting_overview.md.tmpl (source) + regenerated prompts.generated.ts and the plugin SKILL.md; WORKFLOWS.md contract.

Eval proof (live `/eval --workflow 7`)

Workflow: 7 — Prospecting overview (leadbay_prospecting_overview), single-turn, routing prompt body injected. Live Leadbay API, 3 consecutive runs.

run	MM	IA	NF	TSF	pass	next step	quota read
1	5	5	5	4	✅	`ask_user_input_v0` widget, 4 concrete options	`AUTH_EXPIRED` — reported honestly, no fabrication
2	5	5	5	5	✅	`ask_user_input_v0` widget, 4 concrete options	`AUTH_EXPIRED` — reported honestly, no fabrication
3	5	5	5	5	✅	`ask_user_input_v0` widget, 4 concrete options	`AUTH_EXPIRED` — reported honestly, no fabrication

MM mission-match · IA instruction-adherence · NF no-fabrication · TSF tool-selection-fit
3/3 PASS · 2/3 perfect 5/5/5/5. Invariants held every run (leadbay_account_status called; leadbay_report_outreach / mutating tools absent).
The TSF 4 in run 1 is judge variance on an extra read-only leadbay_pull_leads call the agent made to enrich the overview — sensible, not a contract violation. Runs 2 and 3 judged the same call 5/5/5/5. Not a clean 3×5/5/5/5 and I'm not claiming one.
Milan's third ask verified live: every run routed the next step through the ask_user_input_v0 native widget (e.g. [Refine the lens, Check audience filters, Reconnect first, Triage board]), not prose.

Not covered / caveats

The live eval account's quota endpoint returns AUTH_EXPIRED (401), so the agent's honest handling of a failed quota read is what's exercised — quota-figure rendering against real numbers was not tested (same caveat as the prior revision).
The memory-suppression branch (don't propose when the user routinely dismisses) is not exercised by this eval — the harness resets agent memory to a fresh-user baseline each run, so the "returning user who always dismisses" path is asserted by the prompt instruction only, not by a live run. Worth a dedicated multi-turn / seeded-memory scenario before relying on it.
Single-turn workflow.

The prospecting-overview prompt orients the user but never explicitly required a next-step proposal, so an overview could end on bare status — leaving the user stranded (WF7 contract requires 'proposed a concrete next step'). Add an explicit instruction to close with a concrete next step matched to the user's state (discovery batch / follow-ups due / unblock quota+auth). Verified live: 5/5/5/5 across 3 consecutive eval runs (vs IA:3 without the instruction); next step proposed every round, no fabrication on the live AUTH_EXPIRED quota state. Co-Authored-By: Claude <noreply@anthropic.com>

ArtyETH06 · 2026-06-16T00:59:33Z

🤖 Auto self-review (`/eval --improve` → `/review`)

The --improve loop opened this PR, then ran /review on the diff. Result: clean — no critical or auto-fix findings, 0 changes applied.

Scope: CLEAN — delivered exactly the intent (one "propose a concrete next step" instruction); no creep.
Generated files in sync: ✅ .md.tmpl source + prompts.generated.ts + the assembled SKILL.md all carry the line (build-emitted, not hand-edited).
Structured checklist (SQL / races / shell injection / enums / type coercion): N/A — prose-only prompt instruction, no code paths.
Specialists: skipped (6-line diff, under the 50-line threshold).

Adversarial pass — 2 advisory notes, both sub-3 confidence, not actionable:

(conf 3/10) Instruction is unconditional, but correctly scoped to the account-overview flow — there's no competing STOP/await gate in this prompt for it to override.
(conf 2/10) Theoretical same-turn overlap with per-tool NEXT STEPS blocks — different altitude (routing nudge vs per-lead table), and defer-to-tool-rendering already governs placement.

Adversarial verdict: ship-as-is — the defer-to-tool-rendering gate explicitly whitelists next-action recommendations, and the line directly fulfills the WORKFLOWS row-7 contract ("proposed a concrete next step").

Eval state unchanged at 5/5/5/5 ×3 (no review fix was applied, so no re-eval was needed). PR stays draft — a human reviews and merges.

milstan

I am not sure we want to force the agent to always do it - sometimes it does not make sens at all, and we're forcing it to pick next steps to show.

I think we may want to be more subtle here - leverage our concent of memory - to remember if the user is dismissing next step proposals or accepting them (if always dismissing, then not propose)

Also we may want to actually teach the agent to decide when next steps make sens, and when they are just "why not" - I am personally very annoyed that ChatGPT always proposes next steps, even when clearly the work is done.

On the other hand I think we want, when next steps are proposed, to make sure they are presented in the form of a native UI choice dialog.

…y-aware, via native widget Address Milan's review: don't force a next step on every overview (the ChatGPT "why not" annoyance). Teach the agent to decide WHEN a next step is genuinely useful vs. reflexive filler, lean on agent memory to suppress proposals for users who routinely dismiss them, and render any proposal through the native choice dialog (ask_user_input_v0 / AskUserQuestion) rather than prose. Soften the WF7 contract criterion to match — proposing none is acceptable when the status read is a complete answer. Co-Authored-By: Claude <noreply@anthropic.com>

…ippet references The ask-user-input-routing snippet ends with "pick rows from the (Observation, Suggest, Calls) table below" and "call the matching Calls tool" — but the overview had no such table, so in the no-next_steps case (e.g. leadbay_account_status) the required widget path was under-specified and the agent could hallucinate options or skip the proposal. Add an overview-specific table (discovery batch / follow-ups / quota / auth blocker / lens mismatch / healthy → propose none) so the reference resolves deterministically. Addresses review P2. Co-Authored-By: Claude <noreply@anthropic.com>

…ate audience adjust Two user-visible bugs in the added NEXT STEPS table (review P2 ×2): - leadbay_daily_check_in / leadbay_followup_check_in are MCP PROMPTS, not agent-callable tools; a host that can't invoke a prompt from a turn would stall or call a non-existent tool. Route to the underlying tools instead (leadbay_pull_leads / leadbay_pull_followups), with a note that the Calls column must always be a callable leadbay_* tool, never a prompt name. - The "Adjust the lens audience" option called leadbay_adjust_audience directly, but the option carries no sectors/sizes/exclusions and the tool has no required inputs — an empty call writes the current filter / may clone the default lens (no-op or unwanted change). Gate it: ASK for the ICP details first, then call with those params; never call with no args. Co-Authored-By: Claude <noreply@anthropic.com>

milstan · 2026-06-17T22:31:37Z

why does it say "The live eval account's quota endpoint returns AUTH_EXPIRED (401)"? Do we need to infer that in fact the results it obtained were considered passing becuase it could not uthenticate so it just thaught it's OK?

ArtyETH06 · 2026-06-18T21:30:12Z

why does it say "The live eval account's quota endpoint returns AUTH_EXPIRED (401)"? Do we need to infer that in fact the results it obtained were considered passing becuase it could not uthenticate so it just thaught it's OK?

To verify, would require #109 to be merged

ArtyETH06 added the feature label Jun 16, 2026

ArtyETH06 self-assigned this Jun 16, 2026

ArtyETH06 marked this pull request as ready for review June 17, 2026 17:54

ArtyETH06 requested a review from milstan June 17, 2026 17:54

milstan requested changes Jun 17, 2026

View reviewed changes

ArtyETH06 changed the title ~~fix(prospecting-overview): always propose a concrete next step~~ fix(prospecting-overview): propose next steps only when useful, memory-aware, via native widget Jun 17, 2026

ArtyETH06 and others added 2 commits June 17, 2026 15:19

milstan merged commit cc7b304 into main Jun 19, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(prospecting-overview): propose next steps only when useful, memory-aware, via native widget#105

fix(prospecting-overview): propose next steps only when useful, memory-aware, via native widget#105
milstan merged 4 commits into
mainfrom
ArtyETH06/improve-prospecting-overview-next-step

ArtyETH06 commented Jun 16, 2026 •

edited

Loading

Uh oh!

ArtyETH06 commented Jun 16, 2026

Uh oh!

milstan left a comment

Uh oh!

milstan commented Jun 17, 2026

Uh oh!

ArtyETH06 commented Jun 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ArtyETH06 commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Fix

Eval proof (live /eval --workflow 7)

Not covered / caveats

Uh oh!

ArtyETH06 commented Jun 16, 2026

🤖 Auto self-review (/eval --improve → /review)

Uh oh!

milstan left a comment

Choose a reason for hiding this comment

Uh oh!

milstan commented Jun 17, 2026

Uh oh!

ArtyETH06 commented Jun 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ArtyETH06 commented Jun 16, 2026 •

edited

Loading

Eval proof (live `/eval --workflow 7`)

🤖 Auto self-review (`/eval --improve` → `/review`)