fix(mcp): kill the 401 startup hallucination — empty hosted challenge body, transient-401 guidance, and silent quota-401 in account_status (v0.21.5) by ArtyETH06 · Pull Request #109 · leadbay/leadclaw

ArtyETH06 · 2026-06-18T16:53:16Z

What

On startup / in leadbay_account_status, the assistant told the user there was a 401 and to reconnect / re-authenticate Leadbay — even though the tools worked fine (issue #3761). This PR fixes every layer that produced or surfaced that 401.

The fixes

1. `leadbay_account_status` — the actual source the user hit

account_status fans out /users/me (identity, succeeds) + /organizations/{id}/quota_status (quota). For an org with no billing plan (plan: null), the backend's quota_status returns 401 while /users/me succeeds with the same token. The composite surfaced that as quota_error: {code: AUTH_EXPIRED, http_status: 401} and both the tool description and the quota_error schema told the agent "on 401/403 tell the user to reconnect" — so a perfectly-authenticated user was told to reconnect.

The quota 401/403 is now treated as an internal diagnostic: the agent stays silent — omits quota entirely, does not mention the error/401, and never tells the user to reconnect (the user/org in the same response used the same token and succeeded).
Files: core/composite/account-status.ts (quota_error output-schema desc) + account-status.md.tmpl rendering_hint.
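To make the failure mode concrete, here is a minimal TypeScript sketch of such a fan-out. It assumes both calls are async functions that reject with a hypothetical `HttpError` carrying the HTTP status. Only `quota_error`, `code`, `http_status`, and `AUTH_EXPIRED` come from the description above. `fetchAccountStatus`, `HttpError`, and the `UPSTREAM_ERROR` fallback code are illustrative and are not the real core/composite/account-status.ts API.

```typescript
// Hypothetical shapes: only quota_error / code / http_status come from the PR text.
interface QuotaError {
  code: string;
  http_status: number;
}

interface AccountStatus {
  user: unknown;
  quota: unknown; // null when quota_status could not be read
  quota_error: QuotaError | null;
}

// Hypothetical error type carrying the HTTP status of a failed call.
class HttpError extends Error {
  readonly status: number;
  constructor(status: number) {
    super(`HTTP ${status}`);
    this.status = status;
  }
}

// Fan out both calls in parallel: an identity failure is fatal, a quota failure is not.
async function fetchAccountStatus(
  getMe: () => Promise<unknown>,
  getQuota: () => Promise<unknown>,
): Promise<AccountStatus> {
  const [me, quota] = await Promise.allSettled([getMe(), getQuota()]);
  if (me.status === "rejected") throw me.reason;

  if (quota.status === "fulfilled") {
    return { user: me.value, quota: quota.value, quota_error: null };
  }
  const status = quota.reason instanceof HttpError ? quota.reason.status : 0;
  return {
    user: me.value,
    quota: null,
    // A 401 here gets labelled AUTH_EXPIRED even though getMe() just succeeded
    // with the same token: the misleading signal described above.
    quota_error: { code: status === 401 ? "AUTH_EXPIRED" : "UPSTREAM_ERROR", http_status: status },
  };
}
```

Because the two requests settle independently, an identity success and a quota 401 can coexist in one result, so the 401 says nothing about the token itself.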

2. `account_status` — lens hygiene

The agent volunteered the active lens unprompted and showed the raw numeric id (40005).

Composite now resolves last_requested_lens → a new last_requested_lens_name field (best-effort via /lenses). Guidance: do not volunteer the lens; only if explicitly asked, answer with the name, never the id.

3. Hosted HTTP MCP — empty the 401 OAuth-challenge body

The Fly connector's 401 + WWW-Authenticate OAuth challenge (correct, unchanged) also carried a JSON body ("Sign in with Leadbay again.") that Claude's host surfaced to the LLM as a reconnect instruction even though the retry succeeded.

http-server.ts sendChallenge() now returns an empty 401 body; status + header preserved byte-for-byte.
Test: test/unit/http-auth-challenge-body.test.ts.
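A minimal sketch of an empty-body challenge, assuming a response object with the `writeHead` / `end` surface of Node's `http.ServerResponse`. The header parameters follow RFC 9728 (`resource_metadata`) and RFC 6750 (`error="invalid_token"` for an expired token), as the commit notes. The metadata URL and the helper names are placeholders, and the exact header bytes in http-server.ts may differ.

```typescript
// Minimal response surface; node:http's ServerResponse should satisfy it.
interface ChallengeResponse {
  writeHead(statusCode: number, headers: Record<string, string>): unknown;
  end(): unknown;
}

// Build a Bearer challenge. error="invalid_token" marks an expired (vs. missing) token,
// so that signal stays in the header once the body is gone.
function challengeHeader(resourceMetadataUrl: string, expired: boolean): string {
  const params = [`resource_metadata="${resourceMetadataUrl}"`];
  if (expired) params.push(`error="invalid_token"`);
  return `Bearer ${params.join(", ")}`;
}

// Hypothetical sendChallenge: same status and header, but no body the host could
// surface to the model as a "sign in again" instruction.
function sendChallenge(res: ChallengeResponse, resourceMetadataUrl: string, expired: boolean): void {
  res.writeHead(401, {
    "WWW-Authenticate": challengeHeader(resourceMetadataUrl, expired),
    "Content-Length": "0", // explicit 0-byte body
  });
  res.end();
}
```

A spec client drives OAuth from the status and `WWW-Authenticate` alone, so dropping the body removes nothing it reads.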

4. Local stdio MCP — don't narrate a transient 401 as re-auth

Leadbay tokens don't expire on a timer; the client auto-retries a GET 401 once, and only a persistent 401 surfaces (as AUTH_EXPIRED). New always-on server-instruction TRANSIENT_401 tells the agent a single 401 is a brief Leadbay-side hiccup the client already retried — retry silently, never tell the user to reconnect on a one-off.

snippets/server-instructions/transient-401.md + wired in server.ts. Test: test/unit/server-instructions-transient-401.test.ts.
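The client behavior this instruction relies on (retry a GET 401 once, and surface only a persistent 401 as AUTH_EXPIRED) can be sketched as follows. The `Send` transport type, `requestWithTransient401Retry`, and the `AuthExpiredError` class are hypothetical. Only the single retry and the AUTH_EXPIRED code come from the text.

```typescript
// Hypothetical error envelope; only the AUTH_EXPIRED code comes from the PR text.
class AuthExpiredError extends Error {
  readonly code = "AUTH_EXPIRED";
  readonly http_status = 401;
}

// Hypothetical transport: one HTTP round trip.
type Send = (method: string, path: string) => Promise<{ status: number; body: unknown }>;

// A GET that 401s is retried exactly once; only a second 401 surfaces as AUTH_EXPIRED.
async function requestWithTransient401Retry(send: Send, method: string, path: string) {
  let res = await send(method, path);
  if (res.status === 401 && method === "GET") {
    // Tokens don't expire on a timer, so a lone 401 is treated as a blip.
    res = await send(method, path);
  }
  if (res.status === 401) throw new AuthExpiredError(`${method} ${path} returned 401`);
  return res;
}
```

The TRANSIENT_401 instruction covers the remaining gap: an AUTH_EXPIRED that still reaches the agent after this retry is usually a one-off, so the agent retries silently instead of asking the user to reconnect.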

Reverted in this branch

An earlier installer region picker + GeoIP token-region verification was added, then reverted at the maintainer's request — the region was never the cause (the account verified to region: us fine). oauth.ts + installer-gui.ts are back to their main state.

Versions

mcp 0.21.1 → 0.21.5; core 0.8.1 → 0.8.3. CHANGELOG updated.

Verification

pnpm -r build && pnpm -r typecheck && pnpm -r test all green — core 407, promptforge 16, mcp 425.
Hosted path: built server curl → 401, WWW-Authenticate intact, body 0 bytes.
Installed + ran the fixed build locally via the --local installer against the real account; confirmed account_status no longer mentions quota/401/reconnect and no longer volunteers the lens.

Honest scope

Fixes 1, 2, and 4 are agent guidance (description / server-instruction), not hard code gates: the quota_error field still exists in the raw tool response, and the model is instructed not to surface it. A belt-and-suspenders option (strip quota_error from the payload entirely) was not done.
No eval run in this PR. These are transport-layer / description / installer changes; the account_status composite has no direct unit test (the lens-name resolution + silent-quota behavior are not yet test-covered — would land in a new account-status.test.ts). Per maintainer direction the eval/test work was deferred.
The underlying backend behavior (401 on quota_status for plan-less orgs) is unchanged — a separate backend concern.

Closes https://github.com/leadbay/product/issues/3761

Live eval proof (product#3761 — workflows 30 + 31)

Two regression-lock workflows added to WORKFLOWS.md, run live against the SnapLock account — a plan-less org whose quota_status genuinely 401s (the exact bug condition; confirmed curl → 401). Sessions run with server-instructions as the system prompt, scored by a blind evidence-only judge.

WF31 — "...which lens is active?" (name, not id) — 5/5/5/5 × 3 consecutive:

Run	MM	IA	NF	TSF	Result
1	5	5	5	5	lens shown as "Autom Lens", id `40005` never surfaced
2	5	5	5	5	PASS
3	5	5	5	5	PASS

WF30 — "What account am I connected to?" (silent quota + no unprompted lens) — all 3 PASS, two at 5/5/5/5:

Run	MM	IA	NF	TSF	Result
1	5	5	5	5	user + org only — no quota/401/reconnect/lens
2	5	4	4	5	PASS — but volunteered "AI agent is enabled" (an unprompted org-config detail)
3	5	5	5	5	clean

Two fixes made deterministic in code (not just guidance):

Quota 401 withheld: account_status strips a 401/403 quota_error from the payload entirely — the agent cannot see or mention it (a 500 still surfaces). Guidance alone was leaky (an agent hedged "quota had a hiccup"). Locked by account-status-quota-401.test.ts.
Lens gated on the trigger: the lens (id + name) is withheld from the payload unless the user's message mentions lens/audience/targeting/segment/filter (the verbatim _triggered_by is now plumbed into ToolContext). Before: the lens leaked unprompted in 2/6 runs. After: 3/3 plain-account runs show no lens. When asked (WF31), the name resolves (string-id normalized), never the raw id.
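The first gate (401/403 withheld, 500 still surfaced) reduces to a small payload filter. This is a sketch under assumed field names: `RawAccountStatus`, `QuotaErrorInfo`, and `withholdAuthQuotaError` are hypothetical, not the code locked by account-status-quota-401.test.ts.

```typescript
// Hypothetical shapes for the composite's result before gating.
interface QuotaErrorInfo {
  code: string;
  http_status: number;
}

interface RawAccountStatus {
  user: unknown;
  organization: unknown;
  quota: unknown;
  quota_error?: QuotaErrorInfo | null;
}

// Drop auth-class quota failures (401/403) from the payload entirely; any other
// failure (e.g. a 500) is kept so real outages stay visible.
function withholdAuthQuotaError(raw: RawAccountStatus): RawAccountStatus {
  const status = raw.quota_error?.http_status;
  if (status !== 401 && status !== 403) return raw;
  const withheld: RawAccountStatus = { ...raw };
  delete withheld.quota_error; // key removed, not nulled: nothing left for the agent to read
  return withheld;
}
```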

Honest gaps:

WF30 run 2 was 5/4/4/5 (not 5/5/5/5) because the agent volunteered "AI agent is enabled for the org" — a benign, real account fact the user didn't ask for. This is a different unprompted-detail nuance than the lens, out of #3761's scope (quota + lens), and not gated — chasing every optional field into a code gate would be scope creep. All 3 runs still PASS (MM 5/5/5).
Run via session+judge subagents (the /eval skill's nested claude -p was permission-blocked here), so not archived to the mcp-dashboard. Scores are real (blind judge over captured live sessions); for dashboard history re-run /eval --workflow 30,31 unattended.
MCP routing not exercised (server-instructions passed via --system-prompt).

…s hallucinating a re-auth prompt (v0.21.2, product#3761) The hosted HTTP MCP answers an unauthenticated/expired POST /mcp with 401 + WWW-Authenticate — the RFC 9728 / MCP-auth OAuth challenge that drives the host's sign-in / silent token-refresh (added in 0.21.0). That handshake is correct and unchanged. The bug was the challenge's JSON body, which carried human-readable prose ("Sign in with Leadbay again."). A spec client drives OAuth from the status + WWW-Authenticate header and never reads the body, but Claude's host surfaces the body to the LLM — which then told the user to reconnect the connector even though the host had already refreshed the token and the retry succeeded. The 401 body is now empty; the expired-vs-missing signal still rides in the header (error=invalid_token for expired), so no protocol signal is lost. Co-Authored-By: Claude <noreply@anthropic.com>

…on the local stdio server (v0.21.3, product#3761) Companion to the 0.21.2 hosted-path fix. The 401 hallucination also reaches users on the npx-installed stdio server: the client treats a 401 as a transient blip on a non-expiring token and auto-retries the GET once, so only a persistent 401 surfaces as an AUTH_EXPIRED envelope. But the agent still occasionally read that lone AUTH_EXPIRED / 401 as an auth failure and told the user to log in / reconnect — even though the next call worked. Adds an always-on server-instruction paragraph (TRANSIENT_401, from snippets/server-instructions/transient-401.md) telling the agent a single 401 is almost always a brief Leadbay-side hiccup the client already retried, to silently retry once more, and to never turn a one-off 401 into a 'your connection needs re-authenticating' message. No change to the retry count or the AUTH_EXPIRED code (both pinned by existing tests); a genuinely persistent 401 still surfaces, so a real logout is still reported. The snippet names no tool (keeps the no-unexposed-tool-name invariant from #3504). Co-Authored-By: Claude <noreply@anthropic.com>

… guess (v0.21.4, product#3761) The deepest root cause of the '401 every startup': the GUI installer detected region via stargate, a GeoIP lookup on the user's current IP — not which backend owns their account. A wrong guess (travel, VPN, a FR account on a US IP) wrote the wrong LEADBAY_REGION, so every request 401'd against the wrong backend on every startup. The installer now verifies the freshly-minted token against /users/me (new verifyTokenRegion in oauth.ts): probes the GeoIP-preferred region first, falls back to the other, and pins whichever the token actually authenticates against — surfacing 'corrected from X' when the guess was wrong. If neither authenticates (bad token / transient blip) it falls back to the detected region and lets the MCP startup auto-probe be the safety net. Removes the deterministic wrong-region 401 at its source. Co-Authored-By: Claude <noreply@anthropic.com>

… verified region (product#3761) Arty asked to be able to CHOOSE the region, not just have it auto-corrected silently. The wizard now shows a US/FR dropdown on the agents step, pre-selected from the token-verified region (verifyTokenRegion from the prior commit), with a hint that the wrong region makes every request fail on startup. The chosen region flows through both the streaming (/api/install-stream) and JSON (/api/install) install paths and is written as LEADBAY_REGION — an explicit human pick overrides the GeoIP/verified default. Co-Authored-By: Claude <noreply@anthropic.com>

…y the quota sub-call 401s (v0.21.5, product#3761) THE actual source of the reported hallucination. account_status fans out /users/me (identity) + /organizations/{id}/quota_status (quota). For an org with no billing plan (plan: null), the backend's quota_status returns 401 while /users/me succeeds with the SAME token. The composite captured that as quota_error {code: AUTH_EXPIRED, http_status: 401} in an otherwise-successful result — and both the tool description's rendering_hint AND the quota_error output-schema description told the agent 'on 401/403 tell the user to reconnect / re-run OAuth'. So a perfectly-authenticated user on a plan-less org was told to reconnect every time. Inverts the guidance in both places: a quota 401/403 is explicitly NOT a broken login (the user/org fields in the same response used the same token and succeeded — commonly just an org with no plan); the agent must NOT tell the user to reconnect, only report quota unreadable. Core 0.8.2. Co-Authored-By: Claude <noreply@anthropic.com>

…tion (product#3761) Arty asked to keep the installer's region handling as it was. Removes the GeoIP-verification (verifyTokenRegion) and the US/FR picker added earlier in this branch, restoring oauth.ts + installer-gui.ts to their main state. The region was never the cause of the reported 401 anyway — the account verified to region=us fine; the hallucination came from the quota sub-call 401 (kept fix, renumbered to v0.21.4). Co-Authored-By: Claude <noreply@anthropic.com>

…on work Co-Authored-By: Claude <noreply@anthropic.com>

…reporting it (v0.21.5, product#3761) Per Arty: a quota/plan read failure (plan-less-org quota_status 401) is an internal diagnostic, irrelevant to the user. 0.21.4 stopped the 'reconnect' instruction but still had the agent say 'quota unreadable'. Now the guidance tells the agent to omit quota entirely — do NOT mention quota, the error, or the 401 — and just answer user/org/lens. Updated the account-status.md.tmpl rendering_hint and the quota_error output-schema description. Co-Authored-By: Claude <noreply@anthropic.com>

… not the id (product#3761) Per Arty: the agent reported the active lens unprompted and showed the raw numeric id (40005). The composite now resolves last_requested_lens -> a new last_requested_lens_name field (best-effort via /lenses), and the guidance tells the agent (1) not to volunteer the lens at all, and (2) if the user explicitly asks, answer with the NAME, never the number. Core 0.8.3. Co-Authored-By: Claude <noreply@anthropic.com>

…roduct#3761 review) Lens ids are STRINGS server-side (e.g. "40005") — documented in my-lenses.ts — but me.last_requested_lens may be a number, so the strict `l.id === lensId` silently missed ("40005" === 40005 is false) and left last_requested_lens_name null, defeating the new 'answer with the name' guidance for those users. Normalize both sides to string before matching, matching the sid() pattern in my-lenses.ts. Co-Authored-By: Claude <noreply@anthropic.com>
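A minimal sketch of the normalized lookup, assuming lenses arrive as `{ id, name }` records. The `sid` here simply calls `String()`; the real `sid()` in my-lenses.ts is not shown in this PR, and `Lens` / `resolveLensName` are hypothetical names.

```typescript
// Hypothetical lens record; ids may be strings ("40005") or numbers (40005).
interface Lens {
  id: string | number;
  name: string;
}

// Normalize any id to a string so "40005" and 40005 compare equal.
const sid = (id: string | number): string => String(id);

// Best-effort lookup: a missing id or an unmatched lens yields null, never a throw.
function resolveLensName(lensId: string | number | null | undefined, lenses: readonly Lens[]): string | null {
  if (lensId === null || lensId === undefined) return null;
  const wanted = sid(lensId);
  return lenses.find((l) => sid(l.id) === wanted)?.name ?? null;
}
```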

…tching the schema + backend shape (product#3761 review) The schema declared last_requested_lens as number but the backend sends lens ids as STRINGS ("40005"; documented in my-lenses.ts) and the composite returned me.last_requested_lens verbatim — so structured clients reading outputSchema saw string-vs-number drift on the same account-status response this PR fixes. Normalize the returned id to string and declare the schema ["string", "null"]; widen UserMePayload.last_requested_lens to string|number|null to reflect the real backend shape, and coerce at the one numeric use site (client.defaultLensId) with Number(). Co-Authored-By: Claude <noreply@anthropic.com>
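The same normalization at the payload boundary, sketched in TypeScript. `UserMePayload.last_requested_lens` mirrors the widened type described in the commit. `normalizeLensId`, `toNumericLensId` (a stand-in for the `client.defaultLensId` use site), its non-numeric guard, and `lastRequestedLensSchema` are illustrative assumptions.

```typescript
// Widened to the real backend shape: the id may arrive as a string or a number.
interface UserMePayload {
  last_requested_lens: string | number | null;
}

// What the composite returns: always a string (or null), never the raw mixed type.
function normalizeLensId(me: UserMePayload): string | null {
  return me.last_requested_lens === null ? null : String(me.last_requested_lens);
}

// Hypothetical numeric use site: coerce with Number(); mapping NaN to null is an assumption.
function toNumericLensId(me: UserMePayload): number | null {
  if (me.last_requested_lens === null) return null;
  const n = Number(me.last_requested_lens);
  return Number.isFinite(n) ? n : null;
}

// Output-schema fragment matching the normalized value.
const lastRequestedLensSchema = { type: ["string", "null"] } as const;
```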

… lens-name evals (product#3761) Adds eval coverage (WORKFLOWS.md WF30 + WF31) for the two product#3761 behaviors and hardens the quota fix from guidance-only to code-level:
- WF30 (silent quota): on a plan-less org whose quota_status 401s, the agent must not mention quota/401/reconnect. Live eval showed prompt guidance alone was leaky (the agent still hedged 'quota had a hiccup'), so account_status now WITHHOLDS a 401/403 quota_error from the payload entirely; the agent literally cannot see it. A genuine non-auth failure (500) still surfaces as quota_error. Verified 3/3 clean live runs against the SnapLock account (which really 401s).
- WF31 (lens by name): when asked which lens is active, answer with the name, never the raw id. Verified 3/3 live runs ('Autom Lens', never 40005). The 'do not volunteer lens unprompted' rule is kept as guidance but NOT gated in the eval: it's instruction-level, not deterministic, and a hard pass bar would be a fabricated 5/5.
- New test account-status-quota-401.test.ts locks: 401/403 withheld, 500 surfaces, lens id->name with string-id normalization.
Co-Authored-By: Claude <noreply@anthropic.com>

…tatus never volunteers it (product#3761) The 'don't volunteer the lens unprompted' rule was guidance-only and leaked the lens in ~1/3 of live plain-account runs. Make it deterministic: the composite now reads the verbatim user trigger (newly plumbed into ToolContext.triggered_by from the server) and only resolves + includes the lens (id AND name) when the trigger mentions lens/audience/targeting/segment/filter. When not asked, both lens fields are withheld from the payload entirely, so the agent literally cannot volunteer what it can't see. Safe failure: an unusual phrasing that misses the keywords omits the lens (never leaks). When asked, the human NAME is resolved (string-id normalized), never the raw id. Verified live: 3/3 plain-account runs now show NO lens (vs 2/6 leaks before). WF30 re-asserts the no-lens criterion (now deterministic). New tests cover asked/not-asked/no-trigger gating. Co-Authored-By: Claude <noreply@anthropic.com>
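The trigger gate can be sketched as a keyword test followed by a lazy resolve, so `/lenses` is only called when the user actually asked. The keyword list is from the commit. The word-boundary regex (including plural forms), `userAskedAboutLens`, `withLensIfAsked`, and `LensFields` are assumptions, not the real composite.

```typescript
// Keywords from the commit; word boundaries and plural forms are an assumption.
const LENS_TRIGGER = /\b(lens(es)?|audiences?|targeting|segments?|filters?)\b/i;

// No trigger at all (undefined) means "not asked": fail closed.
function userAskedAboutLens(triggeredBy: string | undefined): boolean {
  return typeof triggeredBy === "string" && LENS_TRIGGER.test(triggeredBy);
}

// Hypothetical lens fields added to the payload only when asked.
interface LensFields {
  last_requested_lens: string | null;
  last_requested_lens_name: string | null;
}

// Resolve and include the lens (id AND name) only when asked; otherwise both keys
// are absent and the resolver is never called.
async function withLensIfAsked<T extends object>(
  payload: T,
  triggeredBy: string | undefined,
  resolveLens: () => Promise<LensFields>,
): Promise<T | (T & LensFields)> {
  if (!userAskedAboutLens(triggeredBy)) return payload;
  return { ...payload, ...(await resolveLens()) };
}
```

Failing closed on a missing or unmatched trigger mirrors the commit's safe-failure rule: a phrasing the regex misses omits the lens rather than leaking it.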

main shipped/tagged 0.21.2 (dxt OAuth, #108) while this branch reused 0.21.2-0.21.5 for the 401 work. Resolved the 3 conflicts:
- package.json + server.json -> 0.21.3 (next clean number after main's 0.21.2)
- CHANGELOG: collapsed the four 401-fix dev entries into one 0.21.3 release (4 bullets, the final code-gated behavior), kept main's 0.21.2 dxt entry intact.
Auto-merges (server.ts triggered_by wiring, WORKFLOWS.md WF30/31) verified: prompts:build no-drift, build, typecheck, and full test pass (core 414, promptforge 16, mcp 455). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

milstan · 2026-06-19T23:00:59Z

[Claude]: Rebased onto main and resolved the conflicts. main had already shipped/tagged 0.21.2 (the dxt OAuth fix, #108), which collided with this branch's 0.21.2-0.21.5 headings.

Resolution — ships as 0.21.3 (next clean number after main's 0.21.2):

package.json + server.json -> 0.21.3.
CHANGELOG: the four dev-iteration entries (0.21.2-0.21.5) collapsed into one 0.21.3 release with 4 bullets, written to the final code-gated behavior (quota 401 stripped from the payload, lens gated on _triggered_by), not the guidance-only intermediate states. main's 0.21.2 dxt entry kept intact. One PR = one release.
server.ts (triggered_by ctx wiring) and WORKFLOWS.md (WF30/31) auto-merged; verified sound by build + typecheck + tests.

Green locally after the merge: prompts:build (no drift), pnpm -r build, typecheck, test — core 414, promptforge 16, mcp 455.

Still open (not done by this merge): no eval/dashboard archive for WF30/31. The judge ran via subagents, not the /eval harness, so the scores aren't in mcp-dashboard. Re-run /eval --workflow 30,31 for dashboard history.

ArtyETH06 added the bugfix label Jun 18, 2026

ArtyETH06 self-assigned this Jun 18, 2026

ArtyETH06 changed the title ~~fix(mcp): empty the hosted 401 OAuth-challenge body so the agent stops hallucinating a re-auth prompt (v0.21.2)~~ fix(mcp): stop the 401 startup hallucination on both the hosted connector and the local stdio server (v0.21.3) Jun 18, 2026

ArtyETH06 and others added 6 commits June 18, 2026 13:41

chore(mcp): renumber quota-401 fix to v0.21.4 after dropping the regi…

9f69560

…on work Co-Authored-By: Claude <noreply@anthropic.com>

ArtyETH06 and others added 2 commits June 18, 2026 14:18

ArtyETH06 marked this pull request as ready for review June 18, 2026 21:28

ArtyETH06 requested a review from milstan June 18, 2026 21:28

ArtyETH06 mentioned this pull request Jun 18, 2026

fix(prospecting-overview): propose next steps only when useful, memory-aware, via native widget #105

Merged

ArtyETH06 marked this pull request as draft June 18, 2026 22:07

ArtyETH06 marked this pull request as ready for review June 18, 2026 22:21

milstan merged commit 3092363 into main Jun 19, 2026
1 check passed

milstan merged 14 commits into main from ArtyETH06/fix-401-hallucination-challenge-body
