fix(mcp): kill the 401 startup hallucination: empty hosted challenge body, transient-401 guidance, and silent quota-401 in account_status (v0.21.5) #109
Merged
…s hallucinating a re-auth prompt (v0.21.2, product#3761)
The hosted HTTP MCP answers an unauthenticated/expired POST /mcp with
401 + WWW-Authenticate — the RFC 9728 / MCP-auth OAuth challenge that
drives the host's sign-in / silent token-refresh (added in 0.21.0). That
handshake is correct and unchanged.
The bug was the challenge's JSON body, which carried human-readable
prose ("Sign in with Leadbay again."). A spec client drives OAuth from
the status + WWW-Authenticate header and never reads the body, but
Claude's host surfaces the body to the LLM — which then told the user to
reconnect the connector even though the host had already refreshed the
token and the retry succeeded. The 401 body is now empty; the
expired-vs-missing signal still rides in the header (error=invalid_token
for expired), so no protocol signal is lost.
Co-Authored-By: Claude <noreply@anthropic.com>
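The commit describes the challenge only in prose. Below is a minimal sketch, assuming a Node-style response object, of a challenge that keeps the 401 status and `WWW-Authenticate` header but sends no body. The `error="invalid_token"` parameter comes from RFC 6750 and `resource_metadata` from RFC 9728. `buildChallenge`, `ChallengeSink`, and the metadata URL are hypothetical, and `sendChallenge` here is only a stand-in for the real function of that name in `http-server.ts`.

```typescript
// Hypothetical sketch; the real sendChallenge() in http-server.ts is not shown in this PR.

/** Minimal structural view of a Node http.ServerResponse (assumption: only these two calls are needed). */
interface ChallengeSink {
  writeHead(status: number, headers: Record<string, string>): unknown;
  end(): unknown;
}

interface Challenge {
  status: 401;
  headers: Record<string, string>;
  body: string; // always empty: the host may show a body to the LLM
}

/**
 * Builds an RFC 6750 / RFC 9728 style challenge. An expired token gets
 * error="invalid_token"; a missing token gets no error code, so the
 * expired-vs-missing signal lives only in the header.
 */
function buildChallenge(expired: boolean, resourceMetadataUrl: string): Challenge {
  const params = [`resource_metadata="${resourceMetadataUrl}"`];
  if (expired) params.unshift(`error="invalid_token"`);
  return {
    status: 401,
    headers: { "WWW-Authenticate": `Bearer ${params.join(", ")}` },
    body: "",
  };
}

function sendChallenge(res: ChallengeSink, expired: boolean, resourceMetadataUrl: string): void {
  const c = buildChallenge(expired, resourceMetadataUrl);
  res.writeHead(c.status, c.headers);
  res.end(); // no body: nothing for the host to surface as "sign in again"
}
```

A spec-compliant client still gets everything it needs from the status and header; only the human-readable prose that the LLM could misread is gone.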
…on the local stdio server (v0.21.3, product#3761) Companion to the 0.21.2 hosted-path fix. The 401 hallucination also reaches users on the npx-installed stdio server: the client treats a 401 as a transient blip on a non-expiring token and auto-retries the GET once, so only a persistent 401 surfaces as an AUTH_EXPIRED envelope. But the agent still occasionally read that lone AUTH_EXPIRED / 401 as an auth failure and told the user to log in / reconnect — even though the next call worked. Adds an always-on server-instruction paragraph (TRANSIENT_401, from snippets/server-instructions/transient-401.md) telling the agent a single 401 is almost always a brief Leadbay-side hiccup the client already retried, to silently retry once more, and to never turn a one-off 401 into a 'your connection needs re-authenticating' message. No change to the retry count or the AUTH_EXPIRED code (both pinned by existing tests); a genuinely persistent 401 still surfaces, so a real logout is still reported. The snippet names no tool (keeps the no-unexposed-tool-name invariant from #3504). Co-Authored-By: Claude <noreply@anthropic.com>
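The retry behavior this commit leaves untouched (one automatic retry on a GET 401, then `AUTH_EXPIRED`) can be sketched as follows. The client's real request and envelope types are not shown in the PR, so `HttpResult`, `Envelope`, `getWithOneAuthRetry`, and the `HTTP_ERROR` code are hypothetical; only the retry count and the `AUTH_EXPIRED` code come from the commit.

```typescript
// Hypothetical shapes; the real stdio client's request/envelope types are not shown in the PR.
type HttpResult = { status: number; body: unknown };
type Envelope =
  | { ok: true; data: unknown }
  | { ok: false; error: { code: "AUTH_EXPIRED" | "HTTP_ERROR"; http_status: number } };

/**
 * Leadbay tokens don't expire on a timer, so a first 401 on a GET is treated
 * as a transient blip and retried exactly once. Only a 401 that survives the
 * retry becomes an AUTH_EXPIRED envelope.
 */
async function getWithOneAuthRetry(send: () => Promise<HttpResult>): Promise<Envelope> {
  let res = await send();
  if (res.status === 401) {
    res = await send(); // the single automatic retry
  }
  if (res.status >= 200 && res.status < 300) {
    return { ok: true, data: res.body };
  }
  const code = res.status === 401 ? "AUTH_EXPIRED" : "HTTP_ERROR";
  return { ok: false, error: { code, http_status: res.status } };
}
```

The TRANSIENT_401 instruction sits on top of this: even the envelope that does reach the agent is framed as a hiccup to retry, not a logout.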
… guess (v0.21.4, product#3761) The deepest root cause of the '401 every startup': the GUI installer detected region via stargate, a GeoIP lookup on the user's current IP — not which backend owns their account. A wrong guess (travel, VPN, a FR account on a US IP) wrote the wrong LEADBAY_REGION, so every request 401'd against the wrong backend on every startup. The installer now verifies the freshly-minted token against /users/me (new verifyTokenRegion in oauth.ts): probes the GeoIP-preferred region first, falls back to the other, and pins whichever the token actually authenticates against — surfacing 'corrected from X' when the guess was wrong. If neither authenticates (bad token / transient blip) it falls back to the detected region and lets the MCP startup auto-probe be the safety net. Removes the deterministic wrong-region 401 at its source. Co-Authored-By: Claude <noreply@anthropic.com>
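The probe order described here (this commit was later reverted in this branch) can be sketched as below. This is not the reverted `verifyTokenRegion`; `pickRegion`, `RegionDecision`, and the `authenticates` callback, assumed to call that region's `/users/me` with the new token, are hypothetical.

```typescript
// Hypothetical sketch of the probe order described above; not the reverted verifyTokenRegion itself.
type Region = "us" | "fr";

interface RegionDecision {
  region: Region;         // what would be written as LEADBAY_REGION
  verified: boolean;      // did the token authenticate against /users/me there?
  correctedFrom?: Region; // set when the GeoIP guess was wrong
}

/**
 * `authenticates(region)` is an assumed callback that calls that region's
 * /users/me with the freshly minted token and resolves true on success.
 */
async function pickRegion(
  geoIpGuess: Region,
  authenticates: (region: Region) => Promise<boolean>,
): Promise<RegionDecision> {
  const other: Region = geoIpGuess === "us" ? "fr" : "us";
  if (await authenticates(geoIpGuess)) return { region: geoIpGuess, verified: true };
  if (await authenticates(other)) return { region: other, verified: true, correctedFrom: geoIpGuess };
  // Neither worked (bad token or a transient blip): keep the guess and let the
  // MCP startup auto-probe act as the safety net.
  return { region: geoIpGuess, verified: false };
}
```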
… verified region (product#3761) Arty asked to be able to CHOOSE the region, not just have it auto-corrected silently. The wizard now shows a US/FR dropdown on the agents step, pre-selected from the token-verified region (verifyTokenRegion from the prior commit), with a hint that the wrong region makes every request fail on startup. The chosen region flows through both the streaming (/api/install-stream) and JSON (/api/install) install paths and is written as LEADBAY_REGION — an explicit human pick overrides the GeoIP/verified default. Co-Authored-By: Claude <noreply@anthropic.com>
…y the quota sub-call 401s (v0.21.5, product#3761)
THE actual source of the reported hallucination. account_status fans out
/users/me (identity) + /organizations/{id}/quota_status (quota). For an
org with no billing plan (plan: null), the backend's quota_status returns
401 while /users/me succeeds with the SAME token. The composite captured
that as quota_error {code: AUTH_EXPIRED, http_status: 401} in an
otherwise-successful result — and both the tool description's
rendering_hint AND the quota_error output-schema description told the
agent 'on 401/403 tell the user to reconnect / re-run OAuth'. So a
perfectly-authenticated user on a plan-less org was told to reconnect
every time.
Inverts the guidance in both places: a quota 401/403 is explicitly NOT a
broken login (the user/org fields in the same response used the same
token and succeeded — commonly just an org with no plan); the agent must
NOT tell the user to reconnect, only report quota unreadable. Core 0.8.2.
Co-Authored-By: Claude <noreply@anthropic.com>
…tion (product#3761) Arty asked to keep the installer's region handling as it was. Removes the GeoIP-verification (verifyTokenRegion) and the US/FR picker added earlier in this branch, restoring oauth.ts + installer-gui.ts to their main state. The region was never the cause of the reported 401 anyway — the account verified to region=us fine; the hallucination came from the quota sub-call 401 (kept fix, renumbered to v0.21.4). Co-Authored-By: Claude <noreply@anthropic.com>
…on work Co-Authored-By: Claude <noreply@anthropic.com>
…reporting it (v0.21.5, product#3761) Per Arty: a quota/plan read failure (plan-less-org quota_status 401) is an internal diagnostic, irrelevant to the user. 0.21.4 stopped the 'reconnect' instruction but still had the agent say 'quota unreadable'. Now the guidance tells the agent to omit quota entirely — do NOT mention quota, the error, or the 401 — and just answer user/org/lens. Updated the account-status.md.tmpl rendering_hint and the quota_error output-schema description. Co-Authored-By: Claude <noreply@anthropic.com>
… not the id (product#3761) Per Arty: the agent reported the active lens unprompted and showed the raw numeric id (40005). The composite now resolves last_requested_lens -> a new last_requested_lens_name field (best-effort via /lenses), and the guidance tells the agent (1) not to volunteer the lens at all, and (2) if the user explicitly asks, answer with the NAME, never the number. Core 0.8.3. Co-Authored-By: Claude <noreply@anthropic.com>
…roduct#3761 review)
Lens ids are STRINGS server-side (e.g. "40005") — documented in
my-lenses.ts — but me.last_requested_lens may be a number, so the strict
`l.id === lensId` silently missed ("40005" === 40005 is false) and left
last_requested_lens_name null, defeating the new 'answer with the name'
guidance for those users. Normalize both sides to string before matching,
matching the sid() pattern in my-lenses.ts.
Co-Authored-By: Claude <noreply@anthropic.com>
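A minimal sketch of the id-to-name lookup with both sides normalized to strings. The real composite and the `sid()` helper in `my-lenses.ts` are not shown in the PR, so `sid` below is only a stand-in with an assumed signature, and `resolveLensName` is hypothetical.

```typescript
// Hypothetical sketch; the real composite and my-lenses.ts sid() are not shown in the PR.
interface Lens {
  id: string; // lens ids are strings server-side, e.g. "40005"
  name: string;
}

/** Stand-in for a sid()-style helper: normalize an id to string (or null). */
function sid(id: string | number | null | undefined): string | null {
  return id === null || id === undefined ? null : String(id);
}

/** Best-effort: returns null rather than throwing when the lens is unknown. */
function resolveLensName(lastRequestedLens: string | number | null, lenses: Lens[]): string | null {
  const wanted = sid(lastRequestedLens);
  if (wanted === null) return null;
  // Compare string to string: a strict 40005 === "40005" would always be false.
  const match = lenses.find((l) => sid(l.id) === wanted);
  return match ? match.name : null;
}
```

Normalizing both sides (rather than only `me.last_requested_lens`) keeps the match correct whichever side happens to arrive as a number.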
…tching the schema + backend shape (product#3761 review)
The schema declared last_requested_lens as number but the backend sends
lens ids as STRINGS ("40005"; documented in my-lenses.ts) and the
composite returned me.last_requested_lens verbatim — so structured
clients reading outputSchema saw string-vs-number drift on the same
account-status response this PR fixes.
Normalize the returned id to string and declare the schema ["string",
"null"]; widen UserMePayload.last_requested_lens to string|number|null to
reflect the real backend shape, and coerce at the one numeric use site
(client.defaultLensId) with Number().
Co-Authored-By: Claude <noreply@anthropic.com>
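The schema change and the single numeric coercion can be sketched as follows. The field name `last_requested_lens` and the `["string", "null"]` type come from the commit; `outputLensId`, `defaultLensId` as a free function, the schema description text, and the `NaN` guard are assumptions for illustration.

```typescript
// Hypothetical sketch of the shapes described above; field names follow the commit, the helpers do not.
interface UserMePayload {
  // The backend sends lens ids as strings, but the payload may carry a number.
  last_requested_lens: string | number | null;
}

// JSON Schema fragment for the output field, matching the normalized value (description is illustrative).
const lastRequestedLensSchema = {
  type: ["string", "null"],
  description: "Id of the user's last requested lens, always as a string.",
} as const;

/** What account_status returns: always a string or null, never a number. */
function outputLensId(me: UserMePayload): string | null {
  return me.last_requested_lens === null ? null : String(me.last_requested_lens);
}

/** The one numeric use site (cf. client.defaultLensId): coerce with Number(). */
function defaultLensId(me: UserMePayload): number | undefined {
  if (me.last_requested_lens === null) return undefined;
  const n = Number(me.last_requested_lens);
  return Number.isFinite(n) ? n : undefined; // guard against non-numeric strings (an assumption)
}
```

Widening the input type while narrowing the output type keeps the string-vs-number drift contained to one boundary.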
… lens-name evals (product#3761)
Adds eval coverage (WORKFLOWS.md WF30 + WF31) for the two product#3761
behaviors and hardens the quota fix from guidance-only to code-level:
- WF30 (silent quota): on a plan-less org whose quota_status 401s, the
agent must not mention quota/401/reconnect. Live eval showed prompt
guidance alone was leaky (the agent still hedged 'quota had a hiccup'),
so account_status now WITHHOLDS a 401/403 quota_error from the payload
entirely — the agent literally cannot see it. A genuine non-auth
failure (500) still surfaces as quota_error. Verified 3/3 clean live
runs against the SnapLock account (which really 401s).
- WF31 (lens by name): when asked which lens is active, answer with the
name, never the raw id. Verified 3/3 live runs ('Autom Lens', never
40005). The 'do not volunteer lens unprompted' rule is kept as guidance
but NOT gated in the eval — it's instruction-level, not deterministic,
and a hard pass bar would be a fabricated 5/5.
- New test account-status-quota-401.test.ts locks: 401/403 withheld, 500
surfaces, lens id->name with string-id normalization.
Co-Authored-By: Claude <noreply@anthropic.com>
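The code-level withholding can be sketched as a pure function over the quota sub-call's outcome. The account_status composite's real types are not shown, so `QuotaOutcome`, `quotaFields`, `accountStatusPayload`, and the non-401 error codes used in the test are hypothetical; the 401/403-withheld, 500-surfaces rule is from the commit.

```typescript
// Hypothetical sketch; the real account_status composite and its types are not shown in the PR.
type QuotaOutcome =
  | { ok: true; quota: unknown }
  | { ok: false; code: string; http_status: number };

interface QuotaFields {
  quota?: unknown;
  quota_error?: { code: string; http_status: number };
}

/**
 * A 401/403 from quota_status (e.g. a plan-less org) is withheld entirely:
 * no key is emitted, so the agent has nothing to mention. Any other failure,
 * such as a 500, still surfaces as quota_error.
 */
function quotaFields(outcome: QuotaOutcome): QuotaFields {
  if (outcome.ok) return { quota: outcome.quota };
  if (outcome.http_status === 401 || outcome.http_status === 403) return {};
  return { quota_error: { code: outcome.code, http_status: outcome.http_status } };
}

// Usage: merge into the otherwise-successful identity payload.
function accountStatusPayload(user: unknown, org: unknown, quota: QuotaOutcome) {
  return { user, org, ...quotaFields(quota) };
}
```

Omitting the key, rather than setting it to `null`, is what makes the behavior deterministic: there is no field for the agent to hedge about.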
…tatus never volunteers it (product#3761) The 'don't volunteer the lens unprompted' rule was guidance-only and leaked the lens in ~1/3 of live plain-account runs. Make it deterministic: the composite now reads the verbatim user trigger (newly plumbed into ToolContext.triggered_by from the server) and only resolves + includes the lens (id AND name) when the trigger mentions lens/audience/targeting/segment/filter. When not asked, both lens fields are withheld from the payload entirely; the agent literally cannot volunteer what it can't see. Safe failure: an unusual phrasing that misses the keywords omits the lens (never leaks). When asked, the human NAME is resolved (string-id normalized), never the raw id. Verified live: 3/3 plain-account runs now show NO lens (vs 2/6 leaks before). WF30 re-asserts the no-lens criterion (now deterministic). New tests cover asked/not-asked/no-trigger gating. Co-Authored-By: Claude <noreply@anthropic.com>
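A minimal sketch of the trigger gate. The keyword list is taken from the commit; the exact matching rule (here: case-insensitive, word-start, so plurals like "lenses" also match), `ToolContextLike`, `userAskedAboutLens`, and `lensFields` are assumptions, since the real check and `ToolContext` shape are not shown.

```typescript
// Hypothetical sketch; the real keyword check and ToolContext shape are not shown in the PR.
interface ToolContextLike {
  triggered_by?: string; // verbatim user trigger, when the server provides one
}

// Keywords from the commit message; word-start matching so "lenses"/"filters" also hit.
const LENS_KEYWORDS = /\b(lens|audience|targeting|segment|filter)/i;

function userAskedAboutLens(ctx: ToolContextLike): boolean {
  // No trigger means we can't tell, so fail safe: treat it as "not asked".
  return ctx.triggered_by !== undefined && LENS_KEYWORDS.test(ctx.triggered_by);
}

interface LensFields {
  last_requested_lens?: string | null;
  last_requested_lens_name?: string | null;
}

/** Only resolve and include lens fields when asked; otherwise emit neither key. */
function lensFields(
  ctx: ToolContextLike,
  lensId: string | null,
  resolveName: (id: string) => string | null,
): LensFields {
  if (!userAskedAboutLens(ctx)) return {};
  return {
    last_requested_lens: lensId,
    last_requested_lens_name: lensId === null ? null : resolveName(lensId),
  };
}
```

Skipping the name lookup entirely when not asked also avoids the extra `/lenses` call on plain account questions.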
main shipped/tagged 0.21.2 (dxt OAuth, #108) while this branch reused 0.21.2-0.21.5 for the 401 work. Resolved the 3 conflicts: - package.json + server.json -> 0.21.3 (next clean number after main's 0.21.2) - CHANGELOG: collapsed the four 401-fix dev entries into one 0.21.3 release (4 bullets, the final code-gated behavior), kept main's 0.21.2 dxt entry intact. Auto-merges (server.ts triggered_by wiring, WORKFLOWS.md WF30/31) verified: prompts:build no-drift, build, typecheck, and full test pass (core 414, promptforge 16, mcp 455). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
[Claude]: Rebased onto main and resolved the conflicts. main had already shipped/tagged 0.21.2 (the dxt OAuth fix, #108), which collided with this branch's 0.21.2-0.21.5 headings. Resolution — ships as 0.21.3 (next clean number after main's 0.21.2):
Green locally after the merge: …
Still open (not done by this merge): no eval/dashboard archive for WF30/31; the judge ran via subagents, not the …
What
On startup / in `leadbay_account_status`, the assistant told the user there was a 401 and to reconnect / re-authenticate Leadbay, even though the tools worked fine (issue #3761). This PR fixes every layer that produced or surfaced that 401.
The fixes
1. `leadbay_account_status`: the actual source the user hit
`account_status` fans out `/users/me` (identity, succeeds) + `/organizations/{id}/quota_status` (quota). For an org with no billing plan (`plan: null`), the backend's `quota_status` returns 401 while `/users/me` succeeds with the same token. The composite surfaced that as `quota_error: {code: AUTH_EXPIRED, http_status: 401}`, and both the tool description and the `quota_error` schema told the agent "on 401/403 tell the user to reconnect", so a perfectly-authenticated user was told to reconnect.
- Guidance inverted: a quota 401/403 is not a broken login (`user`/`org` in the same response used the same token and succeeded).
- Changed in `core/composite/account-status.ts` (`quota_error` output-schema description) + `account-status.md.tmpl` rendering_hint.
2. `account_status`: lens hygiene
The agent volunteered the active lens unprompted and showed the raw numeric id (`40005`).
- The composite resolves `last_requested_lens` → a new `last_requested_lens_name` field (best-effort via `/lenses`). Guidance: do not volunteer the lens; only if explicitly asked, answer with the name, never the id.
3. Hosted HTTP MCP: empty the 401 OAuth-challenge body
The Fly connector's `401` + `WWW-Authenticate` OAuth challenge (correct, unchanged) also carried a JSON body ("Sign in with Leadbay again.") that Claude's host surfaced to the LLM as a reconnect instruction, even though the retry succeeded.
- `http-server.ts` `sendChallenge()` now returns an empty 401 body; status + header preserved byte-for-byte.
- Test: `test/unit/http-auth-challenge-body.test.ts`.
4. Local stdio MCP: don't narrate a transient 401 as re-auth
Leadbay tokens don't expire on a timer; the client auto-retries a GET 401 once, and only a persistent 401 surfaces (as `AUTH_EXPIRED`). A new always-on server instruction, `TRANSIENT_401`, tells the agent a single 401 is a brief Leadbay-side hiccup the client already retried: retry silently, and never tell the user to reconnect on a one-off.
- `snippets/server-instructions/transient-401.md` + wired in `server.ts`. Test: `test/unit/server-instructions-transient-401.test.ts`.
Reverted in this branch
An earlier installer region picker + GeoIP token-region verification was added, then reverted at the maintainer's request; the region was never the cause (the account verified to `region: us` fine). `oauth.ts` + `installer-gui.ts` are back to their `main` state.
Versions
mcp `0.21.1` → `0.21.5`; core `0.8.1` → `0.8.3`. CHANGELOG updated.
Verification
- `pnpm -r build && pnpm -r typecheck && pnpm -r test` all green: core 407, promptforge 16, mcp 425.
- … `401`, `WWW-Authenticate` intact, body 0 bytes.
- … `--local` installer against the real account; confirmed `account_status` no longer mentions quota/401/reconnect and no longer volunteers the lens.
Honest scope
- The `quota_error` field still exists in the raw tool response; the model is instructed not to surface it. A belt-and-suspenders option (strip `quota_error` from the payload entirely) was not done.
- The `account_status` composite has no direct unit test (the lens-name resolution + silent-quota behavior are not yet test-covered; they would land in a new `account-status.test.ts`). Per maintainer direction the eval/test work was deferred.
- The backend behavior (401 from `quota_status` for plan-less orgs) is unchanged; that is a separate backend concern.
Closes https://github.com/leadbay/product/issues/3761
Live eval proof (product#3761 — workflows 30 + 31)
Two regression-lock workflows added to `WORKFLOWS.md`, run live against the SnapLock account: a plan-less org whose `quota_status` genuinely 401s (the exact bug condition; confirmed `curl` → 401). Sessions run with server-instructions as the system prompt, scored by a blind evidence-only judge.
WF31 ("...which lens is active?", name not id): 5/5/5/5 × 3 consecutive:
- `40005` never surfaced
WF30 ("What account am I connected to?", silent quota + no unprompted lens): all 3 PASS, two at 5/5/5/5:
Two fixes made deterministic in code (not just guidance):
- `account_status` strips a 401/403 `quota_error` from the payload entirely; the agent cannot see or mention it (a 500 still surfaces). Guidance alone was leaky (an agent hedged "quota had a hiccup"). Locked by `account-status-quota-401.test.ts`.
- The lens is only included when the user's trigger asks about it (the verbatim `_triggered_by` is now plumbed into `ToolContext`). Before: the lens leaked unprompted in 2/6 runs. After: 3/3 plain-account runs show no lens. When asked (WF31), the name resolves (string-id normalized), never the raw id.
Honest gaps:
- The judge ran via subagents (the `/eval` skill's nested `claude -p` was permission-blocked here), so the runs were not archived to the mcp-dashboard. Scores are real (blind judge over captured live sessions); for dashboard history, re-run `/eval --workflow 30,31` unattended.
- … `--system-prompt`).