mcp: recover from invalid Relaycast agent tokens mid-session #1001
When a previously-issued Relaycast agent token is revoked or expires while the MCP server is running, the next upstream call returns 401 with code agent_token_invalid (per relaycast PR #137) or the legacy "Invalid agent token" message. Until now the session held the dead token forever and every tool kept failing. The MCP server (packages/cli/src/cli/relaycast-mcp.ts) now detects the condition on both thrown errors and tool-result bodies via shared detectors in @agent-relay/sdk, drops the stale identity from the per-name agents map (and clears the active token when the active identity is the invalidated one), surfaces a structured recovery response pointing at register_agent, and lets strict-named sessions re-register without a process restart. The broker mirrors the contract: AuthHttpError carries the upstream RelayError code through relay_error_to_anyhow, the agent-create path normalizes legacy 401+message responses to the typed agent_token_invalid code, and new helpers (is_agent_token_invalid, is_agent_token_invalid_anyhow, is_agent_token_invalid_code) let downstream Rust callers react to the same signal. https://claude.ai/code/session_01TTpJAiAxsgxMqC6RSzMDsV
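The scoping rule described above (drop the stale identity from the per-name map, and clear the active token only when the active identity is the invalidated one) can be sketched roughly as follows. This is a minimal illustration, not the code in relaycast-mcp.ts: the `Session` shape, the `makeSession` helper, and the fallback to the active identity when no name is passed are all assumptions.

```typescript
// Hypothetical session shape; the real one in relaycast-mcp.ts may differ.
interface Session {
  agents: Map<string, string>; // agent name -> token
  agentName: string | null; // currently active identity
  agentToken: string | null; // token of the active identity
}

// Illustrative helper (not from the PR) that builds a two-identity session.
function makeSession(): Session {
  return {
    agents: new Map([
      ["alice", "tok-a"],
      ["bob", "tok-b"],
    ]),
    agentName: "alice",
    agentToken: "tok-a",
  };
}

// Drop a stale identity. Falling back to the active identity when
// `asIdentity` is omitted is an assumption made for this sketch.
function invalidateAgentToken(session: Session, asIdentity?: string): void {
  const name = asIdentity ?? session.agentName;
  if (!name) return;
  session.agents.delete(name);
  // Clear the active token only if the invalidated identity is the active one.
  if (session.agentName === name) {
    session.agentName = null;
    session.agentToken = null;
  }
}

const demo = makeSession();
invalidateAgentToken(demo, "bob"); // routed identity: active token untouched
invalidateAgentToken(demo, "alice"); // active identity: name and token cleared
```

Keeping the routed and active cases separate means a failure on one identity does not log out another identity that is still valid.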
📝 Walkthrough
End-to-end invalid-agent-token detection and mid-session recovery: SDK provides detection and recovery messaging; broker preserves and normalizes Relay API error codes and exposes Rust helpers; CLI caches per-identity tokens, invalidates stale tokens, and triggers re-registration via inbox piggyback.
Changes: Agent Token Invalidation Recovery
Sequence Diagram
sequenceDiagram
participant ToolHandler as Tool Handler
participant Inbox as enableInboxPiggyback
participant InvalidCheck as isInvalidAgentTokenError
participant TokenCache as session.agents
participant RecoveryMsg as agentTokenRecoveryMessage
ToolHandler->>Inbox: invoke tool with as identity
Inbox->>InvalidCheck: detect invalid-token in result/exception
InvalidCheck-->>Inbox: invalid token found
Inbox->>TokenCache: invalidateAgentToken(asIdentity)
TokenCache->>TokenCache: remove identity from cache, clear session
Inbox->>RecoveryMsg: append recovery guidance
RecoveryMsg-->>Inbox: register_agent instruction
Inbox->>ToolHandler: return result with recovery message
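The flow in the diagram can be approximated by a small wrapper around a tool handler. This is a sketch under stated assumptions: `withTokenRecovery`, the two `looksLike...` detectors, the `ToolResult` shape, and the recovery wording are hypothetical stand-ins for `enableInboxPiggyback`, the SDK detectors, and `agentTokenRecoveryMessage`, whose real signatures are not shown in this thread.

```typescript
// Hypothetical tool-result shape, loosely modeled on an MCP text result.
type ToolResult = {
  content: { type: "text"; text: string }[];
  isError?: boolean;
  structuredContent?: unknown;
};

const INVALID_CODE = "agent_token_invalid"; // code named in the PR description
const LEGACY_MESSAGE = "Invalid agent token"; // legacy marker from the PR description

// Simplified stand-ins for the SDK detectors (assumptions, not the real ones).
function looksLikeInvalidTokenError(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { code?: unknown }).code === INVALID_CODE
  );
}

function looksLikeInvalidTokenResult(result: ToolResult): boolean {
  return result.content.some((c) => c.text.includes(LEGACY_MESSAGE));
}

// Run a tool call; on an invalid-token signal, drop the cached token and
// return a structured recovery result instead of the raw failure.
async function withTokenRecovery(
  identity: string,
  tokenCache: Map<string, string>,
  run: () => Promise<ToolResult>,
): Promise<ToolResult> {
  let result: ToolResult | undefined;
  let invalid = false;
  try {
    result = await run();
    invalid = looksLikeInvalidTokenResult(result);
  } catch (err) {
    if (!looksLikeInvalidTokenError(err)) throw err; // unrelated errors propagate
    invalid = true;
  }
  if (!invalid) return result as ToolResult;

  tokenCache.delete(identity);
  const text = `Agent token for "${identity}" is no longer valid. Call register_agent to obtain a new one.`;
  return {
    content: [{ type: "text", text }],
    isError: true,
    structuredContent: { error: { code: INVALID_CODE, message: text } },
  };
}
```

Handling both the thrown-error path and the result-body path in one place keeps the recovery message consistent regardless of how the upstream failure surfaces.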
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Code Review
This pull request introduces robust recovery mechanisms for handling stale or invalid Relaycast agent tokens mid-session across the MCP server, TypeScript SDK, and Rust broker. It implements structural detection helpers to recognize both the new typed agent_token_invalid error code and legacy 401 status/message pairs, clearing dead tokens and guiding clients to re-register. Feedback on the implementation suggests using a WeakSet in the TypeScript SDK's error detector to guard against potential infinite recursion from cyclic error cause chains, and trimming the error code string in the Rust broker's comparison helper to align with the TypeScript SDK's behavior.
export function isInvalidAgentTokenError(error: unknown): boolean {
  if (!error || typeof error !== 'object') return false;
  const err = error as MaybeError;

  if (normalizeCode(err.code) === INVALID_AGENT_TOKEN_CODE) return true;

  const bodyError = readBodyError(err.body);
  if (bodyError && normalizeCode(bodyError.code) === INVALID_AGENT_TOKEN_CODE) return true;

  const status = readStatus(err.statusCode) ?? readStatus(err.status);
  const message =
    (typeof err.message === 'string' ? err.message.trim() : '') ||
    (bodyError?.message?.trim() ?? '');
  if (status === 401 && message === INVALID_AGENT_TOKEN_MESSAGE) return true;

  if (err.cause && err.cause !== error) {
    return isInvalidAgentTokenError(err.cause);
  }
  return false;
}
To prevent potential infinite recursion and stack overflow crashes in environments with cyclic error cause chains (e.g., A.cause === B and B.cause === A), we can track visited errors using a WeakSet.
export function isInvalidAgentTokenError(error: unknown, visited = new WeakSet<object>()): boolean {
  if (!error || typeof error !== 'object') return false;
  if (visited.has(error)) return false;
  visited.add(error);
  const err = error as MaybeError;
  if (normalizeCode(err.code) === INVALID_AGENT_TOKEN_CODE) return true;
  const bodyError = readBodyError(err.body);
  if (bodyError && normalizeCode(bodyError.code) === INVALID_AGENT_TOKEN_CODE) return true;
  const status = readStatus(err.statusCode) ?? readStatus(err.status);
  const message =
    (typeof err.message === 'string' ? err.message.trim() : '') ||
    (bodyError?.message?.trim() ?? '');
  if (status === 401 && message === INVALID_AGENT_TOKEN_MESSAGE) return true;
  if (err.cause) {
    return isInvalidAgentTokenError(err.cause, visited);
  }
  return false;
}
Fixed in e493804 — isInvalidAgentTokenError now threads a WeakSet through the recursive cause traversal so cyclic graphs terminate cleanly. Added two regression tests: one for a 2-node cycle with no marker (returns false), one for a cycle where one node carries the marker (still detects).
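The two regression scenarios mentioned in the reply can be illustrated with a stripped-down detector. This is only a sketch: `hasInvalidTokenCode` checks the `code` field alone and is a hypothetical simplification of `isInvalidAgentTokenError`, and the actual tests added in e493804 are not shown here.

```typescript
// Minimal error shape for the illustration (an assumption, not the SDK type).
type MaybeError = { code?: unknown; cause?: unknown };

// Walks the `cause` chain, remembering visited objects so cycles terminate.
function hasInvalidTokenCode(
  error: unknown,
  visited = new WeakSet<object>(),
): boolean {
  if (!error || typeof error !== "object") return false;
  if (visited.has(error)) return false; // already seen: we are in a cycle
  visited.add(error);
  const err = error as MaybeError;
  if (err.code === "agent_token_invalid") return true;
  return hasInvalidTokenCode(err.cause, visited);
}

// Scenario 1: two-node cycle with no marker anywhere.
const a: MaybeError = {};
const b: MaybeError = { cause: a };
a.cause = b;
const cycleWithoutMarker = hasInvalidTokenCode(a); // false

// Scenario 2: two-node cycle where one node carries the marker.
const c: MaybeError = {};
const d: MaybeError = { code: "agent_token_invalid", cause: c };
c.cause = d;
const cycleWithMarker = hasInvalidTokenCode(c); // true
```

A `WeakSet` suits this case because it holds object references without keeping the error objects alive after the check returns.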
Generated by Claude Code
pub fn is_agent_token_invalid_code(code: &str) -> bool {
    code.eq_ignore_ascii_case(AGENT_TOKEN_INVALID_CODE)
}
For consistency with the TypeScript SDK implementation (which trims whitespace from the error code), we should trim the code string in Rust as well before comparing it case-insensitively.
- pub fn is_agent_token_invalid_code(code: &str) -> bool {
-     code.eq_ignore_ascii_case(AGENT_TOKEN_INVALID_CODE)
- }
+ pub fn is_agent_token_invalid_code(code: &str) -> bool {
+     code.trim().eq_ignore_ascii_case(AGENT_TOKEN_INVALID_CODE)
+ }
Fixed in e493804 — is_agent_token_invalid_code now trims before eq_ignore_ascii_case, matching the TypeScript normalizeCode contract. The relay_error_to_anyhow conversion also trims when persisting code into AuthHttpError. Added two unit tests covering the trimmed-code and anyhow-flavored paths.
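For reference, the normalization contract that both sides are meant to share (trim, then compare case-insensitively) might look like this on the TypeScript side. The body of `normalizeCode` is not shown in this thread, so the version below is an assumption, and `isInvalidAgentTokenCode` is a hypothetical helper named after its Rust counterpart.

```typescript
const INVALID_AGENT_TOKEN_CODE = "agent_token_invalid";

// Assumed behavior: non-strings are rejected, strings are trimmed and lowercased.
function normalizeCode(code: unknown): string | undefined {
  return typeof code === "string" ? code.trim().toLowerCase() : undefined;
}

// Hypothetical TypeScript mirror of the Rust is_agent_token_invalid_code helper.
function isInvalidAgentTokenCode(code: unknown): boolean {
  return normalizeCode(code) === INVALID_AGENT_TOKEN_CODE;
}

const padded = isInvalidAgentTokenCode("  Agent_Token_Invalid  "); // true
const numeric = isInvalidAgentTokenCode(401); // false
```

Note that `toLowerCase` folds non-ASCII characters as well, whereas Rust's `eq_ignore_ascii_case` only folds ASCII; for an ASCII-only code such as this one the two agree.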
Actionable comments posted: 2
🧹 Nitpick comments (1)
packages/cli/src/cli/relaycast-mcp.ts (1)
15-19: ⚡ Quick win: Use the shared invalid-token code constant instead of a string literal.
Line 634 hardcodes 'agent_token_invalid', which can drift from the SDK contract you already import from the same package.
Proposed fix
  import {
+   INVALID_AGENT_TOKEN_CODE,
    agentTokenRecoveryMessage,
    isInvalidAgentTokenError,
    isInvalidAgentTokenToolResult,
  } from '@agent-relay/sdk';
@@
    structuredContent: {
-     error: { code: 'agent_token_invalid', message: text },
+     error: { code: INVALID_AGENT_TOKEN_CODE, message: text },
    },
Also applies to: 629-635
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/cli/src/cli/relaycast-mcp.ts` around lines 15 - 19, Replace the hardcoded string 'agent_token_invalid' with the shared invalid-token code constant exported by `@agent-relay/sdk`: add the SDK's exported invalid-token constant to the existing imports (alongside agentTokenRecoveryMessage, isInvalidAgentTokenError, isInvalidAgentTokenToolResult) and use that constant wherever 'agent_token_invalid' is currently compared/used (around the logic that references isInvalidAgentTokenError / isInvalidAgentTokenToolResult and agentTokenRecoveryMessage) so the code relies on the canonical SDK value instead of a string literal.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/broker/src/relaycast/auth.rs`:
- Around line 210-212: Normalize token-invalid codes by trimming surrounding
whitespace before comparison and when storing: update
is_agent_token_invalid_code to call .trim() on the incoming code before
performing eq_ignore_ascii_case(AGENT_TOKEN_INVALID_CODE), and apply the same
normalization to the other token-invalid check(s) referenced around the 828–832
area so inputs like " agent_token_invalid " are detected; ensure any places that
persist or compare these codes also use .trim() (and keep eq_ignore_ascii_case
for case-insensitivity) for consistent behavior.
In `@packages/sdk/src/relaycast-errors.ts`:
- Around line 80-82: The recursion in isInvalidAgentTokenError traverses
err.cause without cycle detection and can infinite-loop on cyclic error graphs;
modify isInvalidAgentTokenError to track visited error objects (e.g., add an
optional visited Set parameter or use an iterative loop) and check the Set
before recursing into err.cause, adding each error to the Set when visited so
cycles (a.cause = b; b.cause = a) are detected and recursion stops safely.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 5617dbdb-31f6-4855-80d8-e56e7142f350
📒 Files selected for processing (8)
CHANGELOG.md
crates/broker/src/relaycast/auth.rs
crates/broker/src/relaycast/mod.rs
packages/cli/src/cli/relaycast-mcp.test.ts
packages/cli/src/cli/relaycast-mcp.ts
packages/sdk/src/__tests__/relaycast-errors.test.ts
packages/sdk/src/index.ts
packages/sdk/src/relaycast-errors.ts
- isInvalidAgentTokenError: thread a WeakSet through the recursive `cause` traversal so cyclic graphs (a.cause = b; b.cause = a) terminate cleanly instead of blowing the stack. Two new tests cover both the no-match and match-inside-cycle paths.
- is_agent_token_invalid_code (Rust): trim whitespace before the case-insensitive comparison so " agent_token_invalid " is detected, matching the TypeScript normalizeCode contract. Persist trimmed codes through relay_error_to_anyhow. Two new unit tests cover the trimmed and anyhow-flavored paths.
- relaycast-mcp.ts: replace the hardcoded 'agent_token_invalid' literal with the imported INVALID_AGENT_TOKEN_CODE constant so the canonical SDK value is the single source of truth.

https://claude.ai/code/session_01TTpJAiAxsgxMqC6RSzMDsV
Summary
Ports the recovery contract from relaycast PR #137 into the agent-relay MCP server, SDK, and Rust broker so a revoked or expired Relaycast agent token no longer wedges a session until the process is restarted.
- `packages/sdk/src/relaycast-errors.ts`: shared `INVALID_AGENT_TOKEN_CODE` plus `isInvalidAgentTokenError` / `isInvalidAgentTokenToolResult` / `agentTokenRecoveryMessage`. Recognises both the typed `agent_token_invalid` code (PR #137) and the legacy 401 + `Invalid agent token` pair, plus `body.error` envelopes and nested `cause` chains.
- `packages/cli/src/cli/relaycast-mcp.ts`: `invalidateAgentToken(asIdentity?)` drops the stale identity from `session.agents` and clears `session.agentToken`/`agentName` only when the invalidated identity is the active one, mirroring the PR's routed-vs-active scoping. `enableInboxPiggyback` now catches thrown invalid-token errors and inspects successful tool-result bodies for the legacy marker, surfacing a structured `isError: true` recovery payload that points at `register_agent`. The opportunistic inbox fetch invalidates too. `register_agent` is exempt so it stays a clean recovery path. The `registerAgentWithRebind` short-circuit now requires the strict-named identity to still be in `session.agents`, and prefers the per-identity token when present, so post-invalidation registrations rotate instead of handing back the dead token.
- `crates/broker/src/relaycast/auth.rs`: `AuthHttpError` now carries the upstream `code`, `relay_error_to_anyhow` propagates it, and new `is_agent_token_invalid` / `is_agent_token_invalid_anyhow` / `is_agent_token_invalid_code` helpers + `AGENT_TOKEN_INVALID_CODE` constant let downstream Rust callers react to the same signal. The agent-create path also normalizes legacy 401+message responses to the typed code.

Test plan
- `npx vitest run packages/cli/src/cli/relaycast-mcp.test.ts packages/cli/src/cli/relaycast-mcp.startup.test.ts`: 22 tests pass (3 new rebind tests, 19 pre-existing).
- `cd packages/sdk && npx vitest run src/__tests__/relaycast-errors.test.ts`: 12 new detector tests pass.
- `cargo test -p agent-relay-broker --lib relaycast::auth::tests`: 17 tests pass (3 new, 14 pre-existing).
- `cargo check -p agent-relay-broker`: clean.
- `cd packages/cli && npx tsc --noEmit`: clean.
- `agent-relay mcp` against a workspace, rotate an agent token out-of-band, and confirm the next tool call returns the `agent_token_invalid` recovery message and that calling `register_agent` rebinds without restart.

https://claude.ai/code/session_01TTpJAiAxsgxMqC6RSzMDsV