
mcp: recover from invalid Relaycast agent tokens mid-session#1001

Merged
willwashburn merged 4 commits into main from claude/nice-turing-g05t2
May 27, 2026

Conversation

@willwashburn

Member

Summary

Ports the recovery contract from relaycast PR #137 into the agent-relay MCP server, SDK, and Rust broker so a revoked or expired Relaycast agent token no longer wedges a session until the process is restarted.

  • SDK (packages/sdk/src/relaycast-errors.ts): shared INVALID_AGENT_TOKEN_CODE plus isInvalidAgentTokenError / isInvalidAgentTokenToolResult / agentTokenRecoveryMessage. Recognises both the typed agent_token_invalid code (relaycast PR #137) and the legacy 401 + Invalid agent token pair, plus body.error envelopes and nested cause chains.
  • MCP server (packages/cli/src/cli/relaycast-mcp.ts): invalidateAgentToken(asIdentity?) drops the stale identity from session.agents and clears session.agentToken/agentName only when the invalidated identity is the active one — mirrors the PR's routed-vs-active scoping. enableInboxPiggyback now catches thrown invalid-token errors and inspects successful tool-result bodies for the legacy marker, surfacing a structured isError: true recovery payload that points at register_agent. The opportunistic inbox fetch invalidates too. register_agent is exempt so it stays a clean recovery path.
  • MCP server: registerAgentWithRebind short-circuit now requires the strict-named identity to still be in session.agents, and prefers the per-identity token when present — so post-invalidation registrations rotate instead of handing back the dead token.
  • Broker (crates/broker/src/relaycast/auth.rs): AuthHttpError now carries the upstream code, relay_error_to_anyhow propagates it, and new is_agent_token_invalid / is_agent_token_invalid_anyhow / is_agent_token_invalid_code helpers + AGENT_TOKEN_INVALID_CODE constant let downstream Rust callers react to the same signal. The agent-create path also normalizes legacy 401+message responses to the typed code.
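The routed-vs-active scoping described for invalidateAgentToken can be illustrated with a minimal sketch. The Session shape below is hypothetical (the real RegistrationSession may carry more fields), and the function takes the session as a parameter purely to keep the example self-contained.

```typescript
// Hypothetical session shape; the real RegistrationSession may differ.
interface Session {
  agentName?: string | undefined;
  agentToken?: string | undefined;
  agents: Map<string, { token: string }>;
}

// Drop a dead identity from the per-name map. The active token is cleared
// only when the invalidated identity is the active one (or when no routed
// identity was given, in which case the active identity is the target).
function invalidateAgentToken(session: Session, asIdentity?: string): void {
  const target = asIdentity ?? session.agentName;
  if (target === undefined) return;

  session.agents.delete(target);

  if (target === session.agentName) {
    session.agentName = undefined;
    session.agentToken = undefined;
  }
}

const session: Session = {
  agentName: 'alice',
  agentToken: 'tok-alice',
  agents: new Map([
    ['alice', { token: 'tok-alice' }],
    ['bob', { token: 'tok-bob' }],
  ]),
};

// A call routed as "bob" failed: only bob's cached token is dropped,
// and alice stays the active identity.
invalidateAgentToken(session, 'bob');
```

Calling invalidateAgentToken(session) with no identity afterwards would remove alice and clear the active name and token as well.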

Test plan

  • npx vitest run packages/cli/src/cli/relaycast-mcp.test.ts packages/cli/src/cli/relaycast-mcp.startup.test.ts — 22 tests pass (3 new rebind tests, 19 pre-existing).
  • cd packages/sdk && npx vitest run src/__tests__/relaycast-errors.test.ts — 12 new detector tests pass.
  • cargo test -p agent-relay-broker --lib relaycast::auth::tests — 17 tests pass (3 new, 14 pre-existing).
  • cargo check -p agent-relay-broker — clean.
  • cd packages/cli && npx tsc --noEmit — clean.
  • Manual verification: run agent-relay mcp against a workspace, rotate an agent token out-of-band, and confirm the next tool call returns the agent_token_invalid recovery message and that calling register_agent rebinds without restart.

https://claude.ai/code/session_01TTpJAiAxsgxMqC6RSzMDsV


Generated by Claude Code

When a previously-issued Relaycast agent token is revoked or expires while
the MCP server is running, the next upstream call returns 401 with code
agent_token_invalid (per relaycast PR #137) or the legacy "Invalid agent
token" message. Until now the session held the dead token forever and
every tool kept failing.

The MCP server (packages/cli/src/cli/relaycast-mcp.ts) now detects the
condition on both thrown errors and tool-result bodies via shared
detectors in @agent-relay/sdk, drops the stale identity from the per-name
agents map (and clears the active token when the active identity is the
invalidated one), surfaces a structured recovery response pointing at
register_agent, and lets strict-named sessions re-register without a
process restart.
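
As a rough illustration of why re-registration rotates after invalidation, the sketch below short-circuits only while the strict-named identity is still cached. The names registerWithRebind and RegisterFn are hypothetical, the session shape is assumed, and the registration call is synchronous here only for brevity (the upstream call is asynchronous in practice).

```typescript
// Hypothetical session shape; the real RegistrationSession may differ.
interface Session {
  agentName?: string | undefined;
  agentToken?: string | undefined;
  agents: Map<string, { token: string }>;
}

// Stand-in for the upstream registration call.
type RegisterFn = (name: string) => string;

function registerWithRebind(
  session: Session,
  strictName: string,
  register: RegisterFn,
): string {
  const cached = session.agents.get(strictName);
  if (cached !== undefined) {
    // Identity is still known: reuse its own token rather than whatever
    // session.agentToken currently holds.
    session.agentName = strictName;
    session.agentToken = cached.token;
    return cached.token;
  }

  // Identity was invalidated (or never registered): fetch a fresh token.
  const token = register(strictName);
  session.agents.set(strictName, { token });
  session.agentName = strictName;
  session.agentToken = token;
  return token;
}

// Simulated post-invalidation state: no cached identity, no active token.
let calls = 0;
const register: RegisterFn = (name) => {
  calls += 1;
  return `${name}-token-${calls}`;
};
const session: Session = { agents: new Map() };

const first = registerWithRebind(session, 'alice', register);
const second = registerWithRebind(session, 'alice', register);
```

The first call reaches the registration stub; the second finds the cached identity and returns the same token without registering again.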

The broker mirrors the contract: AuthHttpError carries the upstream
RelayError code through relay_error_to_anyhow, the agent-create path
normalizes legacy 401+message responses to the typed agent_token_invalid
code, and new helpers (is_agent_token_invalid,
is_agent_token_invalid_anyhow, is_agent_token_invalid_code) let
downstream Rust callers react to the same signal.

https://claude.ai/code/session_01TTpJAiAxsgxMqC6RSzMDsV
@willwashburn willwashburn requested a review from khaliqgant as a code owner May 27, 2026 09:21
@coderabbitai

coderabbitai Bot commented May 27, 2026

Contributor


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 9a2fc378-80ba-4f1c-b46a-3791700f3486

📥 Commits

Reviewing files that changed from the base of the PR and between 487f461 and e493804.

📒 Files selected for processing (4)
  • crates/broker/src/relaycast/auth.rs
  • packages/cli/src/cli/relaycast-mcp.ts
  • packages/sdk/src/__tests__/relaycast-errors.test.ts
  • packages/sdk/src/relaycast-errors.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/sdk/src/relaycast-errors.ts
  • crates/broker/src/relaycast/auth.rs
  • packages/cli/src/cli/relaycast-mcp.ts

📝 Walkthrough

Walkthrough

End-to-end invalid-agent-token detection and mid-session recovery: SDK provides detection and recovery messaging; broker preserves and normalizes Relay API error codes and exposes Rust helpers; CLI caches per-identity tokens, invalidates stale tokens, and triggers re-registration via inbox piggyback.

Changes

Agent Token Invalidation Recovery

| Layer / File(s) | Summary |
| --- | --- |
| SDK error detection and recovery utilities: packages/sdk/src/relaycast-errors.ts, packages/sdk/src/index.ts, packages/sdk/src/__tests__/relaycast-errors.test.ts | Introduces INVALID_AGENT_TOKEN_CODE, INVALID_AGENT_TOKEN_MESSAGE, isInvalidAgentTokenError, isInvalidAgentTokenToolResult, and agentTokenRecoveryMessage; normalizes detection across typed codes, legacy 401+message pairs, nested envelopes, and cause chains. Tests validate detection and recovery message content. |
| Broker invalid-token detection and error normalization: crates/broker/src/relaycast/auth.rs, crates/broker/src/relaycast/mod.rs | Adds AuthHttpError.code, AGENT_TOKEN_INVALID_CODE, is_agent_token_invalid_code, is_agent_token_invalid, is_agent_token_invalid_anyhow; preserves Relay API code in relay_error_to_anyhow; normalizes invalid-token failures during registration and includes unit tests. |
| CLI token invalidation, per-identity caching, and rebind logic: packages/cli/src/cli/relaycast-mcp.ts, packages/cli/src/cli/relaycast-mcp.test.ts | Adds optional per-identity agents map to RegistrationSession; registerAgentWithRebind prefers cached identity tokens when valid; introduces invalidateAgentToken(asIdentity?) to clear stale tokens and updates enableInboxPiggyback to detect invalid-token tool results/exceptions and append recovery messaging. Tests cover rebind and token selection scenarios. |
| Changelog documentation: CHANGELOG.md | Documents the new agent token recovery behavior for agent-relay MCP, @agent-relay/sdk, and agent-relay-broker under Unreleased/Added. |

Sequence Diagram

sequenceDiagram
  participant ToolHandler as Tool Handler
  participant Inbox as enableInboxPiggyback
  participant InvalidCheck as isInvalidAgentTokenError
  participant TokenCache as session.agents
  participant RecoveryMsg as agentTokenRecoveryMessage
  ToolHandler->>Inbox: invoke tool with as identity
  Inbox->>InvalidCheck: detect invalid-token in result/exception
  InvalidCheck-->>Inbox: invalid token found
  Inbox->>TokenCache: invalidateAgentToken(asIdentity)
  TokenCache->>TokenCache: remove identity from cache, clear session
  Inbox->>RecoveryMsg: append recovery guidance
  RecoveryMsg-->>Inbox: register_agent instruction
  Inbox->>ToolHandler: return result with recovery message
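
The flow in the diagram can be sketched as a wrapper around a tool handler. Everything here is a simplified assumption rather than the actual enableInboxPiggyback code: the detectors are reduced stand-ins for the SDK helpers, the recovery wording is invented for illustration, and handlers are synchronous for brevity. Only the isError flag, the register_agent exemption, and the structuredContent error shape come from the PR text.

```typescript
const INVALID_AGENT_TOKEN_CODE = 'agent_token_invalid';
const INVALID_AGENT_TOKEN_MESSAGE = 'Invalid agent token';

interface ToolResult {
  isError?: boolean;
  content: { type: 'text'; text: string }[];
  structuredContent?: { error: { code: string; message: string } };
}

type ToolHandler = (args: { as?: string }) => ToolResult;

// Reduced stand-in for the SDK detector: typed code or legacy 401 pair.
function isInvalidTokenError(error: unknown): boolean {
  if (!error || typeof error !== 'object') return false;
  const e = error as { code?: unknown; status?: unknown; message?: unknown };
  if (e.code === INVALID_AGENT_TOKEN_CODE) return true;
  return e.status === 401 && e.message === INVALID_AGENT_TOKEN_MESSAGE;
}

// Reduced stand-in: look for the legacy marker in a tool-result body.
function isInvalidTokenResult(result: ToolResult): boolean {
  return result.content.some((c) => c.text.includes(INVALID_AGENT_TOKEN_MESSAGE));
}

// Hypothetical recovery wording; the real message comes from the SDK.
function recoveryResult(): ToolResult {
  const text = 'Agent token is no longer valid. Call register_agent, then retry.';
  return {
    isError: true,
    content: [{ type: 'text', text }],
    structuredContent: { error: { code: INVALID_AGENT_TOKEN_CODE, message: text } },
  };
}

function withTokenRecovery(
  toolName: string,
  handler: ToolHandler,
  invalidate: (asIdentity?: string) => void,
): ToolHandler {
  // register_agent is exempt so it stays a clean recovery path.
  if (toolName === 'register_agent') return handler;

  return (args) => {
    try {
      const result = handler(args);
      if (!isInvalidTokenResult(result)) return result;
    } catch (error) {
      // Unrelated failures propagate unchanged.
      if (!isInvalidTokenError(error)) throw error;
    }
    // Either a thrown invalid-token error or a legacy marker in the body.
    invalidate(args.as);
    return recoveryResult();
  };
}
```

Passing the routed identity (args.as) to the invalidation callback keeps the scoping decision in one place instead of duplicating it in every handler.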

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Suggested reviewers

  • khaliqgant

Poem

🐰 Hop along the token trail so bright,
SDK listens, broker sets things right,
CLI clears the stale and makes a new start,
Re-register, retry — a fresh beating heart,
Hops of joy as tokens come apart.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: token recovery for agent sessions mid-operation, directly reflected in the core logic changes across SDK, MCP server, and broker. |
| Description check | ✅ Passed | The description comprehensively covers all major changes (SDK, MCP server, broker), includes implementation details, and provides evidence of testing. However, manual verification is marked incomplete, and the template's 'Test Plan' section uses checkboxes. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 95.65% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces robust recovery mechanisms for handling stale or invalid Relaycast agent tokens mid-session across the MCP server, TypeScript SDK, and Rust broker. It implements structural detection helpers to recognize both the new typed agent_token_invalid error code and legacy 401 status/message pairs, clearing dead tokens and guiding clients to re-register. Feedback on the implementation suggests using a WeakSet in the TypeScript SDK's error detector to guard against potential infinite recursion from cyclic error cause chains, and trimming the error code string in the Rust broker's comparison helper to align with the TypeScript SDK's behavior.

Comment thread packages/sdk/src/relaycast-errors.ts Outdated
Comment on lines +65 to +84
export function isInvalidAgentTokenError(error: unknown): boolean {
  if (!error || typeof error !== 'object') return false;
  const err = error as MaybeError;

  if (normalizeCode(err.code) === INVALID_AGENT_TOKEN_CODE) return true;

  const bodyError = readBodyError(err.body);
  if (bodyError && normalizeCode(bodyError.code) === INVALID_AGENT_TOKEN_CODE) return true;

  const status = readStatus(err.statusCode) ?? readStatus(err.status);
  const message =
    (typeof err.message === 'string' ? err.message.trim() : '') ||
    (bodyError?.message?.trim() ?? '');
  if (status === 401 && message === INVALID_AGENT_TOKEN_MESSAGE) return true;

  if (err.cause && err.cause !== error) {
    return isInvalidAgentTokenError(err.cause);
  }
  return false;
}


medium

To prevent potential infinite recursion and stack overflow crashes in environments with cyclic error cause chains (e.g., A.cause === B and B.cause === A), we can track visited errors using a WeakSet.

export function isInvalidAgentTokenError(error: unknown, visited = new WeakSet<object>()): boolean {
  if (!error || typeof error !== 'object') return false;
  if (visited.has(error)) return false;
  visited.add(error);
  const err = error as MaybeError;

  if (normalizeCode(err.code) === INVALID_AGENT_TOKEN_CODE) return true;

  const bodyError = readBodyError(err.body);
  if (bodyError && normalizeCode(bodyError.code) === INVALID_AGENT_TOKEN_CODE) return true;

  const status = readStatus(err.statusCode) ?? readStatus(err.status);
  const message =
    (typeof err.message === 'string' ? err.message.trim() : '') ||
    (bodyError?.message?.trim() ?? '');
  if (status === 401 && message === INVALID_AGENT_TOKEN_MESSAGE) return true;

  if (err.cause) {
    return isInvalidAgentTokenError(err.cause, visited);
  }
  return false;
}

Member Author

Fixed in e493804: isInvalidAgentTokenError now threads a WeakSet through the recursive cause traversal so cyclic graphs terminate cleanly. Added two regression tests: one for a 2-node cycle with no marker (returns false), one for a cycle where one node carries the marker (still detects).
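
The two regression cases described above can be reproduced against a reduced detector. This is a sketch under assumptions: hasInvalidTokenCode and ErrorLike are hypothetical names, and the check covers only the typed code, not the status/message pair or body envelopes handled by the real function.

```typescript
const INVALID_AGENT_TOKEN_CODE = 'agent_token_invalid';

interface ErrorLike {
  code?: unknown;
  cause?: unknown;
}

// Reduced detector: walks the cause chain, remembering visited nodes in a
// WeakSet so a cyclic graph ends the walk instead of recursing forever.
function hasInvalidTokenCode(
  error: unknown,
  visited = new WeakSet<object>(),
): boolean {
  if (!error || typeof error !== 'object') return false;
  if (visited.has(error)) return false;
  visited.add(error);

  const err = error as ErrorLike;
  if (err.code === INVALID_AGENT_TOKEN_CODE) return true;
  return hasInvalidTokenCode(err.cause, visited);
}

// Case 1: a 2-node cycle with no marker anywhere.
const a: ErrorLike = {};
const b: ErrorLike = { cause: a };
a.cause = b;
const plainCycle = hasInvalidTokenCode(a);

// Case 2: a 2-node cycle where the second node carries the marker.
const c: ErrorLike = {};
const d: ErrorLike = { code: INVALID_AGENT_TOKEN_CODE, cause: c };
c.cause = d;
const markedCycle = hasInvalidTokenCode(c);

console.log(plainCycle, markedCycle); // prints: false true
```

A WeakSet fits here because it holds the visited objects without keeping them alive after the check returns.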


Generated by Claude Code

Comment on lines +210 to +212
pub fn is_agent_token_invalid_code(code: &str) -> bool {
    code.eq_ignore_ascii_case(AGENT_TOKEN_INVALID_CODE)
}


medium

For consistency with the TypeScript SDK implementation (which trims whitespace from the error code), we should trim the code string in Rust as well before comparing it case-insensitively.

Suggested change
-pub fn is_agent_token_invalid_code(code: &str) -> bool {
-    code.eq_ignore_ascii_case(AGENT_TOKEN_INVALID_CODE)
-}
+pub fn is_agent_token_invalid_code(code: &str) -> bool {
+    code.trim().eq_ignore_ascii_case(AGENT_TOKEN_INVALID_CODE)
+}

Member Author

Fixed in e493804: is_agent_token_invalid_code now trims before eq_ignore_ascii_case, matching the TypeScript normalizeCode contract. The relay_error_to_anyhow conversion also trims when persisting code into AuthHttpError. Added two unit tests covering the trimmed-code and anyhow-flavored paths.
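
For comparison, the TypeScript side of that contract might look like the sketch below. The body of normalizeCode is an assumption: the thread only states that the SDK trims the code and that the Rust comparison is case-insensitive to match, so lowercasing is inferred, and isInvalidAgentTokenCode is a hypothetical helper name.

```typescript
const INVALID_AGENT_TOKEN_CODE = 'agent_token_invalid';

// Assumed shape of the normalizer: non-strings are rejected, strings are
// trimmed and lowercased before comparison.
function normalizeCode(code: unknown): string | undefined {
  return typeof code === 'string' ? code.trim().toLowerCase() : undefined;
}

// Hypothetical counterpart to the Rust is_agent_token_invalid_code helper.
function isInvalidAgentTokenCode(code: unknown): boolean {
  return normalizeCode(code) === INVALID_AGENT_TOKEN_CODE;
}

console.log(isInvalidAgentTokenCode(' Agent_Token_Invalid ')); // prints: true
```

With both sides trimming and ignoring case, a padded or differently cased code from upstream is classified the same way in the broker and in the SDK.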


Generated by Claude Code

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

🧹 Nitpick comments (1)
packages/cli/src/cli/relaycast-mcp.ts (1)

15-19: ⚡ Quick win

Use the shared invalid-token code constant instead of a string literal.

Line 634 hardcodes 'agent_token_invalid', which can drift from the SDK contract you already import from the same package.

Proposed fix
 import {
+  INVALID_AGENT_TOKEN_CODE,
   agentTokenRecoveryMessage,
   isInvalidAgentTokenError,
   isInvalidAgentTokenToolResult,
 } from '@agent-relay/sdk';
@@
     structuredContent: {
-      error: { code: 'agent_token_invalid', message: text },
+      error: { code: INVALID_AGENT_TOKEN_CODE, message: text },
     },

Also applies to: 629-635

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/cli/src/cli/relaycast-mcp.ts` around lines 15 - 19, Replace the
hardcoded string 'agent_token_invalid' with the shared invalid-token code
constant exported by `@agent-relay/sdk`: add the SDK's exported invalid-token
constant to the existing imports (alongside agentTokenRecoveryMessage,
isInvalidAgentTokenError, isInvalidAgentTokenToolResult) and use that constant
wherever 'agent_token_invalid' is currently compared/used (around the logic that
references isInvalidAgentTokenError / isInvalidAgentTokenToolResult and
agentTokenRecoveryMessage) so the code relies on the canonical SDK value instead
of a string literal.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/broker/src/relaycast/auth.rs`:
- Around line 210-212: Normalize token-invalid codes by trimming surrounding
whitespace before comparison and when storing: update
is_agent_token_invalid_code to call .trim() on the incoming code before
performing eq_ignore_ascii_case(AGENT_TOKEN_INVALID_CODE), and apply the same
normalization to the other token-invalid check(s) referenced around the 828–832
area so inputs like " agent_token_invalid " are detected; ensure any places that
persist or compare these codes also use .trim() (and keep eq_ignore_ascii_case
for case-insensitivity) for consistent behavior.

In `@packages/sdk/src/relaycast-errors.ts`:
- Around line 80-82: The recursion in isInvalidAgentTokenError traverses
err.cause without cycle detection and can infinite-loop on cyclic error graphs;
modify isInvalidAgentTokenError to track visited error objects (e.g., add an
optional visited Set parameter or use an iterative loop) and check the Set
before recursing into err.cause, adding each error to the Set when visited so
cycles (a.cause = b; b.cause = a) are detected and recursion stops safely.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 5617dbdb-31f6-4855-80d8-e56e7142f350

📥 Commits

Reviewing files that changed from the base of the PR and between 6a456b7 and b8b32bc.

📒 Files selected for processing (8)
  • CHANGELOG.md
  • crates/broker/src/relaycast/auth.rs
  • crates/broker/src/relaycast/mod.rs
  • packages/cli/src/cli/relaycast-mcp.test.ts
  • packages/cli/src/cli/relaycast-mcp.ts
  • packages/sdk/src/__tests__/relaycast-errors.test.ts
  • packages/sdk/src/index.ts
  • packages/sdk/src/relaycast-errors.ts

Comment thread crates/broker/src/relaycast/auth.rs
Comment thread packages/sdk/src/relaycast-errors.ts Outdated
- isInvalidAgentTokenError: thread a WeakSet through the recursive
  `cause` traversal so cyclic graphs (a.cause = b; b.cause = a)
  terminate cleanly instead of blowing the stack. Two new tests cover
  both the no-match and match-inside-cycle paths.
- is_agent_token_invalid_code (Rust): trim whitespace before the
  case-insensitive comparison so " agent_token_invalid " is detected,
  matching the TypeScript normalizeCode contract. Persist trimmed
  codes through relay_error_to_anyhow. Two new unit tests cover the
  trimmed and anyhow-flavored paths.
- relaycast-mcp.ts: replace the hardcoded 'agent_token_invalid' literal
  with the imported INVALID_AGENT_TOKEN_CODE constant so the canonical
  SDK value is the single source of truth.

https://claude.ai/code/session_01TTpJAiAxsgxMqC6RSzMDsV
@willwashburn willwashburn merged commit 9fcc4f6 into main May 27, 2026
49 checks passed
@willwashburn willwashburn deleted the claude/nice-turing-g05t2 branch May 27, 2026 10:14
2 participants