mcp: recover from invalid Relaycast agent tokens mid-session by willwashburn · Pull Request #1001 · AgentWorkforce/relay

willwashburn · 2026-05-27T09:21:49Z

Summary

Ports the recovery contract from relaycast PR #137 into the agent-relay MCP server, SDK, and Rust broker so a revoked or expired Relaycast agent token no longer wedges a session until the process is restarted.

SDK (packages/sdk/src/relaycast-errors.ts): shared INVALID_AGENT_TOKEN_CODE plus isInvalidAgentTokenError / isInvalidAgentTokenToolResult / agentTokenRecoveryMessage. Recognises both the typed agent_token_invalid code (relaycast PR #137) and the legacy 401 + Invalid agent token pair, plus body.error envelopes and nested cause chains.
MCP server (packages/cli/src/cli/relaycast-mcp.ts): invalidateAgentToken(asIdentity?) drops the stale identity from session.agents and clears session.agentToken/agentName only when the invalidated identity is the active one — mirrors the PR's routed-vs-active scoping. enableInboxPiggyback now catches thrown invalid-token errors and inspects successful tool-result bodies for the legacy marker, surfacing a structured isError: true recovery payload that points at register_agent. The opportunistic inbox fetch invalidates too. register_agent is exempt so it stays a clean recovery path.
MCP server: registerAgentWithRebind short-circuit now requires the strict-named identity to still be in session.agents, and prefers the per-identity token when present — so post-invalidation registrations rotate instead of handing back the dead token.
Broker (crates/broker/src/relaycast/auth.rs): AuthHttpError now carries the upstream code, relay_error_to_anyhow propagates it, and new is_agent_token_invalid / is_agent_token_invalid_anyhow / is_agent_token_invalid_code helpers + AGENT_TOKEN_INVALID_CODE constant let downstream Rust callers react to the same signal. The agent-create path also normalizes legacy 401+message responses to the typed code.
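
The routed-vs-active scoping of invalidateAgentToken can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in relaycast-mcp.ts: the `SessionSketch` shape, passing the session as a parameter, and the fallback to the active identity when `asIdentity` is omitted are all assumptions. Only the names `agents`, `agentToken`, `agentName`, and `invalidateAgentToken` come from the description above.

```typescript
// Hypothetical session shape; the real RegistrationSession may differ.
interface SessionSketch {
  agentName: string | undefined;
  agentToken: string | undefined;
  agents: Map<string, string>; // identity name -> cached token
}

// Drop the stale identity from the map. Clear the active token and name
// only when the invalidated identity is the active one.
function invalidateAgentToken(session: SessionSketch, asIdentity?: string): void {
  // Assumption: with no explicit identity, the active one is the target.
  const target = asIdentity ?? session.agentName;
  if (target === undefined) return;

  session.agents.delete(target);

  if (target === session.agentName) {
    session.agentToken = undefined;
    session.agentName = undefined;
  }
}

const session: SessionSketch = {
  agentName: 'lead',
  agentToken: 'tok-lead',
  agents: new Map([
    ['lead', 'tok-lead'],
    ['worker', 'tok-worker'],
  ]),
};

// A routed (non-active) identity is invalidated: the active token survives.
invalidateAgentToken(session, 'worker');
console.log(session.agents.has('worker'), session.agentToken);

// The active identity is invalidated: the active token is cleared as well.
invalidateAgentToken(session, 'lead');
console.log(session.agents.size, session.agentToken);
```

Scoping the clear to the active identity means a failure on a routed call does not log out the identity the session is actually running as.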

Test plan

npx vitest run packages/cli/src/cli/relaycast-mcp.test.ts packages/cli/src/cli/relaycast-mcp.startup.test.ts — 22 tests pass (3 new rebind tests, 19 pre-existing).
cd packages/sdk && npx vitest run src/__tests__/relaycast-errors.test.ts — 12 new detector tests pass.
cargo test -p agent-relay-broker --lib relaycast::auth::tests — 17 tests pass (3 new, 14 pre-existing).
cargo check -p agent-relay-broker — clean.
cd packages/cli && npx tsc --noEmit — clean.
Manual verification: run agent-relay mcp against a workspace, rotate an agent token out-of-band, and confirm the next tool call returns the agent_token_invalid recovery message and that calling register_agent rebinds without restart.

https://claude.ai/code/session_01TTpJAiAxsgxMqC6RSzMDsV

Generated by Claude Code

When a previously-issued Relaycast agent token is revoked or expires while the MCP server is running, the next upstream call returns 401 with code agent_token_invalid (per relaycast PR #137) or the legacy "Invalid agent token" message. Until now the session held the dead token forever and every tool kept failing. The MCP server (packages/cli/src/cli/relaycast-mcp.ts) now detects the condition on both thrown errors and tool-result bodies via shared detectors in @agent-relay/sdk, drops the stale identity from the per-name agents map (and clears the active token when the active identity is the invalidated one), surfaces a structured recovery response pointing at register_agent, and lets strict-named sessions re-register without a process restart. The broker mirrors the contract: AuthHttpError carries the upstream RelayError code through relay_error_to_anyhow, the agent-create path normalizes legacy 401+message responses to the typed agent_token_invalid code, and new helpers (is_agent_token_invalid, is_agent_token_invalid_anyhow, is_agent_token_invalid_code) let downstream Rust callers react to the same signal. https://claude.ai/code/session_01TTpJAiAxsgxMqC6RSzMDsV
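
The tool-result half of that detection might look something like the sketch below. It is not the SDK's isInvalidAgentTokenToolResult: the function name `looksLikeInvalidAgentTokenResult`, the `ToolResultSketch` shape, and the choice to inspect only JSON text items are assumptions made for illustration. The code `agent_token_invalid` and the legacy message `Invalid agent token` are taken from the text above.

```typescript
const INVALID_AGENT_TOKEN_CODE = 'agent_token_invalid';
const INVALID_AGENT_TOKEN_MESSAGE = 'Invalid agent token';

// Minimal subset of an MCP-style tool result, assumed for illustration.
interface ToolResultSketch {
  isError?: boolean;
  content?: Array<{ type: string; text?: string }>;
}

type Fields = { code?: unknown; status?: unknown; message?: unknown; error?: unknown };

// True for the typed code, or for the legacy 401 + message pair.
function matchesInvalidToken(fields: Fields): boolean {
  const code = typeof fields.code === 'string' ? fields.code.trim().toLowerCase() : '';
  if (code === INVALID_AGENT_TOKEN_CODE) return true;
  const message = typeof fields.message === 'string' ? fields.message.trim() : '';
  return fields.status === 401 && message === INVALID_AGENT_TOKEN_MESSAGE;
}

// Hypothetical detector: scans JSON text items of a "successful" tool result.
function looksLikeInvalidAgentTokenResult(result: ToolResultSketch): boolean {
  for (const item of result.content ?? []) {
    if (item.type !== 'text' || typeof item.text !== 'string') continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(item.text);
    } catch {
      continue; // not JSON; this sketch ignores free-form text
    }
    if (!parsed || typeof parsed !== 'object') continue;
    const body = parsed as Fields;
    // Accept both a flat body and a nested `error` envelope.
    if (matchesInvalidToken(body)) return true;
    if (body.error && typeof body.error === 'object' && matchesInvalidToken(body.error as Fields)) {
      return true;
    }
  }
  return false;
}

const legacy: ToolResultSketch = {
  content: [{ type: 'text', text: JSON.stringify({ status: 401, message: 'Invalid agent token' }) }],
};
console.log(looksLikeInvalidAgentTokenResult(legacy));
```

Checking result bodies as well as thrown errors matters because an upstream failure can be serialized into a tool result that otherwise looks like a success.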

coderabbitai · 2026-05-27T09:22:04Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 9a2fc378-80ba-4f1c-b46a-3791700f3486

📥 Commits

Reviewing files that changed from the base of the PR and between 487f461 and e493804.

📒 Files selected for processing (4)

crates/broker/src/relaycast/auth.rs
packages/cli/src/cli/relaycast-mcp.ts
packages/sdk/src/__tests__/relaycast-errors.test.ts
packages/sdk/src/relaycast-errors.ts

🚧 Files skipped from review as they are similar to previous changes (3)

packages/sdk/src/relaycast-errors.ts
crates/broker/src/relaycast/auth.rs
packages/cli/src/cli/relaycast-mcp.ts

📝 Walkthrough

Walkthrough

End-to-end invalid-agent-token detection and mid-session recovery: SDK provides detection and recovery messaging; broker preserves and normalizes Relay API error codes and exposes Rust helpers; CLI caches per-identity tokens, invalidates stale tokens, and triggers re-registration via inbox piggyback.

Changes

Agent Token Invalidation Recovery

Layer / File(s)	Summary
SDK error detection and recovery utilities `packages/sdk/src/relaycast-errors.ts`, `packages/sdk/src/index.ts`, `packages/sdk/src/__tests__/relaycast-errors.test.ts`	Introduces `INVALID_AGENT_TOKEN_CODE`, `INVALID_AGENT_TOKEN_MESSAGE`, `isInvalidAgentTokenError`, `isInvalidAgentTokenToolResult`, and `agentTokenRecoveryMessage`; normalizes detection across typed codes, legacy 401+message pairs, nested envelopes, and cause chains. Tests validate detection and recovery message content.
Broker invalid-token detection and error normalization `crates/broker/src/relaycast/auth.rs`, `crates/broker/src/relaycast/mod.rs`	Adds `AuthHttpError.code`, `AGENT_TOKEN_INVALID_CODE`, `is_agent_token_invalid_code`, `is_agent_token_invalid`, `is_agent_token_invalid_anyhow`; preserves Relay API `code` in `relay_error_to_anyhow`; normalizes invalid-token failures during registration and includes unit tests.
CLI token invalidation, per-identity caching, and rebind logic `packages/cli/src/cli/relaycast-mcp.ts`, `packages/cli/src/cli/relaycast-mcp.test.ts`	Adds optional per-identity `agents` map to `RegistrationSession`; `registerAgentWithRebind` prefers cached identity tokens when valid; introduces `invalidateAgentToken(asIdentity?)` to clear stale tokens and updates `enableInboxPiggyback` to detect invalid-token tool results/exceptions and append recovery messaging. Tests cover rebind and token selection scenarios.
Changelog documentation `CHANGELOG.md`	Documents the new agent token recovery behavior for agent-relay MCP, `@agent-relay/sdk`, and agent-relay-broker under Unreleased/Added.
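
The token-selection rule in the CLI row (short-circuit only while the identity is still cached, and prefer the per-identity token) can be sketched as below. This is a hedged illustration rather than the actual registerAgentWithRebind logic: `reusableToken`, `CachedAgent`, and `RebindSession` are hypothetical names, and the assumption that a cached entry may lack a token is mine.

```typescript
// Hypothetical cache entry; assumed to allow a missing token.
interface CachedAgent {
  token?: string;
}

interface RebindSession {
  agentName: string | undefined;
  agentToken: string | undefined;
  agents: Map<string, CachedAgent>;
}

// Returns a token that can be reused without re-registering, or undefined
// when a fresh registration is required.
function reusableToken(session: RebindSession, strictName: string): string | undefined {
  const cached = session.agents.get(strictName);
  // After invalidation the identity is gone from the map: no short-circuit.
  if (cached === undefined) return undefined;
  // Prefer the per-identity token over the session-wide active token.
  if (cached.token) return cached.token;
  return session.agentName === strictName ? session.agentToken : undefined;
}

const rebind: RebindSession = {
  agentName: 'lead',
  agentToken: 'tok-active',
  agents: new Map<string, CachedAgent>([['lead', { token: 'tok-lead' }]]),
};

console.log(reusableToken(rebind, 'lead')); // the per-identity token wins
rebind.agents.delete('lead'); // what an invalidation would do
console.log(reusableToken(rebind, 'lead')); // undefined, so register again
```

Requiring the identity to still be cached is what forces a post-invalidation registration to rotate instead of handing back the dead token.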

Sequence Diagram

sequenceDiagram
  participant ToolHandler as Tool Handler
  participant Inbox as enableInboxPiggyback
  participant InvalidCheck as isInvalidAgentTokenError
  participant TokenCache as session.agents
  participant RecoveryMsg as agentTokenRecoveryMessage
  ToolHandler->>Inbox: invoke tool with as identity
  Inbox->>InvalidCheck: detect invalid-token in result/exception
  InvalidCheck-->>Inbox: invalid token found
  Inbox->>TokenCache: invalidateAgentToken(asIdentity)
  TokenCache->>TokenCache: remove identity from cache, clear session
  Inbox->>RecoveryMsg: append recovery guidance
  RecoveryMsg-->>Inbox: register_agent instruction
  Inbox->>ToolHandler: return result with recovery message
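
The flow in the diagram can be approximated in code as follows. This is a simplified sketch, not enableInboxPiggyback itself: `withTokenRecovery`, `isInvalidTokenError`, the `send_message` tool name, and the recovery wording are hypothetical, and the handler is synchronous here for brevity where the real tool handlers are presumably asynchronous.

```typescript
const INVALID_AGENT_TOKEN_CODE = 'agent_token_invalid';

interface ToolResult {
  isError?: boolean;
  content: Array<{ type: 'text'; text: string }>;
  structuredContent?: { error: { code: string; message: string } };
}

// Stand-in for the SDK detector; this sketch checks the typed code only.
function isInvalidTokenError(error: unknown): boolean {
  return (
    typeof error === 'object' &&
    error !== null &&
    (error as { code?: unknown }).code === INVALID_AGENT_TOKEN_CODE
  );
}

// Wraps a tool handler: on an invalid-token error, invalidate the cached
// token and return a structured recovery payload instead of rethrowing.
function withTokenRecovery(
  toolName: string,
  handler: () => ToolResult,
  invalidate: () => void,
): ToolResult {
  // register_agent is exempt so it stays a clean recovery path.
  if (toolName === 'register_agent') return handler();
  try {
    return handler();
  } catch (error) {
    if (!isInvalidTokenError(error)) throw error;
    invalidate();
    // Hypothetical wording; the SDK's agentTokenRecoveryMessage may differ.
    const text = 'Agent token is invalid. Call register_agent to obtain a fresh token.';
    return {
      isError: true,
      content: [{ type: 'text', text }],
      structuredContent: { error: { code: INVALID_AGENT_TOKEN_CODE, message: text } },
    };
  }
}

let invalidated = 0;
const recovered = withTokenRecovery(
  'send_message', // hypothetical tool name
  () => {
    throw Object.assign(new Error('Unauthorized'), { code: INVALID_AGENT_TOKEN_CODE });
  },
  () => {
    invalidated += 1;
  },
);
console.log(recovered.isError, recovered.structuredContent?.error.code, invalidated);
```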

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Relaycast MCP pre-registered agent token can become invalid mid-session relaycast#136: Directly implements detection and recovery for mid-session "Invalid agent token" failures by adding SDK/broker/CLI helpers and normalization.

Suggested reviewers

khaliqgant

Poem

🐰 Hop along the token trail so bright,
SDK listens, broker sets things right,
CLI clears the stale and makes a new start,
Re-register, retry — a fresh beating heart,
Hops of joy as tokens come apart.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main change: token recovery for agent sessions mid-operation, directly reflected in the core logic changes across SDK, MCP server, and broker.
Description check	✅ Passed	The description comprehensively covers all major changes (SDK, MCP server, broker), includes implementation details, and provides evidence of testing. However, manual verification is marked incomplete, and the template's 'Test Plan' section uses checkboxes.
Docstring Coverage	✅ Passed	Docstring coverage is 95.65% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


gemini-code-assist

Code Review

This pull request introduces robust recovery mechanisms for handling stale or invalid Relaycast agent tokens mid-session across the MCP server, TypeScript SDK, and Rust broker. It implements structural detection helpers to recognize both the new typed agent_token_invalid error code and legacy 401 status/message pairs, clearing dead tokens and guiding clients to re-register. Feedback on the implementation suggests using a WeakSet in the TypeScript SDK's error detector to guard against potential infinite recursion from cyclic error cause chains, and trimming the error code string in the Rust broker's comparison helper to align with the TypeScript SDK's behavior.

gemini-code-assist · 2026-05-27T09:23:03Z

+export function isInvalidAgentTokenError(error: unknown): boolean {
+  if (!error || typeof error !== 'object') return false;
+  const err = error as MaybeError;
+
+  if (normalizeCode(err.code) === INVALID_AGENT_TOKEN_CODE) return true;
+
+  const bodyError = readBodyError(err.body);
+  if (bodyError && normalizeCode(bodyError.code) === INVALID_AGENT_TOKEN_CODE) return true;
+
+  const status = readStatus(err.statusCode) ?? readStatus(err.status);
+  const message =
+    (typeof err.message === 'string' ? err.message.trim() : '') ||
+    (bodyError?.message?.trim() ?? '');
+  if (status === 401 && message === INVALID_AGENT_TOKEN_MESSAGE) return true;
+
+  if (err.cause && err.cause !== error) {
+    return isInvalidAgentTokenError(err.cause);
+  }
+  return false;
+}


To prevent potential infinite recursion and stack overflow crashes in environments with cyclic error cause chains (e.g., A.cause === B and B.cause === A), we can track visited errors using a WeakSet.

export function isInvalidAgentTokenError(error: unknown, visited = new WeakSet<object>()): boolean {
  if (!error || typeof error !== 'object') return false;
  if (visited.has(error)) return false;
  visited.add(error);
  const err = error as MaybeError;

  if (normalizeCode(err.code) === INVALID_AGENT_TOKEN_CODE) return true;

  const bodyError = readBodyError(err.body);
  if (bodyError && normalizeCode(bodyError.code) === INVALID_AGENT_TOKEN_CODE) return true;

  const status = readStatus(err.statusCode) ?? readStatus(err.status);
  const message =
    (typeof err.message === 'string' ? err.message.trim() : '') ||
    (bodyError?.message?.trim() ?? '');
  if (status === 401 && message === INVALID_AGENT_TOKEN_MESSAGE) return true;

  if (err.cause) {
    return isInvalidAgentTokenError(err.cause, visited);
  }
  return false;
}

Fixed in e493804 — isInvalidAgentTokenError now threads a WeakSet through the recursive cause traversal so cyclic graphs terminate cleanly. Added two regression tests: one for a 2-node cycle with no marker (returns false), one for a cycle where one node carries the marker (still detects).
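
The two cycle cases mentioned in the fix can be reproduced with a stripped-down detector. This sketch is not the SDK function: `hasInvalidTokenCode` and `ErrorNode` are hypothetical, and only `code` and `cause` are inspected, but the WeakSet guard works the same way.

```typescript
const INVALID_AGENT_TOKEN_CODE = 'agent_token_invalid';

type ErrorNode = { code?: string; cause?: unknown };

// Simplified stand-in for the SDK detector: only `code` and `cause` are read.
function hasInvalidTokenCode(error: unknown, visited = new WeakSet<object>()): boolean {
  if (!error || typeof error !== 'object') return false;
  if (visited.has(error)) return false; // already seen, so this is a cycle: stop
  visited.add(error);
  const node = error as ErrorNode;
  if (node.code === INVALID_AGENT_TOKEN_CODE) return true;
  return hasInvalidTokenCode(node.cause, visited);
}

// Two-node cycle with no marker: terminates and reports no match.
const a: ErrorNode = {};
const b: ErrorNode = { cause: a };
a.cause = b;
console.log(hasInvalidTokenCode(a)); // false

// Cycle where one node carries the marker: still detected.
const c: ErrorNode = {};
const d: ErrorNode = { code: INVALID_AGENT_TOKEN_CODE, cause: c };
c.cause = d;
console.log(hasInvalidTokenCode(c)); // true
```

A WeakSet suits this because it holds the visited error objects by identity without keeping them alive after the check returns.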

Generated by Claude Code

gemini-code-assist · 2026-05-27T09:23:04Z

+pub fn is_agent_token_invalid_code(code: &str) -> bool {
+    code.eq_ignore_ascii_case(AGENT_TOKEN_INVALID_CODE)
+}


For consistency with the TypeScript SDK implementation (which trims whitespace from the error code), we should trim the code string in Rust as well before comparing it case-insensitively.

Suggested change

-pub fn is_agent_token_invalid_code(code: &str) -> bool {
-    code.eq_ignore_ascii_case(AGENT_TOKEN_INVALID_CODE)
-}
+pub fn is_agent_token_invalid_code(code: &str) -> bool {
+    code.trim().eq_ignore_ascii_case(AGENT_TOKEN_INVALID_CODE)
+}

Fixed in e493804 — is_agent_token_invalid_code now trims before eq_ignore_ascii_case, matching the TypeScript normalizeCode contract. The relay_error_to_anyhow conversion also trims when persisting code into AuthHttpError. Added two unit tests covering the trimmed-code and anyhow-flavored paths.
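
For reference, the TypeScript-side contract that the Rust helper is being aligned with might look like the sketch below. The body of normalizeCode is not shown in this thread, so `normalizeCodeSketch` is a hypothetical reconstruction: the trim is stated in the review, while lowercasing and returning undefined for non-strings and empty strings are assumptions.

```typescript
const INVALID_AGENT_TOKEN_CODE = 'agent_token_invalid';

// Assumed normalization: trim, then lowercase; non-strings and empty
// strings yield undefined. The real normalizeCode may differ.
function normalizeCodeSketch(code: unknown): string | undefined {
  if (typeof code !== 'string') return undefined;
  const trimmed = code.trim();
  return trimmed === '' ? undefined : trimmed.toLowerCase();
}

// Padded and differently cased codes both normalize to the canonical value.
console.log(normalizeCodeSketch(' agent_token_invalid ') === INVALID_AGENT_TOKEN_CODE); // true
console.log(normalizeCodeSketch('AGENT_TOKEN_INVALID') === INVALID_AGENT_TOKEN_CODE); // true
// Non-string inputs never match.
console.log(normalizeCodeSketch(401) === INVALID_AGENT_TOKEN_CODE); // false
```

Keeping both sides equally lenient avoids a case where the MCP server recovers from a padded code while the broker treats the same response as an unrelated failure.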

Generated by Claude Code

https://claude.ai/code/session_01TTpJAiAxsgxMqC6RSzMDsV

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

packages/cli/src/cli/relaycast-mcp.ts (1)

15-19: ⚡ Quick win

Use the shared invalid-token code constant instead of a string literal.

Line 634 hardcodes 'agent_token_invalid', which can drift from the SDK contract you already import from the same package.

Proposed fix

 import {
+  INVALID_AGENT_TOKEN_CODE,
   agentTokenRecoveryMessage,
   isInvalidAgentTokenError,
   isInvalidAgentTokenToolResult,
 } from '@agent-relay/sdk';
@@
     structuredContent: {
-      error: { code: 'agent_token_invalid', message: text },
+      error: { code: INVALID_AGENT_TOKEN_CODE, message: text },
     },

Also applies to: 629-635

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/cli/src/cli/relaycast-mcp.ts` around lines 15 - 19, Replace the
hardcoded string 'agent_token_invalid' with the shared invalid-token code
constant exported by `@agent-relay/sdk`: add the SDK's exported invalid-token
constant to the existing imports (alongside agentTokenRecoveryMessage,
isInvalidAgentTokenError, isInvalidAgentTokenToolResult) and use that constant
wherever 'agent_token_invalid' is currently compared/used (around the logic that
references isInvalidAgentTokenError / isInvalidAgentTokenToolResult and
agentTokenRecoveryMessage) so the code relies on the canonical SDK value instead
of a string literal.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/broker/src/relaycast/auth.rs`:
- Around line 210-212: Normalize token-invalid codes by trimming surrounding
whitespace before comparison and when storing: update
is_agent_token_invalid_code to call .trim() on the incoming code before
performing eq_ignore_ascii_case(AGENT_TOKEN_INVALID_CODE), and apply the same
normalization to the other token-invalid check(s) referenced around the 828–832
area so inputs like " agent_token_invalid " are detected; ensure any places that
persist or compare these codes also use .trim() (and keep eq_ignore_ascii_case
for case-insensitivity) for consistent behavior.

In `@packages/sdk/src/relaycast-errors.ts`:
- Around line 80-82: The recursion in isInvalidAgentTokenError traverses
err.cause without cycle detection and can infinite-loop on cyclic error graphs;
modify isInvalidAgentTokenError to track visited error objects (e.g., add an
optional visited Set parameter or use an iterative loop) and check the Set
before recursing into err.cause, adding each error to the Set when visited so
cycles (a.cause = b; b.cause = a) are detected and recursion stops safely.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 5617dbdb-31f6-4855-80d8-e56e7142f350

📥 Commits

Reviewing files that changed from the base of the PR and between 6a456b7 and b8b32bc.

📒 Files selected for processing (8)

CHANGELOG.md
crates/broker/src/relaycast/auth.rs
crates/broker/src/relaycast/mod.rs
packages/cli/src/cli/relaycast-mcp.test.ts
packages/cli/src/cli/relaycast-mcp.ts
packages/sdk/src/__tests__/relaycast-errors.test.ts
packages/sdk/src/index.ts
packages/sdk/src/relaycast-errors.ts

- isInvalidAgentTokenError: thread a WeakSet through the recursive `cause` traversal so cyclic graphs (a.cause = b; b.cause = a) terminate cleanly instead of blowing the stack. Two new tests cover both the no-match and match-inside-cycle paths.
- is_agent_token_invalid_code (Rust): trim whitespace before the case-insensitive comparison so " agent_token_invalid " is detected, matching the TypeScript normalizeCode contract. Persist trimmed codes through relay_error_to_anyhow. Two new unit tests cover the trimmed and anyhow-flavored paths.
- relaycast-mcp.ts: replace the hardcoded 'agent_token_invalid' literal with the imported INVALID_AGENT_TOKEN_CODE constant so the canonical SDK value is the single source of truth.

https://claude.ai/code/session_01TTpJAiAxsgxMqC6RSzMDsV

willwashburn requested a review from khaliqgant as a code owner May 27, 2026 09:21

style: auto-format Rust code with cargo fmt

b8b32bc

gemini-code-assist Bot reviewed May 27, 2026

style: apply prettier to packages/sdk/src/relaycast-errors.ts

487f461

https://claude.ai/code/session_01TTpJAiAxsgxMqC6RSzMDsV

coderabbitai Bot reviewed May 27, 2026

willwashburn merged commit 9fcc4f6 into main May 27, 2026
49 checks passed

willwashburn deleted the claude/nice-turing-g05t2 branch May 27, 2026 10:14

coderabbitai Bot mentioned this pull request Jun 3, 2026

Add Relaycast attribution and Agent Relay MCP tool events #1041

Merged

willwashburn merged 4 commits into main from claude/nice-turing-g05t2
