key improvements by khaliqgant · Pull Request #493 · AgentWorkforce/relay

khaliqgant · 2026-03-05T08:18:39Z

Summary

Test Plan

Tests added/updated
Manual testing completed

Screenshots

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

- resolveAuthProfile(): match both '.clawdbot' and 'clawdbot' suffixes to handle OPENCLAW_HOME=/opt/clawdbot (no dot prefix)
- setNestedValue(): reject __proto__/prototype/constructor keys (prototype pollution guard)
- Add canonicalization matrix with 6 payload variants for WS auth debugging
- Add per-field hash logging and raw challenge capture behind OPENCLAW_WS_DEBUG=1

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
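A minimal sketch of the prototype pollution guard described for setNestedValue() above. The signature, the dotted-path convention, and the `FORBIDDEN_KEYS` name are assumptions for illustration; the helper in this PR may be shaped differently.

```typescript
// Keys that would let a dotted path reach Object.prototype.
const FORBIDDEN_KEYS = new Set(['__proto__', 'prototype', 'constructor']);

type Json = Record<string, unknown>;

// Hypothetical stand-in for setNestedValue(); the real signature may differ.
function setNestedValue(target: Json, path: string, value: unknown): void {
  const keys = path.split('.');
  for (const key of keys) {
    if (FORBIDDEN_KEYS.has(key)) {
      throw new Error(`Refusing to set unsafe key in path: ${path}`);
    }
  }
  let node: Json = target;
  // Walk (and create) intermediate objects for every key except the last.
  for (const key of keys.slice(0, -1)) {
    const next = node[key];
    if (typeof next !== 'object' || next === null) {
      node[key] = {};
    }
    node = node[key] as Json;
  }
  const leaf = keys[keys.length - 1] as string;
  node[leaf] = value;
}
```

Rejecting the whole path up front, before any intermediate object is created, keeps a half-applied write from being left behind when a later segment turns out to be unsafe.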

After reading the actual server-side verifier (openclaw/openclaw src/gateway/device-auth.ts, src/infra/device-identity.ts), confirmed:

- Server accepts both PEM and raw-base64url public keys
- Server decodes signatures in both base64url and standard base64
- Payload format (v3|deviceId|...) matches our v3-default-ms exactly

Changes:

- clawdbot-v1 profile now uses raw-base64url + base64url (matches server's own signDevicePayload output) instead of PEM + base64
- Add self-verification diagnostic: verify signature locally before sending, check deviceId derivation, verify encode/decode round-trip
- Helps identify if the issue is key mismatch vs payload mismatch

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
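Since the commit above turns on base64url versus standard base64, here is a small sketch of an encode/decode round-trip check between the two alphabets. The helper names are hypothetical and not taken from the PR; the conversion itself follows RFC 4648.

```typescript
// Convert standard base64 to unpadded base64url (RFC 4648 section 5).
function toBase64Url(b64: string): string {
  return b64.replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}

// Convert either alphabet to padded standard base64.
function toStandardBase64(input: string): string {
  const std = input.replace(/-/g, '+').replace(/_/g, '/').replace(/=+$/, '');
  const pad = (4 - (std.length % 4)) % 4;
  return std + '='.repeat(pad);
}

// True when two encoded values carry the same bytes regardless of alphabet.
function sameEncodedValue(a: string, b: string): boolean {
  return toStandardBase64(a) === toStandardBase64(b);
}
```

A diagnostic could call `sameEncodedValue(sentSignature, reEncodedSignature)` to rule out an encoding mismatch before suspecting the key or the payload.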

The Clawdbot marketplace image may run an older gateway that only supports v2 device auth payloads (no platform/deviceFamily fields). The current server tries v3→v2 fallback, but older versions only have v2.

Changes:

- clawdbot-v1 profile now uses v2 payload as primary
- Added v2 variants to canonicalization matrix (v2-default-ms, v2-default-sec, v2-no-token-ms)
- Default profile still uses v3 (unchanged for standard OpenClaw)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add safe v3↔v2 auth payload fallback: on signature rejection, retry once with alternate payload version before giving up
- Clean up diagnostic logging: consolidate to single production line, gate verbose output behind OPENCLAW_WS_DEBUG=1
- Add auth reject/fallback counters for observability
- Add fallback conformance tests (signature reject → retry → success, and double-reject → fail after one fallback)
- Add WS auth version-compat matrix to SKILL.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
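The retry-once rule above can be sketched as a small state machine. This is an illustration only: `AuthState`, `onConnectResponse`, and the counter names are assumptions, and the real client drives this from WebSocket messages rather than direct calls.

```typescript
type PayloadVersion = 'v2' | 'v3';
type Outcome = 'ok' | 'signature-invalid';
type Action = 'connected' | 'retry-with-fallback' | 'give-up';

interface AuthState {
  current: PayloadVersion;    // payload version used for the next attempt
  fallbackAttempted: boolean; // true once the alternate version was tried
  rejects: number;            // observability: signature rejections seen
  fallbacks: number;          // observability: fallback retries started
}

function initialState(primary: PayloadVersion): AuthState {
  return { current: primary, fallbackAttempted: false, rejects: 0, fallbacks: 0 };
}

// Decide what to do after the gateway answers a connect request.
function onConnectResponse(state: AuthState, outcome: Outcome): Action {
  if (outcome === 'ok') return 'connected';
  state.rejects += 1;
  if (state.fallbackAttempted) return 'give-up'; // second rejection: stop
  state.fallbackAttempted = true;
  state.fallbacks += 1;
  state.current = state.current === 'v3' ? 'v2' : 'v3';
  return 'retry-with-fallback';
}
```

The two conformance cases named in the commit map onto this directly: reject then ok yields `connected` on the alternate version, and two rejections yield `give-up` after exactly one fallback.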

Move clearConnectTimeout() from the unconditional connect-response path into each terminal branch (success, pairing rejection, final rejection). The fallback branch intentionally keeps the original 30s timeout alive so a hanging fallback connection still triggers the timeout instead of leaving the connect promise dangling forever.

Fixes Devin review finding on PR #493.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Guard close/error handlers with `this.ws !== ws` to ignore events from superseded WebSocket instances. This fixes two Devin review findings:

1. Error handler could reject connect promise during fallback if the old WS emitted an error during close handshake.
2. Old WS close handler could stomp `authenticated = false` on the new connection if it fired after the fallback succeeded, plus trigger a spurious scheduleReconnect().

The `this.ws !== ws` pattern replaces the `fallbackInProgress` flag with a simpler, more robust guard. Also adds early `stopped` check in sendChatMessage to prevent reconnect after explicit disconnect.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
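To illustrate the `this.ws !== ws` guard, here is a self-contained sketch. `FakeSocket` is a hypothetical stand-in for a WebSocket with only `on` and `emit`, and `ClientSketch` is not the real gateway client; `reconnects` stands in for a call to scheduleReconnect().

```typescript
type Handler = () => void;

// Hypothetical minimal socket: just enough to register and fire events.
class FakeSocket {
  private handlers: Record<string, Handler[]> = {};
  on(event: string, handler: Handler): void {
    const list = this.handlers[event] ?? [];
    list.push(handler);
    this.handlers[event] = list;
  }
  emit(event: string): void {
    for (const h of this.handlers[event] ?? []) h();
  }
}

class ClientSketch {
  ws: FakeSocket | null = null;
  authenticated = false;
  reconnects = 0;

  attach(ws: FakeSocket): void {
    this.ws = ws; // the newest socket becomes the current one
    ws.on('close', () => {
      // Guard: ignore events from superseded socket instances.
      if (this.ws !== ws) return;
      this.authenticated = false;
      this.reconnects += 1; // stands in for scheduleReconnect()
    });
  }

  markAuthenticated(): void {
    this.authenticated = true;
  }
}
```

Because the handler closes over its own `ws`, the identity check needs no extra flag: once a newer socket is attached, late events from the old one become no-ops.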

In general, to fix log injection you should sanitize any untrusted input before interpolating it into log messages. For plain-text logs, a common minimal approach is to strip carriage return and newline characters so an attacker cannot break out onto new log lines, and optionally remove other control characters. It is also useful to clearly delimit or quote user-provided values in logs.

For this specific issue, the best minimally invasive fix is to sanitize requestId right before it is used in the log statement. Since we only see the problematic log at line 688 and the value is only used for display to the user, we can create a sanitized version (e.g., safeRequestId) by converting requestId to a string and removing \r and \n. Then we use safeRequestId in the console.error message. This preserves existing functionality (the user still sees the same identifier for normal values) while preventing forged log lines from maliciously crafted requestId values. No new imports or external libraries are needed; we can rely on String() and .replace with a suitable regular expression.

Concretely:

In packages/openclaw/src/gateway.ts, inside the if (requestId) { ... } block near line 688, introduce a const safeRequestId = String(requestId).replace(/[\r\n]/g, '');.

Replace the existing console.error line that logs requestId with one that logs safeRequestId instead.

Optionally, this pattern can be reused elsewhere if you later decide to sanitize other logged values, but for this fix we constrain changes only to the provided snippet.
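If the pattern is reused elsewhere, a shared helper might look like the sketch below. `sanitizeForLog` is a hypothetical name, and this version goes slightly beyond the fix in this PR by stripping all ASCII control characters (including tabs), not only `\r` and `\n`.

```typescript
// Remove ASCII control characters so a value cannot start a new log line.
// Hypothetical helper; the PR itself inlines a narrower \r\n replace.
function sanitizeForLog(value: unknown): string {
  return String(value).replace(/[\u0000-\u001f\u007f]/g, '');
}

// Example: a crafted requestId can no longer forge a second log line.
const forged = 'abc123\r\n[openclaw-ws] Pairing approved';
const line = `[openclaw-ws] Approve this device:  openclaw devices approve ${sanitizeForLog(forged)}`;
```

Ordinary identifiers pass through unchanged, so the message shown to the user is the same for well-formed values.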

devin-ai-integration

Devin Review found 2 new potential issues.

View 12 additional findings in Devin Review.

devin-ai-integration · 2026-03-05T09:24:30Z

🟡 Fallback state (payloadVersionOverride, fallbackAttempted) never resets during auto-reconnect, locking retries to one payload version

When both v3 and v2 payload versions fail authentication, the connect() promise is rejected but the WebSocket close handler still calls scheduleReconnect() (gateway.ts:564-565). scheduleReconnect invokes doConnect() directly (gateway.ts:812,826) which does NOT reset payloadVersionOverride or fallbackAttempted — those are only reset in connect() (gateway.ts:474-476). This means all subsequent auto-reconnect attempts are locked to the last-tried fallback payload version and never cycle back to the original. If the first version was rejected due to a transient server issue (and the isSignatureInvalid regex at line 680 is fairly broad — device.signature matches any error containing "device signature" even without "invalid"), the client would be permanently stuck retrying only the wrong payload version until an explicit connect() call is made.

(Refers to lines 809-813)

Prompt for agents

In packages/openclaw/src/gateway.ts, the scheduleReconnect() method (around lines 797-828) calls doConnect() directly, which does not reset the fallback state (payloadVersionOverride, fallbackAttempted). After both v3 and v2 fail, auto-reconnects are stuck on one version. Consider resetting these fields before calling doConnect() in scheduleReconnect(), similar to how connect() resets them at lines 474-476. For example, in the two setTimeout callbacks that call this.doConnect() (around lines 811-812 and 825-826), add: this.payloadVersionOverride = null; this.fallbackAttempted = false; This way each auto-reconnect cycle gets a fresh chance to try the primary version first and fall back if needed.
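One way to apply that suggestion is to put the reset in a single helper that both entry points call. This is a sketch under assumptions: `GatewaySketch`, `resetFallbackState`, `reconnectNow`, and `simulateFallback` are hypothetical, and the real scheduleReconnect() runs doConnect() inside setTimeout callbacks rather than synchronously.

```typescript
type PayloadVersion = 'v2' | 'v3';

class GatewaySketch {
  payloadVersionOverride: PayloadVersion | null = null;
  fallbackAttempted = false;
  // Records which payload version each connection attempt started with.
  attempts: Array<PayloadVersion | 'primary'> = [];

  private resetFallbackState(): void {
    this.payloadVersionOverride = null;
    this.fallbackAttempted = false;
  }

  connect(): void {
    this.resetFallbackState();
    this.doConnect();
  }

  // Stands in for the body of the reconnect timer callback.
  reconnectNow(): void {
    this.resetFallbackState(); // each reconnect cycle starts from the primary version
    this.doConnect();
  }

  // Test hook: mimics the state left behind after a fallback attempt.
  simulateFallback(version: PayloadVersion): void {
    this.payloadVersionOverride = version;
    this.fallbackAttempted = true;
  }

  private doConnect(): void {
    this.attempts.push(this.payloadVersionOverride ?? 'primary');
  }
}
```

Keeping the reset in one place means connect() and the reconnect path cannot drift apart again.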


devin-ai-integration · 2026-03-05T09:24:32Z

+    ws.on('message', (data) => {
      this.handleMessage(data.toString());
    });


🟡 Missing this.ws !== ws guard on WebSocket message handler allows stale messages during v3↔v2 fallback

The close and error handlers in doConnect() were given if (this.ws !== ws) return; guards (lines 535, 571) to ignore events from superseded WebSocket instances during v3↔v2 fallback. However, the message handler at line 528 was not given the same guard. During the fallback flow, this.ws is set to null and the old WebSocket is closed (gateway.ts:713-714), but if a buffered message is delivered on the old socket before the close completes, handleMessage would execute against the current (potentially new) connection state. For example, a stale connect.challenge from the old socket could cause a duplicate connect request to be sent on the newly created socket.

Inconsistent guard pattern in doConnect()

The close handler at line 532-567 and error handler at line 569-581 both check if (this.ws !== ws) return; but the message handler at line 528-530 does not:

    ws.on('message', (data) => {
      this.handleMessage(data.toString()); // no guard
    });

Suggested change

-    ws.on('message', (data) => {
-      this.handleMessage(data.toString());
-    });
+    ws.on('message', (data) => {
+      // Guard: ignore messages from superseded WebSocket instances.
+      if (this.ws !== ws) return;
+      this.handleMessage(data.toString());
+    });



* spec(slack-primitive): design for workflow ↔ human messaging via local + cloud adapters

Mirrors the github-primitive's adapter shape so the same workflow file runs locally (gh CLI / Slack token) and in cloud (Nango). Two flagship verbs:

- postMessage — fire-and-forget human notification (PR opened, workflow done, etc.). Resolves @-mentions and #channel names at step time.
- askQuestion — block the workflow on a human reply. Configurable timeout, replyMatch (regex/choice/any), allowedReplyFrom, and Block Kit interactive forms. Resumable across sandbox restarts via run-record metadata so retries don't re-ask.

Captures the cultural change the primitive enables: agents should ask for clarification when blocked rather than guess. Includes two recipes ("Announce + Done", "Ask Before You Guess") that the writing-agent-relay-workflows skill will pick up when v1 ships.

Cloud-runtime auth reuses the workspace's existing Nango Slack connection — no new SST resource bindings. Slack tokens don't rotate per-call, so the proxy form (nango.proxy({ endpoint: '/chat.postMessage' })) is the right shape — avoids the get-token-vs-proxy confusion the github-primitive had to work through.

Phasing:

- A — postMessage + resolvers
- B — askQuestion (blocking, resumable)
- C — Block Kit interactive forms + utility verbs

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* trajectories cleanup

* feat(sdk): re-export github primitive from root entry

Exposes the bundled `@agent-relay/github-primitive` from the root `@agent-relay/sdk` so workflow authors no longer need the subpath import. Two new shapes added alongside the existing `/github` subpath:

- `import { github } from '@agent-relay/sdk'` — full namespaced surface, no collision risk with other root exports.
- `import { createGitHubStep, GitHubClient } from '@agent-relay/sdk'` — curated helpers for the common workflow-author path.

Avoided a flat `export *` because the primitive ships ~40 generic-named action helpers (createFile, readFile, getUser, errorMessage, ...) that would pollute the root namespace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(specs): resolve open questions in slack-primitive spec

Walked through §8 open questions and folded answers into the body:

- DM support deferred to v2 (limitation noted in §4.3)
- Slack Connect verification moved to Phase A implementation work
- Audit-trail persistence assigned to the runner; tracked as #825
- channel optional for postMessage (reuses sage notify-channel resolver), required for askQuestion
- Retry idempotency keyed on (runId, stepName); folded into §4.3 resumability

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: address PR #823 review comments

CodeRabbit flagged privacy/portability issues on the trajectory artifacts and two minor formatting issues on the Slack primitive spec.

- Sanitize machine-specific paths in .trajectories/* — replace /Users/khaliqgant/... and /Users/will/... with <repo-root>/<home> placeholders. Path fields in .trajectories/index.json normalized to repo-relative paths so lookup works across environments.
- specs/slack-primitive.md: wrap token-prefix examples in backticks (xoxb-* / xoxp-*) so markdown doesn't eat the asterisks.
- specs/slack-primitive.md: add 'text' language tag to the package-tree fenced code block so markdownlint MD040 stops complaining.

Skipped: the "use GitHub casing" comment on the compacted .md — it triggered on the literal '.github/workflows/' directory path, not the product name.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(slack-primitive): Phase A implementation (postMessage + resolvers)

Implements packages/slack-primitive following the design spec at specs/slack-primitive.md and the implementation prompt at specs/slack-primitive-impl.md. Phase A only — local Web API runtime, postMessage + resolveUser + resolveChannel.

- Local runtime via @slack/web-api, SLACK_BOT_TOKEN env auth.
- createSlackStep workflow helper with postMessage action.
- Mention resolution: emails via users.lookupByEmail, handles via user-cache, raw IDs pass through. Unresolved mentions logged as soft error on step output.
- Channel name resolution via conversations.list; IDs pass through.
- Example workflow + smoke-test docs in examples/.
- Unit tests cover token-missing, channel resolution, mention success and soft-fail, and templating substitution.

Out of scope: askQuestion, alternate-runtime adapter, Block Kit and utility verbs, runner schema for askQuestion audit trail (#825).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(slack-primitive): cloud-relay + noop runtimes, SDK re-export, publish wiring

The Phase A primitive previously hard-failed when SLACK_BOT_TOKEN wasn't set. That breaks the realistic CLI flow: most users don't have a bot token locally — they have a relay session and a workspace that's already connected to Slack via ricky's Nango app. This adds two new runtimes alongside `local`:

- `cloud-relay`: posts via relay-cloud's POST /api/v1/slack/post-message (cloud PR #493). Activated when CLOUD_API_TOKEN + CLOUD_API_URL are set. The cloud endpoint uses the workspace's existing Nango Slack connection — no per-user bot token needed.
- `noop`: postMessage logs a warning and returns a placeholder ts. Activated when no tokens are set. Lets workflows run end-to-end in CI / smoke environments without hard-failing on missing Slack creds.

Selection priority: cloud-relay → local → noop. Override with `runtime: 'local' | 'cloud-relay' | 'noop' | 'auto'`.

In cloud-relay mode, resolveUser/resolveChannel throw `unsupported_in_cloud_relay` (Phase A intentionally exposes only postMessage). Mention resolution is local-only; cloud-relay surfaces unresolved mentions as warnings on the step output.

Also wires slack-primitive as an SDK internal dep, mirroring the github primitive shape:

- packages/sdk/package.json: dep + ./slack subpath export
- packages/sdk/src/slack.ts: re-exports the full surface
- packages/sdk/src/index.ts: `export * as slack` + curated `{ createSlackStep, SlackClient }` from the root
- .github/workflows/publish.yml: pack + install in smoke build, publish-sdk-internal-deps matrix, dry-run loop, publish_if_missing chain
- .github/workflows/verify-publish-sdk.yml: package-availability check

Tests: 22 new vitest cases (cloud-relay 11, noop 3, runtime-selection 8) covering auth requirements, success path with thread_ts/unfurl forwarding, mention warnings, error mapping (rate_limited / not_connected / slack_error / upstream_error), resolve rejection, and the auto-detect priority order.

Related: AgentWorkforce/cloud#493 (the cloud-side endpoint).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(slack-primitive): address PR checks and comments

* chore: complete PR 823 fix trajectory

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
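The runtime selection priority described in the cloud-relay + noop commit above (cloud-relay → local → noop, with an explicit override) could be sketched as follows. `selectRuntime` and the `Env` shape are hypothetical names for illustration; only the environment variable names and runtime values come from the commit message.

```typescript
type Runtime = 'local' | 'cloud-relay' | 'noop';
type RuntimeOption = Runtime | 'auto';

interface Env {
  CLOUD_API_TOKEN?: string;
  CLOUD_API_URL?: string;
  SLACK_BOT_TOKEN?: string;
}

// Hypothetical selector: an explicit option wins, otherwise auto-detect.
function selectRuntime(env: Env, option: RuntimeOption = 'auto'): Runtime {
  if (option !== 'auto') return option;
  // cloud-relay needs both the token and the URL.
  if (env.CLOUD_API_TOKEN && env.CLOUD_API_URL) return 'cloud-relay';
  if (env.SLACK_BOT_TOKEN) return 'local';
  // No credentials at all: run without posting instead of hard-failing.
  return 'noop';
}
```

With only `SLACK_BOT_TOKEN` set this picks `local`; with nothing set it picks `noop`, which matches the CI and smoke-environment case described in the commit.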

key improvements

ebb9a17

khaliqgant requested a review from willwashburn as a code owner March 5, 2026 08:18

devin-ai-integration Bot reviewed Mar 5, 2026

build update

c6730af

khaliqgant and others added 4 commits March 5, 2026 00:35

github-advanced-security AI found potential problems Mar 5, 2026

devin-ai-integration Bot reviewed Mar 5, 2026

khaliqgant merged commit 8c81aeb into main Mar 5, 2026
44 of 45 checks passed

khaliqgant deleted the pr-followup branch March 5, 2026 21:21

@@ -685,12 +685,13 @@
                       const requestId = errObj?.requestId ?? errObj?.request_id ?? '';
                       console.error('[openclaw-ws] Pairing rejected — device is not paired with the OpenClaw gateway.');
                       if (requestId) {
-                        console.error(`[openclaw-ws] Approve this device:  openclaw devices approve ${requestId}`);
+                        const safeRequestId = String(requestId).replace(/[\r\n]/g, '');
+                        console.error(`[openclaw-ws] Approve this device:  openclaw devices approve ${safeRequestId}`);
                       }
                       console.error(`[openclaw-ws] Device ID: ${this.device.deviceId.slice(0, 16)}...`);
                       const configHint = getWsAuthCompat() === 'clawdbot'
                         ? '~/.clawdbot/clawdbot.json'
-                        : '~/.openclaw/openclaw.json';
+                        : '~/.openclaw/openclaw/openclaw.json';
                       console.error(`[openclaw-ws] Ensure OPENCLAW_GATEWAY_TOKEN matches ${configHint} gateway.auth.token`);
                       this.pairingRejected = true;
                     } else if (isSignatureInvalid && !this.fallbackAttempted) {

khaliqgant merged 8 commits into main from pr-followup
