key improvements #493
- resolveAuthProfile(): match both '.clawdbot' and 'clawdbot' suffixes to handle OPENCLAW_HOME=/opt/clawdbot (no dot prefix)
- setNestedValue(): reject __proto__/prototype/constructor keys (prototype pollution guard)
- Add canonicalization matrix with 6 payload variants for WS auth debugging
- Add per-field hash logging and raw challenge capture behind OPENCLAW_WS_DEBUG=1
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
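The prototype pollution guard described for setNestedValue() could look roughly like the sketch below. It is not the PR's code: the dotted-path signature, the `FORBIDDEN_KEYS` name, and the choice to throw rather than silently skip are assumptions made for illustration.

```typescript
// Hypothetical sketch; the real setNestedValue() in the PR may differ.
const FORBIDDEN_KEYS = new Set(['__proto__', 'prototype', 'constructor']);

function setNestedValue(
  target: Record<string, unknown>,
  path: string,
  value: unknown,
): void {
  const keys = path.split('.');

  // Validate the whole path first so a bad key never causes a partial write.
  for (const key of keys) {
    if (FORBIDDEN_KEYS.has(key)) {
      throw new Error(`Refusing to set unsafe key "${key}" in path "${path}"`);
    }
  }

  const parents = keys.slice(0, -1);
  const leaf = keys[keys.length - 1] ?? '';

  // Walk (and create) intermediate objects, then assign the leaf.
  let node: Record<string, unknown> = target;
  for (const key of parents) {
    const next = node[key];
    if (typeof next !== 'object' || next === null) {
      node[key] = {};
    }
    node = node[key] as Record<string, unknown>;
  }
  node[leaf] = value;
}
```

Checking every segment before writing means a path such as `a.__proto__.b` is rejected without creating `a` as a side effect.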
After reading the actual server-side verifier (openclaw/openclaw src/gateway/device-auth.ts, src/infra/device-identity.ts), confirmed:
- Server accepts both PEM and raw-base64url public keys
- Server decodes signatures in both base64url and standard base64
- Payload format (v3|deviceId|...) matches our v3-default-ms exactly
Changes:
- clawdbot-v1 profile now uses raw-base64url + base64url (matches server's own signDevicePayload output) instead of PEM + base64
- Add self-verification diagnostic: verify signature locally before sending, check deviceId derivation, verify encode/decode round-trip
- Helps identify if the issue is key mismatch vs payload mismatch
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Clawdbot marketplace image may run an older gateway that only supports v2 device auth payloads (no platform/deviceFamily fields). The current server tries v3→v2 fallback, but older versions only have v2.
Changes:
- clawdbot-v1 profile now uses v2 payload as primary
- Added v2 variants to canonicalization matrix (v2-default-ms, v2-default-sec, v2-no-token-ms)
- Default profile still uses v3 (unchanged for standard OpenClaw)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add safe v3↔v2 auth payload fallback: on signature rejection, retry once with alternate payload version before giving up
- Clean up diagnostic logging: consolidate to single production line, gate verbose output behind OPENCLAW_WS_DEBUG=1
- Add auth reject/fallback counters for observability
- Add fallback conformance tests (signature reject → retry → success, and double-reject → fail after one fallback)
- Add WS auth version-compat matrix to SKILL.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
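The "retry once with the alternate payload version" rule and its two conformance cases can be illustrated with a small synchronous sketch. All names here (`connectWithFallback`, `AttemptResult`, `FallbackCounters`) are hypothetical, and the real client is asynchronous and WebSocket-driven, so this only models the decision logic.

```typescript
type PayloadVersion = 'v2' | 'v3';
type AttemptResult = 'ok' | 'signature-invalid' | 'other-error';

// Hypothetical counters, standing in for the PR's reject/fallback metrics.
interface FallbackCounters {
  rejects: number;
  fallbacks: number;
}

function connectWithFallback(
  primary: PayloadVersion,
  attempt: (version: PayloadVersion) => AttemptResult,
  counters: FallbackCounters,
): { ok: boolean; version: PayloadVersion } {
  const first = attempt(primary);
  if (first === 'ok') return { ok: true, version: primary };

  // Only a signature rejection justifies trying the other payload version.
  if (first !== 'signature-invalid') return { ok: false, version: primary };
  counters.rejects += 1;

  // Exactly one fallback: v3 -> v2 or v2 -> v3, never a loop.
  const alternate: PayloadVersion = primary === 'v3' ? 'v2' : 'v3';
  counters.fallbacks += 1;
  const second = attempt(alternate);
  if (second === 'signature-invalid') counters.rejects += 1;
  return { ok: second === 'ok', version: alternate };
}
```

Because the alternate version is attempted at most once, a gateway that rejects both versions produces a failure after two attempts instead of an endless retry cycle.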
Move clearConnectTimeout() from the unconditional connect-response path into each terminal branch (success, pairing rejection, final rejection). The fallback branch intentionally keeps the original 30s timeout alive so a hanging fallback connection still triggers the timeout instead of leaving the connect promise dangling forever.
Fixes Devin review finding on PR #493.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Guard close/error handlers with `this.ws !== ws` to ignore events from superseded WebSocket instances. This fixes two Devin review findings:
1. Error handler could reject connect promise during fallback if the old WS emitted an error during close handshake.
2. Old WS close handler could stomp `authenticated = false` on the new connection if it fired after the fallback succeeded, plus trigger a spurious scheduleReconnect().
The `this.ws !== ws` pattern replaces the `fallbackInProgress` flag with a simpler, more robust guard. Also adds early `stopped` check in sendChatMessage to prevent reconnect after explicit disconnect.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
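A minimal model of the `this.ws !== ws` guard is sketched below. `FakeSocket` and `GuardedClient` are hypothetical stand-ins, not the actual gateway client or the `ws` library; the point is only that each handler closes over the socket it was registered on and compares it with the client's current socket.

```typescript
// Minimal stand-in for a WebSocket; only what the sketch needs.
class FakeSocket {
  private closeHandlers: Array<() => void> = [];
  onClose(handler: () => void): void {
    this.closeHandlers.push(handler);
  }
  emitClose(): void {
    for (const handler of this.closeHandlers) handler();
  }
}

class GuardedClient {
  ws: FakeSocket | null = null;
  authenticated = false;
  reconnectsScheduled = 0;

  attach(ws: FakeSocket): void {
    this.ws = ws;
    ws.onClose(() => {
      // Guard: ignore events from a socket that has been superseded.
      if (this.ws !== ws) return;
      this.authenticated = false;
      this.reconnectsScheduled += 1; // stands in for scheduleReconnect()
    });
  }

  markAuthenticated(): void {
    this.authenticated = true;
  }
}
```

With this shape, a late close event from the old socket cannot reset state on the connection that replaced it, and no separate in-progress flag has to be kept in sync.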
| @@ -423,7 +566,10 @@ export class OpenClawGatewayClient { | |||
Check warning
Code scanning / CodeQL
Log injection Medium
Copilot Autofix
AI 4 months ago
In general, to fix log injection you should sanitize any untrusted input before interpolating it into log messages. For plain-text logs, a common minimal approach is to strip carriage return and newline characters so an attacker cannot break out onto new log lines, and optionally remove other control characters. It is also useful to clearly delimit or quote user-provided values in logs.
For this specific issue, the best minimally invasive fix is to sanitize requestId right before it is used in the log statement. Since we only see the problematic log at line 688 and the value is only used for display to the user, we can create a sanitized version (e.g., safeRequestId) by converting requestId to a string and removing \r and \n. Then we use safeRequestId in the console.error message. This preserves existing functionality (the user still sees the same identifier for normal values) while preventing forged log lines from maliciously crafted requestId values. No new imports or external libraries are needed; we can rely on String() and .replace with a suitable regular expression.
Concretely:
- In `packages/openclaw/src/gateway.ts`, inside the `if (requestId) { ... }` block near line 688, introduce a `const safeRequestId = String(requestId).replace(/[\r\n]/g, '');`.
- Replace the existing `console.error` line that logs `requestId` with one that logs `safeRequestId` instead.
- Optionally, this pattern can be reused elsewhere if you later decide to sanitize other logged values, but for this fix we constrain changes only to the provided snippet.
| @@ -685,12 +685,13 @@ | ||
| const requestId = errObj?.requestId ?? errObj?.request_id ?? ''; | ||
| console.error('[openclaw-ws] Pairing rejected — device is not paired with the OpenClaw gateway.'); | ||
| if (requestId) { | ||
| - console.error(`[openclaw-ws] Approve this device: openclaw devices approve ${requestId}`); | ||
| + const safeRequestId = String(requestId).replace(/[\r\n]/g, ''); | ||
| + console.error(`[openclaw-ws] Approve this device: openclaw devices approve ${safeRequestId}`); | ||
| } | ||
| console.error(`[openclaw-ws] Device ID: ${this.device.deviceId.slice(0, 16)}...`); | ||
| const configHint = getWsAuthCompat() === 'clawdbot' | ||
| ? '~/.clawdbot/clawdbot.json' | ||
| : '~/.openclaw/openclaw.json'; | ||
| console.error(`[openclaw-ws] Ensure OPENCLAW_GATEWAY_TOKEN matches ${configHint} gateway.auth.token`); | ||
| this.pairingRejected = true; | ||
| } else if (isSignatureInvalid && !this.fallbackAttempted) { |
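The autofix text notes that other control characters can optionally be removed as well. A slightly more general helper along those lines is sketched here; `sanitizeForLog` and its length cap are hypothetical and not part of the suggested patch, which strips only `\r` and `\n`.

```typescript
// Hypothetical helper: strips ASCII control characters (including CR and LF)
// and caps the length so an untrusted value cannot flood or forge log lines.
function sanitizeForLog(value: unknown, maxLength = 128): string {
  return String(value)
    .replace(/[\u0000-\u001f\u007f]/g, '')
    .slice(0, maxLength);
}

// Usage in the spirit of the patched line (the message text is from the diff):
const exampleRequestId = 'req-42\r\n[openclaw-ws] forged line';
const approveHint =
  `[openclaw-ws] Approve this device: openclaw devices approve ${sanitizeForLog(exampleRequestId)}`;
```

For ordinary identifiers the output is unchanged, so the approve command shown to the user stays copy-pastable.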
🟡 Fallback state (payloadVersionOverride, fallbackAttempted) never resets during auto-reconnect, locking retries to one payload version
When both v3 and v2 payload versions fail authentication, the connect() promise is rejected but the WebSocket close handler still calls scheduleReconnect() (gateway.ts:564-565). scheduleReconnect invokes doConnect() directly (gateway.ts:812,826) which does NOT reset payloadVersionOverride or fallbackAttempted — those are only reset in connect() (gateway.ts:474-476). This means all subsequent auto-reconnect attempts are locked to the last-tried fallback payload version and never cycle back to the original. If the first version was rejected due to a transient server issue (and the isSignatureInvalid regex at line 680 is fairly broad — device.signature matches any error containing "device signature" even without "invalid"), the client would be permanently stuck retrying only the wrong payload version until an explicit connect() call is made.
(Refers to lines 809-813)
Prompt for agents
In packages/openclaw/src/gateway.ts, the scheduleReconnect() method (around lines 797-828) calls doConnect() directly, which does not reset the fallback state (payloadVersionOverride, fallbackAttempted). After both v3 and v2 fail, auto-reconnects are stuck on one version. Consider resetting these fields before calling doConnect() in scheduleReconnect(), similar to how connect() resets them at lines 474-476. For example, in the two setTimeout callbacks that call this.doConnect() (around lines 811-812 and 825-826), add:
this.payloadVersionOverride = null;
this.fallbackAttempted = false;
This way each auto-reconnect cycle gets a fresh chance to try the primary version first and fall back if needed.
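The proposed reset can be modelled in isolation as below. This is a hedged sketch rather than the gateway code: `ReconnectingClient`, `connectLog`, and the injectable `schedule` callback are hypothetical, introduced so the behavior can be exercised without timers or a real socket.

```typescript
type PayloadVersion = 'v2' | 'v3';

class ReconnectingClient {
  payloadVersionOverride: PayloadVersion | null = null;
  fallbackAttempted = false;
  // Records the fallback state seen by each doConnect() call.
  connectLog: Array<{ override: PayloadVersion | null; fallbackAttempted: boolean }> = [];

  private resetFallbackState(): void {
    this.payloadVersionOverride = null;
    this.fallbackAttempted = false;
  }

  connect(): void {
    this.resetFallbackState();
    this.doConnect();
  }

  // `schedule` stands in for setTimeout so the sketch stays synchronous.
  scheduleReconnect(schedule: (fn: () => void) => void): void {
    schedule(() => {
      // Same reset as connect(): each reconnect cycle starts from the primary version.
      this.resetFallbackState();
      this.doConnect();
    });
  }

  private doConnect(): void {
    this.connectLog.push({
      override: this.payloadVersionOverride,
      fallbackAttempted: this.fallbackAttempted,
    });
  }
}
```

Routing both entry points through one private reset method keeps the explicit and automatic connect paths from drifting apart again.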
| ws.on('message', (data) => { | ||
| this.handleMessage(data.toString()); | ||
| }); |
🟡 Missing this.ws !== ws guard on WebSocket message handler allows stale messages during v3↔v2 fallback
The close and error handlers in doConnect() were given if (this.ws !== ws) return; guards (lines 535, 571) to ignore events from superseded WebSocket instances during v3↔v2 fallback. However, the message handler at line 528 was not given the same guard. During the fallback flow, this.ws is set to null and the old WebSocket is closed (gateway.ts:713-714), but if a buffered message is delivered on the old socket before the close completes, handleMessage would execute against the current (potentially new) connection state. For example, a stale connect.challenge from the old socket could cause a duplicate connect request to be sent on the newly created socket.
Inconsistent guard pattern in doConnect()
The close handler at line 532-567 and error handler at line 569-581 both check if (this.ws !== ws) return; but the message handler at line 528-530 does not:
ws.on('message', (data) => {
  this.handleMessage(data.toString()); // no guard
});
Suggested change:
| - ws.on('message', (data) => { | |
| -   this.handleMessage(data.toString()); | |
| - }); | |
| + ws.on('message', (data) => { | |
| +   // Guard: ignore messages from superseded WebSocket instances. | |
| +   if (this.ws !== ws) return; | |
| +   this.handleMessage(data.toString()); | |
| + }); | |
* spec(slack-primitive): design for workflow ↔ human messaging via local + cloud adapters
Mirrors the github-primitive's adapter shape so the same workflow file
runs locally (gh CLI / Slack token) and in cloud (Nango). Two
flagship verbs:
- postMessage — fire-and-forget human notification (PR opened,
workflow done, etc.). Resolves @-mentions and #channel names at
step time.
- askQuestion — block the workflow on a human reply. Configurable
timeout, replyMatch (regex/choice/any), allowedReplyFrom, and
Block Kit interactive forms. Resumable across sandbox restarts
via run-record metadata so retries don't re-ask.
Captures the cultural change the primitive enables: agents should ask
for clarification when blocked rather than guess. Includes two
recipes ("Announce + Done", "Ask Before You Guess") that the
writing-agent-relay-workflows skill will pick up when v1 ships.
Cloud-runtime auth reuses the workspace's existing Nango Slack
connection — no new SST resource bindings. Slack tokens don't rotate
per-call, so the proxy form (nango.proxy({ endpoint: '/chat.postMessage' }))
is the right shape — avoids the get-token-vs-proxy confusion the
github-primitive had to work through.
Phasing:
A — postMessage + resolvers
B — askQuestion (blocking, resumable)
C — Block Kit interactive forms + utility verbs
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* trajectories cleanup
* feat(sdk): re-export github primitive from root entry
Exposes the bundled `@agent-relay/github-primitive` from the root
`@agent-relay/sdk` so workflow authors no longer need the subpath
import. Two new shapes added alongside the existing `/github` subpath:
- `import { github } from '@agent-relay/sdk'` — full namespaced surface,
no collision risk with other root exports.
- `import { createGitHubStep, GitHubClient } from '@agent-relay/sdk'` —
curated helpers for the common workflow-author path.
Avoided a flat `export *` because the primitive ships ~40 generic-named
action helpers (createFile, readFile, getUser, errorMessage, ...) that
would pollute the root namespace.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(specs): resolve open questions in slack-primitive spec
Walked through §8 open questions and folded answers into the body:
- DM support deferred to v2 (limitation noted in §4.3)
- Slack Connect verification moved to Phase A implementation work
- Audit-trail persistence assigned to the runner; tracked as #825
- channel optional for postMessage (reuses sage notify-channel resolver),
required for askQuestion
- Retry idempotency keyed on (runId, stepName); folded into §4.3 resumability
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
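Retry idempotency keyed on (runId, stepName) can be pictured with the sketch below. It assumes an in-memory map as a stand-in for the run-record metadata the spec mentions; `StepResultStore` and `runOnce` are hypothetical names, not part of the primitive.

```typescript
// Hypothetical in-memory stand-in for run-record metadata.
class StepResultStore<T> {
  private results = new Map<string, T>();

  // JSON-encode the pair so ("a:b", "c") and ("a", "b:c") cannot collide.
  private static key(runId: string, stepName: string): string {
    return JSON.stringify([runId, stepName]);
  }

  runOnce(runId: string, stepName: string, action: () => T): T {
    const key = StepResultStore.key(runId, stepName);
    if (this.results.has(key)) {
      // A retry of the same step in the same run reuses the recorded result.
      return this.results.get(key) as T;
    }
    const result = action();
    this.results.set(key, result);
    return result;
  }
}
```

A retried step then returns the stored outcome instead of repeating its side effect, which is the property that keeps a resumed workflow from re-asking a question.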
* chore: address PR #823 review comments
CodeRabbit flagged privacy/portability issues on the trajectory artifacts and
two minor formatting issues on the Slack primitive spec.
- Sanitize machine-specific paths in .trajectories/* — replace
/Users/khaliqgant/... and /Users/will/... with <repo-root>/<home>
placeholders. Path fields in .trajectories/index.json normalized to
repo-relative paths so lookup works across environments.
- specs/slack-primitive.md: wrap token-prefix examples in backticks
(xoxb-* / xoxp-*) so markdown doesn't eat the asterisks.
- specs/slack-primitive.md: add 'text' language tag to the package-tree
fenced code block so markdownlint MD040 stops complaining.
Skipped: the "use GitHub casing" comment on the compacted .md — it triggered
on the literal '.github/workflows/' directory path, not the product name.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(slack-primitive): Phase A implementation (postMessage + resolvers)
Implements packages/slack-primitive following the design spec at
specs/slack-primitive.md and the implementation prompt at
specs/slack-primitive-impl.md. Phase A only — local Web API runtime,
postMessage + resolveUser + resolveChannel.
- Local runtime via @slack/web-api, SLACK_BOT_TOKEN env auth.
- createSlackStep workflow helper with postMessage action.
- Mention resolution: emails via users.lookupByEmail, handles via
user-cache, raw IDs pass through. Unresolved mentions logged as soft
error on step output.
- Channel name resolution via conversations.list; IDs pass through.
- Example workflow + smoke-test docs in examples/.
- Unit tests cover token-missing, channel resolution, mention success
and soft-fail, and templating substitution.
Out of scope: askQuestion, alternate-runtime adapter, Block Kit and
utility verbs, runner schema for askQuestion audit trail (#825).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
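The three mention paths listed above (emails, handles, raw IDs) imply a classification step before any lookup. One way to sketch it is shown here; `classifyMention` is hypothetical, and the ID pattern is a heuristic assumption (an uppercase `U` or `W` prefix followed by uppercase alphanumerics), not a rule taken from the spec.

```typescript
type MentionKind = 'id' | 'email' | 'handle';

// Hypothetical classifier: decides which resolution path a mention would take.
function classifyMention(raw: string): MentionKind {
  const token = raw.trim().replace(/^@/, '');

  // Heuristic assumption: raw Slack user IDs look like U012AB3CD or W012AB3CD.
  // An all-uppercase handle of the same shape would be misclassified as an ID.
  if (/^[UW][A-Z0-9]{8,}$/.test(token)) return 'id';

  // Anything shaped like local@domain.tld would go to the email lookup.
  if (/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(token)) return 'email';

  // Everything else is treated as a handle for the user cache.
  return 'handle';
}
```

Only the `email` and `handle` kinds would need a lookup; an `id` passes through unchanged, matching the pass-through behavior described in the commit message.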
* feat(slack-primitive): cloud-relay + noop runtimes, SDK re-export, publish wiring
The Phase A primitive previously hard-failed when SLACK_BOT_TOKEN wasn't
set. That breaks the realistic CLI flow: most users don't have a bot
token locally — they have a relay session and a workspace that's already
connected to Slack via ricky's Nango app.
This adds two new runtimes alongside `local`:
- `cloud-relay`: posts via relay-cloud's POST /api/v1/slack/post-message
(cloud PR #493). Activated when CLOUD_API_TOKEN + CLOUD_API_URL are
set. The cloud endpoint uses the workspace's existing Nango Slack
connection — no per-user bot token needed.
- `noop`: postMessage logs a warning and returns a placeholder ts.
Activated when no tokens are set. Lets workflows run end-to-end in
CI / smoke environments without hard-failing on missing Slack creds.
Selection priority: cloud-relay → local → noop. Override with
`runtime: 'local' | 'cloud-relay' | 'noop' | 'auto'`.
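The selection order can be expressed as a small pure function. This is a sketch under stated assumptions: `selectRuntime` is a hypothetical name, and the environment is passed in as a plain object rather than read from the process, which the actual package may do differently.

```typescript
type Runtime = 'local' | 'cloud-relay' | 'noop';
type RuntimeOption = Runtime | 'auto';

function selectRuntime(
  env: Record<string, string | undefined>,
  option: RuntimeOption = 'auto',
): Runtime {
  // An explicit override wins over auto-detection.
  if (option !== 'auto') return option;

  // cloud-relay needs both the token and the URL.
  if (env['CLOUD_API_TOKEN'] && env['CLOUD_API_URL']) return 'cloud-relay';

  // local needs a bot token.
  if (env['SLACK_BOT_TOKEN']) return 'local';

  // No credentials at all: run without posting.
  return 'noop';
}
```

For example, an environment with only `SLACK_BOT_TOKEN` set resolves to `local`, while one with all three variables set resolves to `cloud-relay` because it is checked first.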
In cloud-relay mode, resolveUser/resolveChannel throw
`unsupported_in_cloud_relay` (Phase A intentionally exposes only
postMessage). Mention resolution is local-only; cloud-relay surfaces
unresolved mentions as warnings on the step output.
Also wires slack-primitive as an SDK internal dep, mirroring the github
primitive shape:
- packages/sdk/package.json: dep + ./slack subpath export
- packages/sdk/src/slack.ts: re-exports the full surface
- packages/sdk/src/index.ts: `export * as slack` + curated
`{ createSlackStep, SlackClient }` from the root
- .github/workflows/publish.yml: pack + install in smoke build,
publish-sdk-internal-deps matrix, dry-run loop, publish_if_missing chain
- .github/workflows/verify-publish-sdk.yml: package-availability check
Tests: 22 new vitest cases (cloud-relay 11, noop 3, runtime-selection 8)
covering auth requirements, success path with thread_ts/unfurl
forwarding, mention warnings, error mapping (rate_limited / not_connected
/ slack_error / upstream_error), resolve rejection, and the auto-detect
priority order.
Related: AgentWorkforce/cloud#493 (the cloud-side endpoint).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(slack-primitive): address PR checks and comments
* chore: complete PR 823 fix trajectory
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>