fix(slack): normalise outbound URLs at boundary — closes #1092, partial #850#1102
Conversation
… partial netclaw-dev#850 netclaw-dev#1092 — Slack click redirector and the LLM's URL-encoding of '+' to '%2B' inside markdown links combine to corrupt OAuth scope lists. netclaw-dev#850 — Standard markdown links '[text](url)' don't render as links in Slack at all; only Slack-native '<url|text>' does. This commit introduces SlackTextProtector and threads it through the two outbound Slack clients. It also updates SlackBlockConverter so the Block Kit and plain-text fallback paths stay in lockstep. Pieces of the fix: * NormaliseScopeList(url) — when a URL has a 'scope=' query parameter whose value contains two or more '%2B' separators, decode them back to '+'. This is the LLM-introduced corruption pattern (Claude re-encoding the literal '+' delimiter when wrapping the URL in a markdown link); restoring '+' is what the IdP expects. Conservative — a single '%2B' is left alone (likely legitimate). * IsRewriteProne(url) — flag URLs containing literal '+' or '%2B'. Slack rewrites these on click via its link redirector, regardless of whether they reach Slack as plain text, mrkdwn '<url>' or a Block Kit RichTextLink. The only reliable way to preserve such a URL is to render it non-clickable, so the user copies it exact. * ProtectUrls(text) — applied to the Text field of every outbound chat.postMessage. Converts markdown '[label](url)' into Slack- native '<url|label>' for safe URLs (addresses netclaw-dev#850's clickable- link requirement). Rewrite-prone URLs are normalised (above) and rendered as inline code; the label is dropped because the URL has to be the visible payload. * SlackBlockConverter calls the same predicate and normaliser so that Block Kit RichTextLink is emitted for safe URLs and inline- code RichTextText for rewrite-prone URLs. The two paths agree. Tests (Netclaw.Actors.Tests/Channels/SlackTextProtectorTests.cs): * 19 xUnit facts covering bare URLs, markdown links, the OAuth-scope bug-of-record (both '+' and '%2B' shapes), single '%2B' negative- case, '%20' negative-case, prose parentheses, existing-wrap pass- through (both '<...>' and '`...`'), http:// callbacks, the realistic 22-scope Google URL, plus parameterised theories for IsRewriteProne and NormaliseScopeList. * Verified failing-then-passing locally: with NormaliseScopeList and IsRewriteProne removed, the rewrite-prone cases fail with the exact diff the bug shows; with the helpers in place, all 50 facts in Netclaw.Actors.Tests/Channels/ pass. End-to-end verification: * Doist + Anthropic claude-sonnet-4-5 + taylorwilsdon/google_workspace_mcp. Before this commit: bot DM'd a Google OAuth URL via markdown link that Slack rendered as raw '[Authorization URL](url-with-%2B)' text; copy or click both produced 'Error 400: invalid_scope' from Google. * After this commit: rewrite-prone URL is decoded to the original '+' form and rendered as inline code (gray, non-clickable); manual copy out of the code element delivers the URL byte-exact to the browser and Google accepts the full 22-scope list. Known limitation (intentional): * Rewrite-prone URLs are non-clickable. This is a tradeoff — Slack's link redirector cannot be told to skip a particular URL, so the only way to preserve a URL containing '+' / '%2B' is to render it as inline code. netclaw-dev#850's clickable-link requirement is addressed for safe URLs but is intentionally not met for rewrite- prone ones. A follow-up could introduce a local HTTP redirector that hands Slack a short opaque URL which 302s to the real URL — that's the only path I can see to clickable + preserved.
efec5f2 to
b77a4db
Compare
Addresses maintainer review of netclaw-dev#1102: - BareUrlRegex no longer swallows trailing sentence punctuation (.,!?;:). On the new plain-text Text-field path, "see https://x.com." would otherwise wrap the period inside the clickable link target. Only the final character is constrained, so punctuation inside the URL path is still preserved. - Consolidate the two divergent BareUrlRegex definitions. SlackBlockConverter now consumes the shared (internal) SlackTextProtector.BareUrlRegex so the Block Kit and plain-text surfaces tokenize URLs identically, as the PR intends. - NormaliseScopeList anchors 'scope=' to a query-parameter boundary (? or &) so a 'scope=' substring inside a path segment or a longer parameter name (e.g. 'myscope=') cannot be mis-targeted by the decode. Adds regression tests on both the Text and Block Kit surfaces.
|
Thanks for this @johnkattenhorn — solid fix, and the writeup made the two failure modes easy to follow. I gave it a full review and pushed a follow-up commit (62c0c03) directly to the branch to clear the should-fix items rather than ping-pong on them: 1. Bare-URL regex swallowed trailing sentence punctuation. 2. Two divergent 3. Regression tests added on both the Text and Block Kit surfaces. Locally: 63/63 channel tests pass, Two pre-existing nits I deliberately left alone — not introduced here, inherited from
The partial #850 close is well-reasoned and clearly documented — no objection to deferring the rest. LGTM once CI is green. |
Closes #1107. Follow-up to #1102. The markdown-link regex \[([^\]]+)\]\(([^)]+)\) truncated any link destination containing a ')'. A URL like https://en.wikipedia.org/wiki/Foo_(disambiguation) was cut at the first ')', and the remainder leaked into the message as stray text. The url group now accepts balanced parenthesised segments — (?:[^()]|\([^()]*\))* — so a ')' only closes the markdown link when it is not part of a balanced pair (CommonMark link-destination semantics). One level of paren nesting is supported, which covers every URL shape seen in practice. Also consolidates the two duplicated markdown-link regexes: SlackBlockConverter now consumes the shared (internal) SlackTextProtector.MarkdownLinkRegex, mirroring the bare-URL regex consolidation from #1102. Regression tests added on both the Text and Block Kit surfaces.
Addresses #1092 in full and partially closes #850.
Two bugs, one outbound-Slack boundary
#1092 — Slack rewrites OAuth scope delimiters on click.
The LLM's response sometimes carries the URL inside a markdown link
[label](url)and URL-encodes the literal+(scope-list separator) into%2Balong the way. Whether the URL reaches Slack as plain text, mrkdwn<url>, or Block KitRichTextLink, Slack's click redirector re-encodes+to%2Bon click, so the final URL the IdP sees has every scope concatenated into one invalid string. Repro: Doist +taylorwilsdon/google_workspace_mcp+ claude-sonnet-4-5 → click →Error 400: invalid_scope.#850 — Standard markdown links don't render in Slack.
[text](url)shows as raw text in Slack; only Slack-native<url|text>renders as a clickable link. Aaron documented the three forms tested in the issue body.The two bugs share the same outbound boundary (
SlackReplyClient.PostThreadReplyAsync/SlackOutboundClient.PostNewThreadAsync/SlackReplyClient.UpdateThreadMessageAsync) so I'm fixing them in one place.Fix
A new
SlackTextProtectorruns on the agent's response text before it's posted, plus the same helpers run insideSlackBlockConverterso the Block Kit and Text-field paths stay consistent.+, no%2B)<url>mrkdwn (clickable)[label](url)RichTextLinkRichTextText { Style.Code = true }NormaliseScopeList(url)runs first for both paths: if the URL has ascope=parameter whose value contains two or more%2Bseparators, decode them back to+. Single%2Bis left alone (likely a legitimate literal+).IsRewriteProne(url)is the central is-it-safe-to-link predicate (today: contains+or%2B; extensible).Tests
19 xUnit facts in
Netclaw.Actors.Tests/Channels/SlackTextProtectorTests.cs:+, mis-encoded%2B, single-%2Bnegative case,%20negative case+<...>,`...`,<url|label>mrkdwn(see https://...)IsRewriteProneandNormaliseScopeListparameterised theoriesTDD evidence
Verified failing-then-passing locally — with
NormaliseScopeListandIsRewriteProneremoved (the rest of the test cases kept as-is), the rewrite-prone facts fail with the exact diff the bug shows. With the helpers in place, all 50 facts inNetclaw.Actors.Tests/Channels/pass.End-to-end verification
Doist hosted MCP + Anthropic
claude-sonnet-4-5+taylorwilsdon/google_workspace_mcpover Slack. Before this PR: bot DM'd an OAuth URL via markdown link; Slack rendered it as raw[Authorization URL](url-with-%2B)text and click/paste both producedinvalid_scope. After this PR: rewrite-prone URL is decoded to original+form and rendered as inline code (gray, non-clickable); manual copy out of the code element delivers the URL byte-exact and Google accepts the full 22-scope list.Known limitation — why #850 is only partially closed
Rewrite-prone URLs are intentionally non-clickable in this PR. Slack's link redirector cannot be told to skip a particular URL, so the only way to preserve a URL containing
+or%2Bis to render it as inline code (Slack does not click-rewrite text inside backticks). #850's "must be clickable" requirement is met for safe URLs but is intentionally not met for rewrite-prone ones — that's a tradeoff between clickable UX and a URL the IdP can actually accept.A follow-up could introduce a local HTTP redirector (daemon-side) that hands Slack a short opaque URL which 302s to the real URL. That's the only path I see to clickable + preserved. Out of scope here.
Out-of-scope
IsRewriteProneis intentionally conservative — extend as further click-rewrite patterns are observed.