
fix(slack): normalise outbound URLs at boundary (closes #1092, partial #850) #1102

Merged: Aaronontheweb merged 3 commits into netclaw-dev:dev from johnkattenhorn:fix/slack-url-rewrite-decode on May 19, 2026

@johnkattenhorn (Contributor)

Addresses #1092 in full and partially closes #850.

Two bugs, one outbound-Slack boundary

#1092 — Slack rewrites OAuth scope delimiters on click.
The LLM's response sometimes carries the URL inside a markdown link [label](url) and URL-encodes the literal + (scope-list separator) into %2B along the way. Whether the URL reaches Slack as plain text, mrkdwn <url>, or Block Kit RichTextLink, Slack's click redirector re-encodes + to %2B on click, so the final URL the IdP sees has every scope concatenated into one invalid string. Repro: Doist + taylorwilsdon/google_workspace_mcp + claude-sonnet-4-5 → click → Error 400: invalid_scope.

#850 — Standard markdown links don't render in Slack.
[text](url) shows as raw text in Slack; only Slack-native <url|text> renders as a clickable link. Aaron documented the three forms tested in the issue body.

The two bugs share the same outbound boundary (SlackReplyClient.PostThreadReplyAsync / SlackOutboundClient.PostNewThreadAsync / SlackReplyClient.UpdateThreadMessageAsync) so I'm fixing them in one place.

Fix

A new SlackTextProtector runs on the agent's response text before it's posted, plus the same helpers run inside SlackBlockConverter so the Block Kit and Text-field paths stay consistent.

| Path | Safe URL (no +, no %2B) | Rewrite-prone URL |
| --- | --- | --- |
| Bare URL | `<url>` mrkdwn (clickable) | inline code (non-clickable, copy-paste) |
| Markdown link `[label](url)` | `<url\|label>` mrkdwn (clickable; addresses #850) | inline code (label dropped) |
| Block Kit | RichTextLink | RichTextText { Style.Code = true } |

NormaliseScopeList(url) runs first for both paths: if the URL has a scope= parameter whose value contains two or more %2B separators, decode them back to +. Single %2B is left alone (likely a legitimate literal +).
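The decode rule can be sketched in a few lines. This is an illustrative Python rendering, not the PR's C# code: the name `normalise_scope_list`, the regex, and the case-insensitive handling of `%2b` are assumptions made for the example. Like the first version of the PR, it does not yet anchor `scope=` to a query-parameter boundary.

```python
import re

# Hypothetical pattern: the value of a scope= parameter runs up to the
# next '&' or '#'. Not anchored to '?' or '&' in this sketch.
_SCOPE_VALUE = re.compile(r"(scope=)([^&#]*)")
_ENCODED_PLUS = re.compile(r"%2B", re.IGNORECASE)  # assumption: %2b also counts


def normalise_scope_list(url: str) -> str:
    """Decode %2B back to '+' inside scope= when it looks like a separator."""

    def fix(match: re.Match) -> str:
        prefix, value = match.group(1), match.group(2)
        # Two or more %2B suggest a mis-encoded scope list; a single one
        # is left alone because it may be a legitimate literal '+'.
        if len(_ENCODED_PLUS.findall(value)) >= 2:
            value = _ENCODED_PLUS.sub("+", value)
        return prefix + value

    # Only the first scope= occurrence is rewritten.
    return _SCOPE_VALUE.sub(fix, url, count=1)
```

For example, `scope=a%2Bb%2Bc` becomes `scope=a+b+c`, while `scope=a%2Bb` and a `%2B` in any other parameter pass through unchanged.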

IsRewriteProne(url) is the central is-it-safe-to-link predicate (today: contains + or %2B; extensible).
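A minimal Python sketch of how the predicate might drive the rendering rules above. The function names, the `render_link` helper, and the case-insensitive `%2b` check are hypothetical; the real helpers are C# and also run the scope-list normaliser first, which this sketch leaves out.

```python
from typing import Optional


def is_rewrite_prone(url: str) -> bool:
    """Today's rule per the PR: a literal '+' or an encoded '%2B'."""
    return "+" in url or "%2b" in url.lower()  # lower(): assumption for %2b


def render_link(url: str, label: Optional[str] = None) -> str:
    """Pick the Slack text form for one URL (hypothetical helper)."""
    if is_rewrite_prone(url):
        # Inline code is not click-rewritten; the label is dropped so the
        # URL itself is the visible, copyable payload.
        return f"`{url}`"
    if label:
        return f"<{url}|{label}>"  # Slack-native labelled link
    return f"<{url}>"  # Slack-native bare link
```

Keeping the decision in one predicate is what lets the Text-field and Block Kit paths agree: each only has to map "safe" and "rewrite-prone" onto its own output type.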

Tests

19 xUnit facts in Netclaw.Actors.Tests/Channels/SlackTextProtectorTests.cs:

  • bare URL handling — safe, literal +, mis-encoded %2B, single-%2B negative case, %20 negative case
  • markdown link handling — safe, mis-encoded scope-list, literal +
  • already-protected URLs — <...>, `...`, <url|label> mrkdwn
  • multiplicity — multiple URLs, mixed markdown + bare
  • prose punctuation — (see https://...)
  • realistic 22-scope Google OAuth URL — full decode + render as inline code
  • direct invariants — IsRewriteProne and NormaliseScopeList parameterised theories

TDD evidence

Verified failing-then-passing locally — with NormaliseScopeList and IsRewriteProne removed (the rest of the test cases kept as-is), the rewrite-prone facts fail with the exact diff the bug shows. With the helpers in place, all 50 facts in Netclaw.Actors.Tests/Channels/ pass.

End-to-end verification

Doist hosted MCP + Anthropic claude-sonnet-4-5 + taylorwilsdon/google_workspace_mcp over Slack. Before this PR: bot DM'd an OAuth URL via markdown link; Slack rendered it as raw [Authorization URL](url-with-%2B) text and click/paste both produced invalid_scope. After this PR: rewrite-prone URL is decoded to original + form and rendered as inline code (gray, non-clickable); manual copy out of the code element delivers the URL byte-exact and Google accepts the full 22-scope list.

Known limitation — why #850 is only partially closed

Rewrite-prone URLs are intentionally non-clickable in this PR. Slack's link redirector cannot be told to skip a particular URL, so the only way to preserve a URL containing + or %2B is to render it as inline code (Slack does not click-rewrite text inside backticks). #850's "must be clickable" requirement is met for safe URLs but is intentionally not met for rewrite-prone ones — that's a tradeoff between clickable UX and a URL the IdP can actually accept.

A follow-up could introduce a local HTTP redirector (daemon-side) that hands Slack a short opaque URL which 302s to the real URL. That's the only path I see to clickable + preserved. Out of scope here.
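To make the redirector idea concrete, here is a rough Python sketch of what such a follow-up might look like. Nothing here exists in the PR: the in-memory store, `register`, `resolve`, and the handler are all hypothetical, and a real version would need token expiry, persistence, and an address the user's browser can actually reach.

```python
import secrets
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

# Hypothetical in-memory store: opaque token -> real URL.
_targets = {}


def register(url: str) -> str:
    """Store the real URL and return an opaque token safe to hand to Slack."""
    token = secrets.token_urlsafe(16)  # contains no '+' or '%'
    _targets[token] = url
    return token


def resolve(path: str) -> Optional[str]:
    """Map a request path such as '/<token>' back to the stored URL."""
    return _targets.get(path.lstrip("/"))


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = resolve(self.path)
        if target is None:
            self.send_error(404)
            return
        # The browser follows the Location header, so the IdP receives the
        # original URL and Slack only ever sees the opaque token.
        self.send_response(302)
        self.send_header("Location", target)
        self.end_headers()


def make_server(port: int = 0) -> HTTPServer:
    """Bind to localhost; port 0 lets the OS choose a free port."""
    return HTTPServer(("127.0.0.1", port), RedirectHandler)
```

The point of the design is that the short URL contains nothing for Slack's redirector to rewrite, while the 302 target is delivered outside Slack's control.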

Out-of-scope

  • Discord channel adapter unchanged.
  • IsRewriteProne is intentionally conservative — extend as further click-rewrite patterns are observed.
  • Broader mrkdwn translation of the Text field (bold/italic/code outside URL handling) can be a follow-up.

… partial netclaw-dev#850

netclaw-dev#1092 — Slack click redirector and the LLM's URL-encoding of '+' to
'%2B' inside markdown links combine to corrupt OAuth scope lists.
netclaw-dev#850 — Standard markdown links '[text](url)' don't render as links in
Slack at all; only Slack-native '<url|text>' does.

This commit introduces SlackTextProtector and threads it through the
two outbound Slack clients. It also updates SlackBlockConverter so the
Block Kit and plain-text fallback paths stay in lockstep.

Pieces of the fix:

  * NormaliseScopeList(url) — when a URL has a 'scope=' query parameter
    whose value contains two or more '%2B' separators, decode them
    back to '+'. This is the LLM-introduced corruption pattern (Claude
    re-encoding the literal '+' delimiter when wrapping the URL in a
    markdown link); restoring '+' is what the IdP expects. Conservative
    — a single '%2B' is left alone (likely legitimate).

  * IsRewriteProne(url) — flag URLs containing literal '+' or '%2B'.
    Slack rewrites these on click via its link redirector, regardless
    of whether they reach Slack as plain text, mrkdwn '<url>' or a
    Block Kit RichTextLink. The only reliable way to preserve such a
    URL is to render it non-clickable, so the user copies it exact.

  * ProtectUrls(text) — applied to the Text field of every outbound
    chat.postMessage. Converts markdown '[label](url)' into Slack-
    native '<url|label>' for safe URLs (addresses netclaw-dev#850's clickable-
    link requirement). Rewrite-prone URLs are normalised (above) and
    rendered as inline code; the label is dropped because the URL has
    to be the visible payload.

  * SlackBlockConverter calls the same predicate and normaliser so
    that Block Kit RichTextLink is emitted for safe URLs and inline-
    code RichTextText for rewrite-prone URLs. The two paths agree.

Tests (Netclaw.Actors.Tests/Channels/SlackTextProtectorTests.cs):

  * 19 xUnit facts covering bare URLs, markdown links, the OAuth-scope
    bug-of-record (both '+' and '%2B' shapes), single '%2B' negative-
    case, '%20' negative-case, prose parentheses, existing-wrap pass-
    through (both '<...>' and '`...`'), http:// callbacks, the
    realistic 22-scope Google URL, plus parameterised theories for
    IsRewriteProne and NormaliseScopeList.

  * Verified failing-then-passing locally: with NormaliseScopeList and
    IsRewriteProne removed, the rewrite-prone cases fail with the
    exact diff the bug shows; with the helpers in place, all 50 facts
    in Netclaw.Actors.Tests/Channels/ pass.

End-to-end verification:

  * Doist + Anthropic claude-sonnet-4-5 + taylorwilsdon/google_workspace_mcp.
    Before this commit: bot DM'd a Google OAuth URL via markdown link
    that Slack rendered as raw '[Authorization URL](url-with-%2B)'
    text; copy or click both produced 'Error 400: invalid_scope' from
    Google.
  * After this commit: rewrite-prone URL is decoded to the original
    '+' form and rendered as inline code (gray, non-clickable);
    manual copy out of the code element delivers the URL byte-exact
    to the browser and Google accepts the full 22-scope list.

Known limitation (intentional):

  * Rewrite-prone URLs are non-clickable. This is a tradeoff —
    Slack's link redirector cannot be told to skip a particular URL,
    so the only way to preserve a URL containing '+' / '%2B' is to
    render it as inline code. netclaw-dev#850's clickable-link requirement is
    addressed for safe URLs but is intentionally not met for rewrite-
    prone ones. A follow-up could introduce a local HTTP redirector
    that hands Slack a short opaque URL which 302s to the real URL —
    that's the only path I can see to clickable + preserved.

Addresses maintainer review of netclaw-dev#1102:

- BareUrlRegex no longer swallows trailing sentence punctuation
  (.,!?;:). On the new plain-text Text-field path, "see
  https://x.com." would otherwise wrap the period inside the
  clickable link target. Only the final character is constrained,
  so punctuation inside the URL path is still preserved.
- Consolidate the two divergent BareUrlRegex definitions.
  SlackBlockConverter now consumes the shared (internal)
  SlackTextProtector.BareUrlRegex so the Block Kit and plain-text
  surfaces tokenize URLs identically, as the PR intends.
- NormaliseScopeList anchors 'scope=' to a query-parameter
  boundary (? or &) so a 'scope=' substring inside a path segment
  or a longer parameter name (e.g. 'myscope=') cannot be
  mis-targeted by the decode.

Adds regression tests on both the Text and Block Kit surfaces.

@Aaronontheweb (Collaborator)

Thanks for this @johnkattenhorn — solid fix, and the writeup made the two failure modes easy to follow. I gave it a full review and pushed a follow-up commit (62c0c03) directly to the branch to clear the should-fix items rather than ping-pong on them:

1. Bare-URL regex swallowed trailing sentence punctuation. BareUrlRegex matched https://example.com. including the period. This was already latent in SlackBlockConverter, but the PR newly applies wrapping to the plain-text Text field — so it would emit <https://example.com.> and Slack makes the period part of the clickable link target. The regex now constrains only the final character to exclude .,!?;:; mid-URL punctuation is still preserved.

2. Two divergent BareUrlRegex definitions. SlackTextProtector and SlackBlockConverter each had their own bare-URL pattern with different exclusion sets, which undercut the "both surfaces stay in lockstep" goal of the PR. SlackBlockConverter now consumes the shared internal SlackTextProtector.BareUrlRegex() — one source of truth.

3. NormaliseScopeList scope= targeting. IndexOf("scope=") would also match a scope= substring inside a path segment or a longer parameter name (myscope=). It now anchors to a ?/& query-parameter boundary. Low real-world risk thanks to the >=2-%2B guard, but worth tightening since it is new code.

Regression tests added on both the Text and Block Kit surfaces. Locally: 63/63 channel tests pass, dotnet slopwatch analyze is clean, and copyright headers verify.
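Items 1 and 3 can be illustrated with two small patterns. This is a Python sketch under assumptions: the actual BareUrlRegex is C# and is not quoted in this thread, so the character classes below are hypothetical and only demonstrate the two techniques (constraining the final character, and anchoring `scope=` with a lookbehind).

```python
import re

# Hypothetical bare-URL pattern: the body may contain punctuation, but the
# final character must not be one of . , ! ? ; :
BARE_URL = re.compile(r"https?://[^\s<>`|]*[^\s<>`|.,!?;:]")

# 'scope=' only counts when it directly follows '?' or '&', so a path
# segment or a longer parameter name such as 'myscope=' is not targeted.
SCOPE_PARAM = re.compile(r"(?<=[?&])scope=([^&#]*)")


def first_url(text: str):
    """Return the first bare URL in text, or None if there is none."""
    match = BARE_URL.search(text)
    return match.group(0) if match else None
```

Because only the last character is constrained, the regex engine backtracks off a trailing period or comma but keeps dots and question marks that appear earlier in the URL.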

Two pre-existing nits I deliberately left alone — not introduced here, inherited from SlackBlockConverter, fine to track separately:

  • IsRewriteProne flags any + anywhere in a URL, so a legitimately-clickable URL containing a + becomes non-clickable inline code. Over-protecting (copy-paste instead of click) is the safe failure mode, so this is acceptable as-is.
  • The markdown-link regex \(([^)]+)\) truncates URLs containing ) (e.g. Wikipedia-style ..._(bar)).

The partial #850 close is well-reasoned and clearly documented — no objection to deferring the rest. LGTM once CI is green.

Aaronontheweb added the labels channels (Discord, Slack, and other channels) and reliability (Retries, resilience, graceful degradation) on May 19, 2026
Aaronontheweb (Collaborator) left a comment:

LGTM

Aaronontheweb enabled auto-merge (squash) on May 19, 2026 at 17:43
Aaronontheweb disabled auto-merge on May 19, 2026 at 17:58
Aaronontheweb merged commit 756aa4e into netclaw-dev:dev on May 19, 2026
14 checks passed
Aaronontheweb added a commit that referenced this pull request on May 19, 2026:
Closes #1107. Follow-up to #1102.

The markdown-link regex \[([^\]]+)\]\(([^)]+)\) truncated any link
destination containing a ')'. A URL like
https://en.wikipedia.org/wiki/Foo_(disambiguation) was cut at the
first ')', and the remainder leaked into the message as stray text.

The url group now accepts balanced parenthesised segments —
(?:[^()]|\([^()]*\))* — so a ')' only closes the markdown link when
it is not part of a balanced pair (CommonMark link-destination
semantics). One level of paren nesting is supported, which covers
every URL shape seen in practice.

Also consolidates the two duplicated markdown-link regexes:
SlackBlockConverter now consumes the shared (internal)
SlackTextProtector.MarkdownLinkRegex, mirroring the bare-URL regex
consolidation from #1102.

Regression tests added on both the Text and Block Kit surfaces.
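The difference between the two patterns is easy to see side by side. The sketch below is in Python for illustration only; the old pattern and the new url group are quoted from the commit message above, while wrapping the new group in a capture group to form `NEW_LINK` is an assumption about how the full pattern is assembled.

```python
import re

# Old pattern: the url group stops at the first ')'.
OLD_LINK = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")

# New url group: either a non-paren character or one balanced (...) segment,
# repeated. Only one level of nesting is supported.
NEW_LINK = re.compile(r"\[([^\]]+)\]\(((?:[^()]|\([^()]*\))*)\)")


def link_target(pattern: re.Pattern, text: str):
    """Return the captured URL of the first markdown link, or None."""
    match = pattern.search(text)
    return match.group(2) if match else None


sample = "[Foo](https://en.wikipedia.org/wiki/Foo_(disambiguation))"
old_target = link_target(OLD_LINK, sample)  # cut before the inner ')'
new_target = link_target(NEW_LINK, sample)  # keeps '(disambiguation)'
```

With the old pattern the final `)` of the sample is left outside the match, which is the stray text described above; with the new one the whole destination is captured.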

Linked issue that merging this pull request may close:

Slack: Markdown links render as raw text - must use Slack native links