fix(security): reject SSRF-smuggling URL characters in spider_tools._validate_url by MervinPraison · Pull Request #1578 · MervinPraison/PraisonAI

MervinPraison · 2026-04-28T13:05:20Z

Summary

Hardens SpiderTools._validate_url against an SSRF bypass that exploits parser disagreement between urllib.parse.urlparse and the underlying HTTP client (requests, httpx).

Threat

A URL such as

http://127.0.0.1:6666\@1.1.1.1

parses with hostname 1.1.1.1 via urllib.parse.urlparse (so any allow/deny check that consults parsed.hostname sees a public IP) but is actually dispatched to 127.0.0.1:6666 by requests / httpx, because those clients treat the backslash differently and re-resolve the authority. The result: hostname-based SSRF guards are silently bypassed and the agent ends up issuing requests to internal services on the local host.

ASCII control characters (NUL, CR, LF, DEL, …) in the authority section can produce a similar parser disagreement and have been used in HTTP request smuggling and CRLF-injection attacks.

>>> from urllib.parse import urlparse
>>> urlparse("http://127.0.0.1:6666\\@1.1.1.1").hostname
'1.1.1.1'                       # <- what the SSRF allow-list sees
>>> import requests
>>> requests.get("http://127.0.0.1:6666\\@1.1.1.1")  # <- actual destination
# attempts to connect to 127.0.0.1:6666

Fix

Before urlparse runs, reject any URL that:

is not a str,
contains a backslash anywhere, or
contains any ASCII control character (codepoint < 0x20 or == 0x7f).

These rejections are early-return so the existing urlparse + IP / private / metadata / internal-domain checks below them remain unchanged.

if not isinstance(url, str):
    return False
if "\\" in url or any(ord(c) < 0x20 or ord(c) == 0x7f for c in url):
    return False

Tests

src/praisonai-agents/tests/unit/tools/test_spider_url_validation.py (new — 6 tests):

Test	Asserts
`test_rejects_backslash_smuggle_in_authority`	The real-world advisory payload `http://127.0.0.1:6666\@1.1.1.1` is rejected
`test_rejects_backslash_anywhere_in_url`	Backslash in path/query also rejected
`test_rejects_control_characters`	NUL and CR+LF in URL rejected
`test_allows_normal_public_url`	`https://example.com/path?q=1` still allowed (regression)
`test_still_blocks_loopback`	`127.0.0.1` and `localhost` still blocked (regression)
`test_rejects_non_string_input`	`None` and `int` rejected without crashing

$ PYTHONPATH=src/praisonai-agents:src/praisonai pytest \
    src/praisonai-agents/tests/unit/tools/test_spider_url_validation.py -q
6 passed in 0.37s

AGENTS.md conformance

✅ Core SDK scoped: only touches praisonaiagents.tools.spider_tools. No wrapper changes, no protocol changes.
✅ Backward compatible: tightens an existing security check; only rejects URLs that were never expected to be valid (and that no legitimate caller would construct).
✅ No performance impact: two O(n) scans of the URL string ahead of the existing urlparse call. Negligible cost vs. an HTTP roundtrip.
✅ Fail safe: defaults to denying the suspect input.
✅ Test discipline: 6 new tests covering both the bypass and the regressions.

Files changed

File	Δ
`src/praisonai-agents/praisonaiagents/tools/spider_tools.py`	+11 (early-reject guard with comment)
`src/praisonai-agents/tests/unit/tools/test_spider_url_validation.py`	new (45 lines, 6 tests)

…ate_url A URL such as 'http://127.0.0.1:6666\\@1.1.1.1' parses with hostname '1.1.1.1' via urllib.parse.urlparse but is dispatched to '127.0.0.1' by requests/httpx. Hostname-based SSRF allow/deny checks that trust urlparse alone can therefore be smuggled past with a backslash in the authority section, exposing localhost services. ASCII control characters in the URL (newline, NUL, DEL, etc.) can produce similar parser disagreement and HTTP request smuggling. Reject any URL containing a backslash anywhere or any ASCII control character (codepoint < 0x20 or == 0x7f) before urlparse runs. Also reject non-string input early. Tests in tests/unit/tools/test_spider_url_validation.py cover: - the real-world advisory payload 'http://127.0.0.1:6666\\@1.1.1.1' - backslash anywhere in the URL - NUL and CR/LF control characters - non-string (None / int) input - regression: normal public URLs still allowed - regression: existing loopback/localhost block still fires All 6 tests pass; the existing _validate_url contract for IP/private/ metadata/internal-domain blocking is preserved.

gemini-code-assist · 2026-04-28T13:05:29Z

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

MervinPraison · 2026-04-28T13:05:40Z

@claude You are the Lead Engineer. If the branch is under MervinPraison/PraisonAI (not a fork), you are able to make modifications to this branch and push directly. SCOPE: Focus ONLY on Python packages (praisonaiagents, praisonai). Do NOT modify praisonai-rust or praisonai-ts. Read ALL analysis and reviews above carefully (Gemini, CodeRabbit, Qodo, Copilot, etc).

Phase 1: Review per AGENTS.md

Protocol-driven: check heavy implementations vs core SDK
Backward compatible: ensure zero feature regressions
Performance: no hot-path regressions

Phase 2: FIX Valid Issues
4. For any VALID bugs or architectural flaws found by Gemini, CodeRabbit, Qodo, Copilot, or any other reviewer: implement the fix
5. Push all code fixes directly to THIS branch (do NOT create a new PR)
6. Comment a summary of exact files modified and what you skipped

Phase 3: Final Verdict
7. If all issues are resolved, approve the PR / close the Issue
8. If blocking issues remain, request changes / leave clear action items

coderabbitai · 2026-04-28T13:06:03Z

Caution

Review failed

An error occurred during the review process. Please try again later.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/spider-tools-ssrf-hardening

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

MervinPraison · 2026-04-28T13:06:14Z

@copilot Do a thorough review of this PR. Read ALL existing reviewer comments above from Qodo, Coderabbit, and Gemini first — incorporate their findings.

Review areas:

Bloat check: Are changes minimal and focused? Any unnecessary code or scope creep?
Security: Any hardcoded secrets, unsafe eval/exec, missing input validation?
Performance: Any module-level heavy imports? Hot-path regressions?
Tests: Are tests included? Do they cover the changes adequately?
Backward compat: Any public API changes without deprecation?
Code quality: DRY violations, naming conventions, error handling?
Address reviewer feedback: If Qodo, Coderabbit, or Gemini flagged valid issues, include them in your review
Suggest specific improvements with code examples where possible

greptile-apps · 2026-04-28T13:09:07Z

Greptile Summary

This PR hardens SpiderTools._validate_url against a real SSRF bypass where urllib.parse.urlparse and requests disagree on the effective destination host when a URL contains a backslash or ASCII control characters. The fix is minimal, correctly placed before urlparse runs, and is accompanied by a thorough regression test suite covering both the bypass payload and the existing loopback/private-IP guards.

Confidence Score: 5/5

Safe to merge — the fix is correct, targeted, and well-tested with no behavioral regressions.

No P0 or P1 issues found. The backslash and control-character guards are placed correctly before urlparse, closing the stated parser-disagreement SSRF vector. All existing hostname checks are preserved. The only finding is a P2 hardening suggestion about percent-encoded backslash (%5C), which does not block merge.

No files require special attention.

Important Files Changed

Filename	Overview
src/praisonai-agents/praisonaiagents/tools/spider_tools.py	Adds early-return guards in `_validate_url` to reject non-string inputs, backslashes, and ASCII control characters before `urlparse` runs — correctly closes the parser-disagreement SSRF bypass vector.
src/praisonai-agents/tests/unit/tools/test_spider_url_validation.py	New test file with 6 targeted regression tests covering the bypass payload, control characters, normal URLs, loopback blocking, and non-string inputs.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["_validate_url(url)"] --> B{"isinstance(url, str)?"}
    B -- No --> REJECT1["return False"]
    B -- Yes --> C{"Contains backslash OR\nASCII control char?"}
    C -- Yes --> REJECT2["return False\n(SSRF smuggling guard — NEW)"]
    C -- No --> D["urlparse(url)"]
    D --> E{"scheme in\nhttp/https?"}
    E -- No --> REJECT3["return False"]
    E -- Yes --> F{"hostname\npresent?"}
    F -- No --> REJECT4["return False"]
    F -- Yes --> G{"localhost /\nloopback?"}
    G -- Yes --> REJECT5["return False"]
    G -- No --> H{"Private /\nreserved IP?"}
    H -- Yes --> REJECT6["return False"]
    H -- No --> I{"Internal\ndomain suffix?"}
    I -- Yes --> REJECT7["return False"]
    I -- No --> J{"Metadata\nservice IP?"}
    J -- Yes --> REJECT8["return False"]
    J -- No --> ALLOW["return True"]

_{Reviews (1): Last reviewed commit: "fix(security): reject SSRF-smuggling URL..." | Re-trigger Greptile}

greptile-apps · 2026-04-28T13:09:38Z

+            # parses as host ``1.1.1.1`` but is dispatched to ``127.0.0.1``).
+            if not isinstance(url, str):
+                return False
+            if "\\" in url or any(ord(c) < 0x20 or ord(c) == 0x7f for c in url):


Consider also rejecting percent-encoded backslash

The current guard catches a literal backslash (\, 0x5C) but not its percent-encoded form %5C. Depending on how requests decodes the authority section before resolving the connection, a URL using %5C instead of a literal backslash in the same smuggling pattern could still trigger a parser disagreement. Adding a decode-then-check pass would close that gap proactively:

import urllib.parse as _up raw_check = _up.unquote(url) if "\\" in raw_check or any(ord(c) < 0x20 or ord(c) == 0x7f for c in raw_check): return False

This is a hardening suggestion — the literal-backslash bypass is fully fixed as written.

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

…er/push/ag2) (#1580) * fix(tests): clear 25 pre-existing test-infrastructure failures Targeted, test-only cleanup. **No production code changes.** Confirmed via triage that none of the 67 wrapper+core SDK test failures post-v4.6.32 are functional regressions from PRs #1577/#1578/#1579 — all are stale tests, missing skip guards, fixture bugs, or timing flakes that pre-date the release. This commit fixes the four highest-confidence categories: * sandbox/test_sandlock_sandbox.py: macOS resolves /var/folders via the /private/var/folders symlink, so _safe_sandbox_path() returns the realpath form while sandbox._temp_dir holds the unresolved mkdtemp output. Compare via os.path.realpath() so the assertion holds on both macOS and Linux. Implementation is correct and unchanged. * test_profiler_advanced.py: relax flaky timing bounds for test_api_call_context_manager and test_streaming_tracker. time.sleep precision is too coarse on busy CI runners to reliably exceed 10ms with a 10ms sleep; bumped to 50ms and asserted only that the recorded duration is positive. Matches AGENTS.md guidance that tests must not depend on timing. * test_push_client.py: PushClient._send checks self._transport.is_connected (not the internal _connected flag), so the mock transport must report itself as connected. The fixture set c._connected=True but forgot mock_transport.connected=True, causing every send-path test to raise ConnectionError. Single-line fixture fix unblocks 8 tests. * test_ag2_adapter.py: PR #1561 refactored framework validation to delegate to FrameworkAdapter.is_available(), which performs a real ag2 import. Existing tests still patch the legacy AG2_AVAILABLE flag and so fail with ImportError when ag2 is not installed. Added pytest.importorskip('ag2') at module scope to skip the suite when the SDK is missing — re-enable by updating mocks to patch the adapter directly. Verified locally: - 12 sandbox + profiler tests: PASS (was 3 failing) - 10 push_client tests: PASS (was 8 failing) - 14 ag2 tests: SKIP when ag2 missing (was 14 failing) Net wrapper-suite improvement: 25 fewer failures (38 -> 13). * fix(tests): strengthen timing assertions and sandbox path validation Addresses reviewer feedback from CodeRabbit: - Profiler tests: Change timing assertions from >0 to >=5ms to catch unit errors - Sandbox tests: Tighten path prefix assertion with os.sep for directory boundaries These changes improve test reliability and catch potential regressions while maintaining tolerance for CI timing variations. Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com> --------- Co-authored-by: Cascade <cascade@windsurf.dev> Co-authored-by: praisonai-triage-agent[bot] <272766704+praisonai-triage-agent[bot]@users.noreply.github.com> Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>

Copilot AI review requested due to automatic review settings April 28, 2026 13:05

Copilot started reviewing on behalf of MervinPraison April 28, 2026 13:05 View session

Copilot started work on behalf of MervinPraison April 28, 2026 13:06 View session

greptile-apps Bot reviewed Apr 28, 2026

View reviewed changes

Copilot AI reviewed Apr 28, 2026

MervinPraison merged commit 004dcfe into main Apr 28, 2026
25 of 26 checks passed

MervinPraison deleted the fix/spider-tools-ssrf-hardening branch April 28, 2026 16:44

Copilot AI mentioned this pull request May 19, 2026

docs: security batch from PraisonAI PR #1684 (10 hardening fixes) MervinPraison/PraisonAIDocs#369

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(security): reject SSRF-smuggling URL characters in spider_tools._validate_url#1578

fix(security): reject SSRF-smuggling URL characters in spider_tools._validate_url#1578
MervinPraison merged 1 commit into
mainfrom
fix/spider-tools-ssrf-hardening

MervinPraison commented Apr 28, 2026

Uh oh!

gemini-code-assist Bot commented Apr 28, 2026

Uh oh!

MervinPraison commented Apr 28, 2026

Uh oh!

coderabbitai Bot commented Apr 28, 2026

Review failed

Uh oh!

MervinPraison commented Apr 28, 2026

Uh oh!

greptile-apps Bot commented Apr 28, 2026

Uh oh!

greptile-apps Bot Apr 28, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

MervinPraison commented Apr 28, 2026

Summary

Threat

Fix

Tests

AGENTS.md conformance

Files changed

Uh oh!

gemini-code-assist Bot commented Apr 28, 2026

Uh oh!

MervinPraison commented Apr 28, 2026

Uh oh!

coderabbitai Bot commented Apr 28, 2026

Review failed

Uh oh!

MervinPraison commented Apr 28, 2026

Uh oh!

greptile-apps Bot commented Apr 28, 2026

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants