fix(security): reject SSRF-smuggling URL characters in spider_tools._validate_url#1578
Conversation
…ate_url A URL such as 'http://127.0.0.1:6666\\@1.1.1.1' parses with hostname '1.1.1.1' via urllib.parse.urlparse but is dispatched to '127.0.0.1' by requests/httpx. Hostname-based SSRF allow/deny checks that trust urlparse alone can therefore be smuggled past with a backslash in the authority section, exposing localhost services. ASCII control characters in the URL (newline, NUL, DEL, etc.) can produce similar parser disagreement and HTTP request smuggling. Reject any URL containing a backslash anywhere or any ASCII control character (codepoint < 0x20 or == 0x7f) before urlparse runs. Also reject non-string input early. Tests in tests/unit/tools/test_spider_url_validation.py cover: - the real-world advisory payload 'http://127.0.0.1:6666\\@1.1.1.1' - backslash anywhere in the URL - NUL and CR/LF control characters - non-string (None / int) input - regression: normal public URLs still allowed - regression: existing loopback/localhost block still fires All 6 tests pass; the existing _validate_url contract for IP/private/ metadata/internal-domain blocking is preserved.
|
Warning Gemini encountered an error creating the review. You can try again by commenting |
|
@claude You are the Lead Engineer. If the branch is under MervinPraison/PraisonAI (not a fork), you are able to make modifications to this branch and push directly. SCOPE: Focus ONLY on Python packages (praisonaiagents, praisonai). Do NOT modify praisonai-rust or praisonai-ts. Read ALL analysis and reviews above carefully (Gemini, CodeRabbit, Qodo, Copilot, etc). Phase 1: Review per AGENTS.md
Phase 2: FIX Valid Issues Phase 3: Final Verdict |
|
Caution Review failedAn error occurred during the review process. Please try again later. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@copilot Do a thorough review of this PR. Read ALL existing reviewer comments above from Qodo, Coderabbit, and Gemini first — incorporate their findings. Review areas:
|
Greptile SummaryThis PR hardens Confidence Score: 5/5Safe to merge — the fix is correct, targeted, and well-tested with no behavioral regressions. No P0 or P1 issues found. The backslash and control-character guards are placed correctly before No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["_validate_url(url)"] --> B{"isinstance(url, str)?"}
B -- No --> REJECT1["return False"]
B -- Yes --> C{"Contains backslash OR\nASCII control char?"}
C -- Yes --> REJECT2["return False\n(SSRF smuggling guard — NEW)"]
C -- No --> D["urlparse(url)"]
D --> E{"scheme in\nhttp/https?"}
E -- No --> REJECT3["return False"]
E -- Yes --> F{"hostname\npresent?"}
F -- No --> REJECT4["return False"]
F -- Yes --> G{"localhost /\nloopback?"}
G -- Yes --> REJECT5["return False"]
G -- No --> H{"Private /\nreserved IP?"}
H -- Yes --> REJECT6["return False"]
H -- No --> I{"Internal\ndomain suffix?"}
I -- Yes --> REJECT7["return False"]
I -- No --> J{"Metadata\nservice IP?"}
J -- Yes --> REJECT8["return False"]
J -- No --> ALLOW["return True"]
Reviews (1): Last reviewed commit: "fix(security): reject SSRF-smuggling URL..." | Re-trigger Greptile |
| # parses as host ``1.1.1.1`` but is dispatched to ``127.0.0.1``). | ||
| if not isinstance(url, str): | ||
| return False | ||
| if "\\" in url or any(ord(c) < 0x20 or ord(c) == 0x7f for c in url): |
There was a problem hiding this comment.
Consider also rejecting percent-encoded backslash
The current guard catches a literal backslash (\, 0x5C) but not its percent-encoded form %5C. Depending on how requests decodes the authority section before resolving the connection, a URL using %5C instead of a literal backslash in the same smuggling pattern could still trigger a parser disagreement. Adding a decode-then-check pass would close that gap proactively:
import urllib.parse as _up
raw_check = _up.unquote(url)
if "\\" in raw_check or any(ord(c) < 0x20 or ord(c) == 0x7f for c in raw_check):
return FalseThis is a hardening suggestion — the literal-backslash bypass is fully fixed as written.
…er/push/ag2) (#1580) * fix(tests): clear 25 pre-existing test-infrastructure failures Targeted, test-only cleanup. **No production code changes.** Confirmed via triage that none of the 67 wrapper+core SDK test failures post-v4.6.32 are functional regressions from PRs #1577/#1578/#1579 — all are stale tests, missing skip guards, fixture bugs, or timing flakes that pre-date the release. This commit fixes the four highest-confidence categories: * sandbox/test_sandlock_sandbox.py: macOS resolves /var/folders via the /private/var/folders symlink, so _safe_sandbox_path() returns the realpath form while sandbox._temp_dir holds the unresolved mkdtemp output. Compare via os.path.realpath() so the assertion holds on both macOS and Linux. Implementation is correct and unchanged. * test_profiler_advanced.py: relax flaky timing bounds for test_api_call_context_manager and test_streaming_tracker. time.sleep precision is too coarse on busy CI runners to reliably exceed 10ms with a 10ms sleep; bumped to 50ms and asserted only that the recorded duration is positive. Matches AGENTS.md guidance that tests must not depend on timing. * test_push_client.py: PushClient._send checks self._transport.is_connected (not the internal _connected flag), so the mock transport must report itself as connected. The fixture set c._connected=True but forgot mock_transport.connected=True, causing every send-path test to raise ConnectionError. Single-line fixture fix unblocks 8 tests. * test_ag2_adapter.py: PR #1561 refactored framework validation to delegate to FrameworkAdapter.is_available(), which performs a real ag2 import. Existing tests still patch the legacy AG2_AVAILABLE flag and so fail with ImportError when ag2 is not installed. Added pytest.importorskip('ag2') at module scope to skip the suite when the SDK is missing — re-enable by updating mocks to patch the adapter directly. Verified locally: - 12 sandbox + profiler tests: PASS (was 3 failing) - 10 push_client tests: PASS (was 8 failing) - 14 ag2 tests: SKIP when ag2 missing (was 14 failing) Net wrapper-suite improvement: 25 fewer failures (38 -> 13). * fix(tests): strengthen timing assertions and sandbox path validation Addresses reviewer feedback from CodeRabbit: - Profiler tests: Change timing assertions from >0 to >=5ms to catch unit errors - Sandbox tests: Tighten path prefix assertion with os.sep for directory boundaries These changes improve test reliability and catch potential regressions while maintaining tolerance for CI timing variations. Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com> --------- Co-authored-by: Cascade <cascade@windsurf.dev> Co-authored-by: praisonai-triage-agent[bot] <272766704+praisonai-triage-agent[bot]@users.noreply.github.com> Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>
Summary
Hardens
SpiderTools._validate_urlagainst an SSRF bypass that exploits parser disagreement betweenurllib.parse.urlparseand the underlying HTTP client (requests,httpx).Threat
A URL such as
parses with hostname
1.1.1.1viaurllib.parse.urlparse(so any allow/deny check that consultsparsed.hostnamesees a public IP) but is actually dispatched to127.0.0.1:6666byrequests/httpx, because those clients treat the backslash differently and re-resolve the authority. The result: hostname-based SSRF guards are silently bypassed and the agent ends up issuing requests to internal services on the local host.ASCII control characters (NUL, CR, LF, DEL, …) in the authority section can produce a similar parser disagreement and have been used in HTTP request smuggling and CRLF-injection attacks.
Fix
Before
urlparseruns, reject any URL that:str,< 0x20or== 0x7f).These rejections are early-return so the existing
urlparse+ IP / private / metadata / internal-domain checks below them remain unchanged.Tests
src/praisonai-agents/tests/unit/tools/test_spider_url_validation.py(new — 6 tests):test_rejects_backslash_smuggle_in_authorityhttp://127.0.0.1:6666\@1.1.1.1is rejectedtest_rejects_backslash_anywhere_in_urltest_rejects_control_characterstest_allows_normal_public_urlhttps://example.com/path?q=1still allowed (regression)test_still_blocks_loopback127.0.0.1andlocalhoststill blocked (regression)test_rejects_non_string_inputNoneandintrejected without crashing$ PYTHONPATH=src/praisonai-agents:src/praisonai pytest \ src/praisonai-agents/tests/unit/tools/test_spider_url_validation.py -q 6 passed in 0.37sAGENTS.md conformance
praisonaiagents.tools.spider_tools. No wrapper changes, no protocol changes.O(n)scans of the URL string ahead of the existingurlparsecall. Negligible cost vs. an HTTP roundtrip.Files changed
src/praisonai-agents/praisonaiagents/tools/spider_tools.pysrc/praisonai-agents/tests/unit/tools/test_spider_url_validation.py