Skip to content

feat(browser): proxy v2 — real fetch + lxml rewrite + cookie jar (PR 3/10)#302

Merged
jaylfc merged 9 commits into
masterfrom
feat/browser-pr-3-rewriter-cookie-jar-profile
May 3, 2026
Merged

feat(browser): proxy v2 — real fetch + lxml rewrite + cookie jar (PR 3/10)#302
jaylfc merged 9 commits into
masterfrom
feat/browser-pr-3-rewriter-cookie-jar-profile

Conversation

@jaylfc
Copy link
Copy Markdown
Owner

@jaylfc jaylfc commented May 3, 2026

Third of ten PRs implementing BrowserApp v2 per the design doc.

First user-visible deliverable of BrowserApp v2. Replaces PR 2's 501 stub with the real proxy fetch pipeline — a logged-in user can hit a URL via /api/desktop/browser/proxy and get back the rewritten HTML with their cookies attached and the copilot script injected.

Summary

  • profile.py — Profile CRUD + default Personal/Work bootstrap. ensure_default_profiles is genuinely idempotent (uses INSERT OR IGNORE); safe to call on every request.
  • cookie_jar.py — httpx adapter wrapping PR 1's BrowserCookieStore. Per-request load/persist pattern bridges sync httpx + async SQLCipher store. Multi-user isolation enforced. Strips cookielib's leading-dot domain convention so Set-Cookie: Domain=github.com (stored as .github.com) is retrievable on the next request via urlparse(...).hostname (no dot).
  • rewriter.py — lxml-based DOM rewriter. Handles href/src/action on every tag, srcset (preserving descriptors), inline style="background-image: url(...)", <style> tag contents, and <meta http-equiv="refresh" content="N;url=...">. Skips data: / javascript: / mailto: / tel: / blob: / about: / # schemes.
  • injector.py — Inserts <script src="/__taos/copilot.js"> and <meta name="taos-copilot-ws" content="..."> into <head>. Idempotent.
  • copilot.js — Stub asset. Full RPC bridge + DOM extraction + annotation rendering land in PR 6.
  • proxy.py — Full orchestrator: auth → profile bootstrap → SSRF → cookie jar load → httpx fetch (manual redirect walk with per-hop SSRF re-check) → rewrite + inject + CSP for HTML / pass-through for non-HTML → cookie persist → response. 15s total deadline enforced via asyncio.wait_for (per-hop budget 5s).
  • /__taos/copilot.js — Static FileResponse route serving the copilot script.
  • app.pyBrowserStore + BrowserCookieStore wired on app.state in lifespan, with placeholder cookie key. Migration hazard for PR 5+ documented inline (SQLCipher rejects DB if key changes — PR 5 must rekey or wipe).

What this does not land

  • Service Worker for fetch interception — PR 8. Without it, SPAs can browse but their internal XHR/fetch calls fail.
  • Real copilot.js implementation (RPC, annotations, drive ops) — PR 6
  • Frontend BrowserApp UI rebuild — PR 4
  • Per-user Argon2id-derived cookie key — PR 5+

Test Plan

  • pytest tests/routes/desktop_browser/test_profile.py -v — 6 tests
  • pytest tests/routes/desktop_browser/test_cookie_jar.py -v — 6 tests (incl. regression test for the leading-dot domain bug using real http.cookiejar.Cookie parsing)
  • pytest tests/routes/desktop_browser/test_rewriter.py -v — 16 cases (parametrized; covers href/src/srcset/style/meta-refresh/scheme-skipping)
  • pytest tests/routes/desktop_browser/test_injector.py -v — 5 tests
  • pytest tests/routes/desktop_browser/test_proxy_fetch.py -v — 7 integration tests (respx-mocked) covering HTML rewriting, copilot injection, CSP application, non-HTML pass-through, cookie persistence, redirect SSRF re-check, static asset serve
  • Full pytest tests/routes/desktop_browser/123 total, all green
  • Broader regression pytest tests/routes/ tests/test_secrets.py — 205 tests pass, no regressions outside scope

Spec

docs/superpowers/specs/2026-05-03-browser-app-v2-design.md §3.2 (data flow), §4.3 (proxy v2), §9 (security model).

Cumulative shipping arc:

After PR 1:  cookie-aware storage backend ready
After PR 2:  proxy endpoint exists with auth + SSRF gate
After PR 3:  full lxml rewriter + cookie-aware HTTP fetch (this PR)   (BACKEND READY)
After PR 4:  new compact chrome, multi-window, tab model
…
After PR 10: cross-device push notifications                          (V1 SHIPS)

Notes for reviewers

  • Cookie key is a placeholder ("0" * 64) — auth integration isn't there yet. Store is genuinely encrypted; only the key derivation is deferred. PR 5+ wires it to the login password. Migration hazard documented inline in app.py so PR 5+ author doesn't ship a silent data-loss bug.
  • Manual redirect walkinghttpx.AsyncClient(follow_redirects=False) + a hand loop with SSRF re-check on every Location header. Critical for security: a redirect to a private IP must be blocked even if the initial URL was public. Test test_redirect_target_revalidated_against_ssrf covers this.
  • Total fetch deadline (15s) wraps the entire redirect walk via asyncio.wait_for — a malicious chain can't hold the proxy for 6 × 15s = 90s.
  • JS-runtime URLs are NOT rewritten — the rewriter only touches HTML. SPAs that build URLs in JS won't work fully until PR 8's Service Worker lands. This is the documented "70% of sites read fine vs. 70% of sites are fully functional" tradeoff.
  • Critical bug fixed during final review: Set-Cookie: Domain=github.com previously stored as .github.com (leading dot, cookielib convention) but lookup used urlparse(...).hostname (no dot) → exact-equality miss → user appeared logged in but lost the session on the next request. Strip leading dot on persist (and defensively on lookup). New regression test uses real http.cookiejar.Cookie path.

Summary by CodeRabbit

  • New Features

    • Browser proxy endpoint now fully functional with request handling, cookie persistence, and redirect support
    • Automatic URL rewriting for proxied HTML content
    • Copilot script injection into HTML responses with WebSocket integration
    • Per-user browser profile management with default "Personal" and "Work" profiles
    • Strict Content Security Policy enforcement on HTML responses
    • Static copilot JavaScript endpoint
  • Tests

    • Added comprehensive coverage for browser proxy pipeline, cookie handling, profile management, and HTML rewriting
  • Dependencies

    • Added lxml library for HTML processing

jaylfc added 8 commits May 3, 2026 22:56
Whole-branch Opus review caught a Critical bug + 3 Important items:

- cookie_jar.py (Critical): persist_response_cookies stripped no
  leading dot from the cookie's domain, but real Set-Cookie with
  Domain=github.com stores as `.github.com` per cookielib. Lookup via
  urlparse hostname (`github.com`, no dot) missed the row. Sites with
  explicit Domain= cookies (most real logins) appeared to authenticate
  but immediately lost the session on the next request. Strip leading
  dot on persist + defensively on lookup. Regression test uses real
  http.cookiejar.Cookie path (the existing test bypassed the bug via
  Cookies().set() which produces non-leading-dot domains).
- store.py (Important): add_profile used bare INSERT, racing on the
  PRIMARY KEY constraint when ensure_default_profiles fired
  concurrently for a brand-new user. Switched to INSERT OR IGNORE so
  the bootstrap is genuinely idempotent at the storage layer.
- app.py (Important): expanded the placeholder-cookie-key comment with
  an explicit MIGRATION HAZARD note for PR 5+. SQLCipher rejects an
  existing DB if the key changes, so swapping in per-user Argon2 keys
  without a rekey or wipe path silently corrupts existing user cookies.
- proxy.py (Important): wrapped the redirect-walk in asyncio.wait_for
  so the documented 15s total-deadline-including-redirects is now
  actually enforced (was per-operation; 6 hops x 15s = 90s worst
  case). Per-hop timeout reduced to 5s.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 3, 2026

Warning

Rate limit exceeded

@jaylfc has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 36 minutes and 9 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: c87de686-334e-466c-b132-030b34c711d4

📥 Commits

Reviewing files that changed from the base of the PR and between 663d94a and 5be2659.

📒 Files selected for processing (7)
  • tests/routes/desktop_browser/test_cookie_jar.py
  • tests/routes/desktop_browser/test_cookie_store.py
  • tests/routes/desktop_browser/test_proxy_fetch.py
  • tinyagentos/app.py
  • tinyagentos/routes/desktop_browser/cookie_jar.py
  • tinyagentos/routes/desktop_browser/proxy.py
  • tinyagentos/routes/desktop_browser/store.py
📝 Walkthrough

Walkthrough

This PR adds the lxml dependency and implements a complete browser proxy pipeline for the desktop application. It includes persistent browser storage initialization, cookie-jar management, HTML URL rewriting/injection, a copilot script asset, and comprehensive test coverage for all new functionality.

Changes

Browser Proxy Feature Implementation

Layer / File(s) Summary
Dependencies & Storage Setup
pyproject.toml, tinyagentos/app.py, tests/conftest.py
Added lxml>=5.0.0 dependency. App lifespan now initializes BrowserStore and BrowserCookieStore with proper teardown. Test client fixture depends on tmp_data_dir and instantiates browser stores for test isolation.
Core Proxy Components
tinyagentos/routes/desktop_browser/cookie_jar.py, tinyagentos/routes/desktop_browser/rewriter.py, tinyagentos/routes/desktop_browser/injector.py
Cookie-jar adapter bridges httpx.Cookies with async BrowserCookieStore for per-request load/persist. HTML rewriter parses DOM and rewrites URL attributes, srcset, CSS url(...), and meta http-equiv="refresh". Injector adds copilot script and WebSocket meta to HTML head.
Proxy Route & Assets
tinyagentos/routes/desktop_browser/proxy.py, tinyagentos/routes/desktop_browser/copilot.js, tinyagentos/routes/desktop_browser/profile.py, tinyagentos/routes/desktop_browser/store.py
Proxy route replaces 501 stub with full pipeline: profile bootstrap, SSRF validation, cookie loading, upstream fetch with manual redirect walking, cookie persistence, HTML rewrite/injection with strict CSP, and header stripping. Static endpoint serves copilot.js stub. Profile module ensures default profiles and validates existence. Store add_profile made idempotent with INSERT OR IGNORE.
Test Coverage
tests/routes/desktop_browser/test_cookie_jar.py, tests/routes/desktop_browser/test_injector.py, tests/routes/desktop_browser/test_profile.py, tests/routes/desktop_browser/test_proxy_fetch.py, tests/routes/desktop_browser/test_proxy_shell.py, tests/routes/desktop_browser/test_rewriter.py, tests/routes/desktop_browser/test_store_tenancy.py
Comprehensive async pytest coverage for cookie-jar load/persist, HTML injection idempotency, profile CRUD, full proxy pipeline (HTML/non-HTML/cookies/redirects), SSRF gating, static assets, HTML URL rewriting (attributes, srcset, CSS, meta-refresh), and per-user store isolation. Updated store tenancy test from IntegrityError expectation to INSERT OR IGNORE behavior.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Proxy as Proxy Route<br/>(proxy.py)
    participant Jar as Cookie Jar<br/>(cookie_jar.py)
    participant Store as Browser Store<br/>(store.py)
    participant Upstream as Upstream Server
    participant Rewriter as HTML Rewriter<br/>(rewriter.py)
    participant Injector as Head Injector<br/>(injector.py)

    Client->>Proxy: GET /api/desktop/browser/proxy?url=X&profile_id=Y
    Proxy->>Store: ensure_default_profiles(user_id)
    Proxy->>Store: get_profile_or_404(user_id, profile_id)
    Proxy->>Proxy: SSRF validation on URL
    Proxy->>Jar: load_jar_for_request(user_id, profile_id, host)
    Jar->>Store: list_cookies(user_id, profile_id, host)
    Store-->>Jar: cookies[]
    Jar-->>Proxy: httpx.Cookies jar
    Proxy->>Upstream: fetch(url, cookies=jar)
    Upstream-->>Proxy: response (HTML/binary)
    Proxy->>Jar: persist_response_cookies(response)
    Jar->>Store: set_cookie(user_id, profile_id, cookie)
    Store-->>Jar: ✓
    alt response is HTML
        Proxy->>Rewriter: rewrite_html(html, base_url, proxy_fn)
        Rewriter-->>Proxy: rewritten_html
        Proxy->>Injector: inject_into_head(html, ws_url)
        Injector-->>Proxy: injected_html
        Proxy->>Proxy: Apply strict CSP header
    end
    Proxy->>Proxy: Strip set-cookie & CSP headers
    Proxy-->>Client: response (HTML/binary, no cookies)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰 A proxy danced through the WebSocket mist,
With cookies tucked in a SQLite twist,
HTML rewrites and scripts injected with care,
Upstream responses returned fair and square!
lxml parsed, profiles took shape—
The browser's journey found its escape!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.92% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main change: implementing the second generation of the browser proxy with real fetching, lxml-based HTML rewriting, and cookie jar functionality.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/browser-pr-3-rewriter-cookie-jar-profile

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 36 minutes and 9 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (2)
tests/routes/desktop_browser/test_proxy_fetch.py (1)

117-142: ⚡ Quick win

This doesn't verify persistence yet.

The test proves header stripping, but not that Set-Cookie was actually written to the server-side jar. A broken persist_response_cookies() path would still pass. Follow this with a second proxied request and assert the stored cookie is replayed upstream, or inspect app.state.browser_cookie_store directly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routes/desktop_browser/test_proxy_fetch.py` around lines 117 - 142, The
test only checks header stripping but not that persist_response_cookies actually
stored cookies; after the first proxied GET to "/api/desktop/browser/proxy" (in
test_persists_set_cookie_to_jar) either make a second proxied request to the
same upstream URL and assert the upstream mock receives a "Cookie:
session=abc123" header, or directly inspect app.state.browser_cookie_store (or
the server-side cookie jar object used by persist_response_cookies) to assert it
contains the "session=abc123" entry; update the test to call the proxy twice (or
examine app.state.browser_cookie_store) and add the appropriate assertion so a
broken persist_response_cookies path will fail.
tests/routes/desktop_browser/test_proxy_shell.py (1)

107-122: ⚡ Quick win

Make this assertion deterministic.

assert resp.status_code != 501 still passes on unrelated regressions like a profile/bootstrap failure or a route error. Mock the upstream fetch and assert a concrete proxy outcome here, so this test only passes when the request actually reaches the fetch pipeline.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routes/desktop_browser/test_proxy_shell.py` around lines 107 - 122, The
test currently only patches DNS and asserts resp.status_code != 501 which is
nondeterministic; instead mock the upstream fetch call used by the
desktop_browser proxy handler so the route actually reaches the fetch pipeline
and returns a predictable response. In test_valid_request_now_attempts_fetch
patch the real fetch function the route calls (e.g., the module-level fetch
helper or the HTTP client method used by tinyagentos.routes.desktop_browser —
for example patch tinyagentos.routes.desktop_browser.fetch_url or
httpx.AsyncClient.get if that’s used) to return a fixed successful Response
(status 200 and known body), keep the existing getaddrinfo patch, then assert
resp.status_code == 200 (and optionally assert the response body matches the
mocked body) so the test only passes when the proxy forwards to the fetch
pipeline.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tinyagentos/app.py`:
- Around line 317-323: The code uses a hardcoded SQLCipher key
(_placeholder_cookie_key) when constructing BrowserCookieStore, which makes all
installs share the same encryption key; instead, generate or retrieve a unique
per-installation key and pass that to BrowserCookieStore: e.g., on first run
create a strong random key (use secrets.token_bytes/ token_hex or a KDF) and
persist it in a restricted-permission local file or the OS keyring, on
subsequent runs read that stored key, and fall back to an environment variable
only for recoverability; update the initialization flow around
BrowserCookieStore.init() and app.state.browser_cookie_store to use the
retrieved/generated key rather than the hardcoded _placeholder_cookie_key.

In `@tinyagentos/routes/desktop_browser/cookie_jar.py`:
- Line 80: The current line using "if cookie.expires" treats expires==0 as falsy
and stores None; change the check to "cookie.expires is not None" when computing
expires_at so zero-expiry is preserved, and add logic in the cookie persistence
path (where BrowserCookieStore.set_cookie is called) to detect when expires_at
is in the past and call a new BrowserCookieStore.delete_cookie(user_id,
profile_id, host, path, name) instead of set_cookie; also modify
BrowserCookieStore.get_cookies to filter out expired rows (e.g. add "AND
(expires_at IS NULL OR expires_at > current_unix_time)" semantics) so expired
cookies are not returned.
- Around line 42-44: The SQL match in BrowserCookieStore.get_cookies currently
uses an exact equality (host = ?) so a cookie stored for "github.com" won't
match requests to "gist.github.com"; change the WHERE clause to include a
subdomain match instead of only equality—e.g. WHERE (host = :stored_host OR
:request_host LIKE '%' || host) so a stored host like "github.com" will match
request_host "gist.github.com" but not vice versa; keep the existing
host.lstrip(".") handling in load_jar_for_request and ensure parameter ordering
matches the new predicate in get_cookies (update BrowserCookieStore.get_cookies
in store.py and any SQL-binding code accordingly), and add/adjust tests to cover
domain-scoped cookies matching subdomains.

In `@tinyagentos/routes/desktop_browser/proxy.py`:
- Around line 114-126: The code currently calls http.get(current_url) and
accesses response.content (e.g., variable _resp) which buffers the whole
upstream body; change to streaming requests using http.stream("GET",
current_url, cookies=jar) and inspect headers from the response context
(Content-Type) before consuming the body; for non-HTML responses forward the
stream to the client in chunks (using response.aiter_bytes()/iter_bytes())
without storing into memory, and for HTML responses read into a bounded
accumulator up to a defined MAX_HTML_BYTES (reject or error if exceeded) before
performing rewrite/injection; apply the same streaming-and-bounded-read pattern
to the other occurrences that use response.content (lines referenced around
192-194 and 211-217), and keep unique symbols in mind: httpx.AsyncClient,
http.stream, current_url, load_jar_for_request, _resp, and any downstream code
that expects the full body so it can work with the bounded HTML payload only.

---

Nitpick comments:
In `@tests/routes/desktop_browser/test_proxy_fetch.py`:
- Around line 117-142: The test only checks header stripping but not that
persist_response_cookies actually stored cookies; after the first proxied GET to
"/api/desktop/browser/proxy" (in test_persists_set_cookie_to_jar) either make a
second proxied request to the same upstream URL and assert the upstream mock
receives a "Cookie: session=abc123" header, or directly inspect
app.state.browser_cookie_store (or the server-side cookie jar object used by
persist_response_cookies) to assert it contains the "session=abc123" entry;
update the test to call the proxy twice (or examine
app.state.browser_cookie_store) and add the appropriate assertion so a broken
persist_response_cookies path will fail.

In `@tests/routes/desktop_browser/test_proxy_shell.py`:
- Around line 107-122: The test currently only patches DNS and asserts
resp.status_code != 501 which is nondeterministic; instead mock the upstream
fetch call used by the desktop_browser proxy handler so the route actually
reaches the fetch pipeline and returns a predictable response. In
test_valid_request_now_attempts_fetch patch the real fetch function the route
calls (e.g., the module-level fetch helper or the HTTP client method used by
tinyagentos.routes.desktop_browser — for example patch
tinyagentos.routes.desktop_browser.fetch_url or httpx.AsyncClient.get if that’s
used) to return a fixed successful Response (status 200 and known body), keep
the existing getaddrinfo patch, then assert resp.status_code == 200 (and
optionally assert the response body matches the mocked body) so the test only
passes when the proxy forwards to the fetch pipeline.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 13054bc1-f2cd-4c72-b2b2-0bb8f4fb3bfe

📥 Commits

Reviewing files that changed from the base of the PR and between 74f44c8 and 663d94a.

📒 Files selected for processing (17)
  • pyproject.toml
  • tests/conftest.py
  • tests/routes/desktop_browser/test_cookie_jar.py
  • tests/routes/desktop_browser/test_injector.py
  • tests/routes/desktop_browser/test_profile.py
  • tests/routes/desktop_browser/test_proxy_fetch.py
  • tests/routes/desktop_browser/test_proxy_shell.py
  • tests/routes/desktop_browser/test_rewriter.py
  • tests/routes/desktop_browser/test_store_tenancy.py
  • tinyagentos/app.py
  • tinyagentos/routes/desktop_browser/cookie_jar.py
  • tinyagentos/routes/desktop_browser/copilot.js
  • tinyagentos/routes/desktop_browser/injector.py
  • tinyagentos/routes/desktop_browser/profile.py
  • tinyagentos/routes/desktop_browser/proxy.py
  • tinyagentos/routes/desktop_browser/rewriter.py
  • tinyagentos/routes/desktop_browser/store.py

Comment thread tinyagentos/app.py Outdated
Comment on lines +42 to +44
rows = await cookie_store.get_cookies(
user_id=user_id, profile_id=profile_id, host=host.lstrip("."),
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Domain-scoped cookies won't be sent to subdomains — breaks auth for many real sites.

get_cookies uses an exact SQL WHERE host = ? match. A cookie stored for github.com (after stripping the leading dot from .github.com) will never be found when the proxy fetches gist.github.com or api.github.com. Per RFC 6265, Domain=github.com must apply to all subdomains, which is standard practice for auth cookies on virtually every major site.

The fix belongs in BrowserCookieStore.get_cookies (store.py), but since load_jar_for_request is the call site establishing this pattern, tracking the gap here:

🐛 Proposed fix in `BrowserCookieStore.get_cookies`
- "WHERE user_id = ? AND profile_id = ? AND host = ?",
- (user_id, profile_id, host),
+ "WHERE user_id = ? AND profile_id = ? AND (host = ? OR ? LIKE '%.' || host)",
+ (user_id, profile_id, host, host),

'gist.github.com' LIKE '%.github.com' → True, so a stored github.com cookie is correctly included in requests to any subdomain. The reverse is not matched (a gist.github.com cookie is never sent to github.com), which is correct per spec.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tinyagentos/routes/desktop_browser/cookie_jar.py` around lines 42 - 44, The
SQL match in BrowserCookieStore.get_cookies currently uses an exact equality
(host = ?) so a cookie stored for "github.com" won't match requests to
"gist.github.com"; change the WHERE clause to include a subdomain match instead
of only equality—e.g. WHERE (host = :stored_host OR :request_host LIKE '%' ||
host) so a stored host like "github.com" will match request_host
"gist.github.com" but not vice versa; keep the existing host.lstrip(".")
handling in load_jar_for_request and ensure parameter ordering matches the new
predicate in get_cookies (update BrowserCookieStore.get_cookies in store.py and
any SQL-binding code accordingly), and add/adjust tests to cover domain-scoped
cookies matching subdomains.

Comment thread tinyagentos/routes/desktop_browser/cookie_jar.py Outdated
Comment on lines +114 to +126
async with httpx.AsyncClient(
follow_redirects=False, timeout=_HOP_TIMEOUT,
) as http:
for hop in range(_MAX_REDIRECTS + 1):
host = urlparse(current_url).hostname or ""

jar = await load_jar_for_request(
cookie_store, user_id=user_id, profile_id=profile_id, host=host,
)

try:
_resp = await http.get(current_url, cookies=jar)
except httpx.HTTPError as e:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Avoid buffering arbitrary upstream bodies in memory.

http.get() reads the full upstream response before we know whether it's HTML, and both branches then use response.content. For a user-supplied proxy URL, a large download or media asset becomes an unbounded memory spike on the server. Non-HTML responses should be streamed, and HTML responses should have a size cap before rewrite/injection.

Also applies to: 192-194, 211-217

🧰 Tools
🪛 Ruff (0.15.12)

[warning] 117-117: Loop control variable hop not used within loop body

Rename unused hop to _hop

(B007)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tinyagentos/routes/desktop_browser/proxy.py` around lines 114 - 126, The code
currently calls http.get(current_url) and accesses response.content (e.g.,
variable _resp) which buffers the whole upstream body; change to streaming
requests using http.stream("GET", current_url, cookies=jar) and inspect headers
from the response context (Content-Type) before consuming the body; for non-HTML
responses forward the stream to the client in chunks (using
response.aiter_bytes()/iter_bytes()) without storing into memory, and for HTML
responses read into a bounded accumulator up to a defined MAX_HTML_BYTES (reject
or error if exceeded) before performing rewrite/injection; apply the same
streaming-and-bounded-read pattern to the other occurrences that use
response.content (lines referenced around 192-194 and 211-217), and keep unique
symbols in mind: httpx.AsyncClient, http.stream, current_url,
load_jar_for_request, _resp, and any downstream code that expects the full body
so it can work with the bounded HTML payload only.

Comment on lines +120 to +134
jar = await load_jar_for_request(
cookie_store, user_id=user_id, profile_id=profile_id, host=host,
)

try:
_resp = await http.get(current_url, cookies=jar)
except httpx.HTTPError as e:
_logger.info("browser proxy fetch error: err=%s", e)
return None

# Persist any cookies set by this hop
await persist_response_cookies(
cookie_store, _resp.cookies,
user_id=user_id, profile_id=profile_id,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Host-only cookie loading loses real browser cookie semantics.

This call path only loads cookies by exact host, but the current jar helpers persist .example.com as example.com and then look up by exact host. That means cookies scoped to a parent domain will not survive redirects across subdomains, and the loader has no scheme input to enforce secure-only replay rules on later http:// hops. Common login flows like www.example.com -> app.example.com will break, and scheme downgrades can replay cookies too broadly. Please move this integration to full cookie matching semantics before merge. Cross-file evidence: tinyagentos/routes/desktop_browser/cookie_jar.py:29-54 and tinyagentos/routes/desktop_browser/cookie_jar.py:57-82.

… cap

Five Major findings from CodeRabbit's PR 302 review:

- app.py: replace universal "0"*64 SQLCipher key with a per-install
  random key persisted to data_dir/browser_cookie_key.hex (mode 0o600)
  generated via secrets.token_hex(32). Override via env var
  TAOS_BROWSER_COOKIE_KEY_HEX for recovery. Removes the at-rest
  protection regression where every install shared the same key.
- store.py: get_cookies now matches subdomains per RFC 6265
  ((host = ? OR ? LIKE '%.' || host)), so Set-Cookie: Domain=github.com
  reaches gist.github.com / api.github.com etc. — fixes auth on every
  major site. Also adds expiry filter so expired rows aren't returned.
- store.py: new delete_cookie method for explicit cookie removal.
- cookie_jar.py: `if cookie.expires` → `cookie.expires is not None`
  so server-sent expires=0 (cookie deletion) is preserved instead of
  being silently treated as a session cookie. When persisting a
  past-dated cookie, route to delete_cookie instead of set_cookie so
  logout actually invalidates the session.
- proxy.py: 10 MB response size cap before HTML rewrite / non-HTML
  pass-through. Without it, a user-supplied URL pointing at a large
  download would OOM the proxy.

+6 regression tests covering subdomain cascade (both directions),
expired-cookie filtering, explicit delete_cookie, server-driven
cookie deletion via past expiry, and the response size cap.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant