feat(browser): proxy v2 — real fetch + lxml rewrite + cookie jar (PR 3/10) by jaylfc · Pull Request #302 · jaylfc/tinyagentos

jaylfc · 2026-05-03T22:39:19Z

Third of ten PRs implementing BrowserApp v2 per the design doc.

First user-visible deliverable of BrowserApp v2. Replaces PR 2's 501 stub with the real proxy fetch pipeline — a logged-in user can hit a URL via /api/desktop/browser/proxy and get back the rewritten HTML with their cookies attached and the copilot script injected.

Summary

profile.py — Profile CRUD + default Personal/Work bootstrap. ensure_default_profiles is genuinely idempotent (uses INSERT OR IGNORE); safe to call on every request.
cookie_jar.py — httpx adapter wrapping PR 1's BrowserCookieStore. Per-request load/persist pattern bridges sync httpx + async SQLCipher store. Multi-user isolation enforced. Strips cookielib's leading-dot domain convention so Set-Cookie: Domain=github.com (stored as .github.com) is retrievable on the next request via urlparse(...).hostname (no dot).
rewriter.py — lxml-based DOM rewriter. Handles href/src/action on every tag, srcset (preserving descriptors), inline style="background-image: url(...)", <style> tag contents, and <meta http-equiv="refresh" content="N;url=...">. Skips data: / javascript: / mailto: / tel: / blob: / about: / # schemes.
injector.py — Inserts <script src="/__taos/copilot.js"> and <meta name="taos-copilot-ws" content="..."> into <head>. Idempotent.
copilot.js — Stub asset. Full RPC bridge + DOM extraction + annotation rendering land in PR 6.
proxy.py — Full orchestrator: auth → profile bootstrap → SSRF → cookie jar load → httpx fetch (manual redirect walk with per-hop SSRF re-check) → rewrite + inject + CSP for HTML / pass-through for non-HTML → cookie persist → response. 15s total deadline enforced via asyncio.wait_for (per-hop budget 5s).
/__taos/copilot.js — Static FileResponse route serving the copilot script.
app.py — BrowserStore + BrowserCookieStore wired on app.state in lifespan, with placeholder cookie key. Migration hazard for PR 5+ documented inline (SQLCipher rejects DB if key changes — PR 5 must rekey or wipe).
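
The srcset handling described for rewriter.py can be sketched roughly as below. This is not the code from the PR: `proxy_url` and `rewrite_srcset` are hypothetical names, and the query-string shape of the proxy URL is an assumption made only for illustration.

```python
from urllib.parse import quote, urljoin

# Prefixes the PR description lists as skipped by the rewriter.
SKIP_PREFIXES = ("data:", "javascript:", "mailto:", "tel:", "blob:", "about:", "#")


def proxy_url(absolute_url: str) -> str:
    """Hypothetical proxy_fn: wrap an absolute URL in the proxy endpoint."""
    return "/api/desktop/browser/proxy?url=" + quote(absolute_url, safe="")


def rewrite_srcset(value: str, base_url: str) -> str:
    """Rewrite each URL in a srcset value, keeping its width/density descriptor.

    Simplification: candidates are split on commas, so a data: URL that itself
    contains a comma would need a real srcset tokenizer.
    """
    rewritten = []
    for candidate in value.split(","):
        parts = candidate.split()
        if not parts:
            continue
        url, descriptors = parts[0], parts[1:]
        if not url.lower().startswith(SKIP_PREFIXES):
            url = proxy_url(urljoin(base_url, url))
        rewritten.append(" ".join([url, *descriptors]))
    return ", ".join(rewritten)


print(rewrite_srcset("a.png 1x, /b.png 2x", "https://example.com/dir/page.html"))
```

Relative URLs are resolved against the page URL before being wrapped, and the `1x` / `2x` descriptors pass through untouched.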

What this does not land

Service Worker for fetch interception — PR 8. Without it, SPAs can browse but their internal XHR/fetch calls fail.
Real copilot.js implementation (RPC, annotations, drive ops) — PR 6
Frontend BrowserApp UI rebuild — PR 4
Per-user Argon2id-derived cookie key — PR 5+

Test Plan

pytest tests/routes/desktop_browser/test_profile.py -v — 6 tests
pytest tests/routes/desktop_browser/test_cookie_jar.py -v — 6 tests (incl. regression test for the leading-dot domain bug using real http.cookiejar.Cookie parsing)
pytest tests/routes/desktop_browser/test_rewriter.py -v — 16 cases (parametrized; covers href/src/srcset/style/meta-refresh/scheme-skipping)
pytest tests/routes/desktop_browser/test_injector.py -v — 5 tests
pytest tests/routes/desktop_browser/test_proxy_fetch.py -v — 7 integration tests (respx-mocked) covering HTML rewriting, copilot injection, CSP application, non-HTML pass-through, cookie persistence, redirect SSRF re-check, static asset serve
Full pytest tests/routes/desktop_browser/ — 123 total, all green
Broader regression pytest tests/routes/ tests/test_secrets.py — 205 tests pass, no regressions outside scope

Spec

docs/superpowers/specs/2026-05-03-browser-app-v2-design.md §3.2 (data flow), §4.3 (proxy v2), §9 (security model).

Cumulative shipping arc:

After PR 1:  cookie-aware storage backend ready
After PR 2:  proxy endpoint exists with auth + SSRF gate
After PR 3:  full lxml rewriter + cookie-aware HTTP fetch (this PR)   (BACKEND READY)
After PR 4:  new compact chrome, multi-window, tab model
…
After PR 10: cross-device push notifications                          (V1 SHIPS)

Notes for reviewers

Cookie key is a placeholder ("0" * 64) — auth integration isn't there yet. Store is genuinely encrypted; only the key derivation is deferred. PR 5+ wires it to the login password. Migration hazard documented inline in app.py so PR 5+ author doesn't ship a silent data-loss bug.
Manual redirect walking — httpx.AsyncClient(follow_redirects=False) + a hand loop with SSRF re-check on every Location header. Critical for security: a redirect to a private IP must be blocked even if the initial URL was public. Test test_redirect_target_revalidated_against_ssrf covers this.
Total fetch deadline (15s) wraps the entire redirect walk via asyncio.wait_for — a malicious chain can't hold the proxy for 6 × 15s = 90s.
JS-runtime URLs are NOT rewritten — the rewriter only touches HTML. SPAs that build URLs in JS won't work fully until PR 8's Service Worker lands. This is the documented "70% of sites read fine vs. 70% of sites are fully functional" tradeoff.
Critical bug fixed during final review: Set-Cookie: Domain=github.com was previously stored as .github.com (leading dot, cookielib convention), but lookup used urlparse(...).hostname (no dot) → exact-equality miss → the user appeared logged in but lost the session on the next request. The fix strips the leading dot on persist (and defensively on lookup). A new regression test goes through the real http.cookiejar.Cookie path.
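
To illustrate the manual redirect walk with a per-hop SSRF re-check, here is a minimal sketch that needs no network access. It is not the PR's proxy.py: `fetch` and `resolve` are hypothetical stand-ins for the HTTP client (with automatic redirects disabled) and the DNS lookup, `MAX_REDIRECTS = 5` is an assumed value, and the address check is a simplified stand-in for the real SSRF gate from PR 2.

```python
import ipaddress
from typing import Callable, Dict, Optional, Tuple
from urllib.parse import urljoin, urlparse

MAX_REDIRECTS = 5  # assumed value, for illustration only

FetchResult = Tuple[int, Dict[str, str], bytes]


def is_blocked_host(host: str, resolve: Callable[[str], str]) -> bool:
    """Return True when the host resolves to a non-public address."""
    ip = ipaddress.ip_address(resolve(host))
    return ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved


def walk_redirects(
    url: str,
    fetch: Callable[[str], FetchResult],
    resolve: Callable[[str], str],
) -> Optional[Tuple[int, bytes]]:
    """Follow redirects by hand, validating every hop before it is fetched."""
    current = url
    for _ in range(MAX_REDIRECTS + 1):
        host = urlparse(current).hostname or ""
        if not host or is_blocked_host(host, resolve):
            return None  # blocked hop: the caller would map this to an error
        status, headers, body = fetch(current)
        location = headers.get("location")
        if status in (301, 302, 303, 307, 308) and location:
            # Location may be relative, so resolve it against the current URL.
            current = urljoin(current, location)
            continue
        return status, body
    return None  # redirect limit exceeded
```

The point of the design is that the check runs at the top of the loop, so a redirect target is validated with the same rule as the initial URL before any request is sent to it.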

Summary by CodeRabbit

New Features
- Browser proxy endpoint now fully functional with request handling, cookie persistence, and redirect support
- Automatic URL rewriting for proxied HTML content
- Copilot script injection into HTML responses with WebSocket integration
- Per-user browser profile management with default "Personal" and "Work" profiles
- Strict Content Security Policy enforcement on HTML responses
- Static copilot JavaScript endpoint
Tests
- Added comprehensive coverage for browser proxy pipeline, cookie handling, profile management, and HTML rewriting
Dependencies
- Added lxml library for HTML processing

…kie jar

Whole-branch Opus review caught a Critical bug + 3 Important items:

- cookie_jar.py (Critical): persist_response_cookies did not strip the leading dot from the cookie's domain, but a real Set-Cookie with Domain=github.com is stored as `.github.com` per cookielib. Lookup via the urlparse hostname (`github.com`, no dot) missed the row. Sites with explicit Domain= cookies (most real logins) appeared to authenticate but immediately lost the session on the next request. Strip the leading dot on persist + defensively on lookup. The regression test uses the real http.cookiejar.Cookie path (the existing test bypassed the bug via Cookies().set(), which produces non-leading-dot domains).
- store.py (Important): add_profile used a bare INSERT, racing on the PRIMARY KEY constraint when ensure_default_profiles fired concurrently for a brand-new user. Switched to INSERT OR IGNORE so the bootstrap is genuinely idempotent at the storage layer.
- app.py (Important): expanded the placeholder-cookie-key comment with an explicit MIGRATION HAZARD note for PR 5+. SQLCipher rejects an existing DB if the key changes, so swapping in per-user Argon2 keys without a rekey or wipe path makes existing user cookies unreadable without any warning.
- proxy.py (Important): wrapped the redirect walk in asyncio.wait_for so the documented 15s total deadline, redirects included, is now actually enforced (it was per-operation; 6 hops x 15s = 90s worst case). Per-hop timeout reduced to 5s.
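
The total-deadline change can be pictured with a small sketch. `fetch_with_deadline` and the `walk` argument are hypothetical names, not the identifiers used in proxy.py; only the 15s figure and the use of asyncio.wait_for come from the commit message.

```python
import asyncio

TOTAL_DEADLINE_S = 15.0  # total budget stated in the PR


async def fetch_with_deadline(walk, total_deadline: float = TOTAL_DEADLINE_S):
    """Run the whole redirect walk under one wall-clock budget.

    `walk` is a hypothetical zero-argument coroutine function that performs
    every hop. Returns None when the budget is exhausted.
    """
    try:
        return await asyncio.wait_for(walk(), timeout=total_deadline)
    except asyncio.TimeoutError:
        return None


async def _slow_walk():
    # Six hops that are each individually fast but add up past the budget.
    for _ in range(6):
        await asyncio.sleep(0.05)
    return "done"


print(asyncio.run(fetch_with_deadline(_slow_walk, total_deadline=0.1)))
```

Each simulated hop stays well inside a per-hop timeout, yet the walk as a whole is cancelled once the shared budget runs out, which is the behaviour a per-operation timeout alone cannot give.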

coderabbitai · 2026-05-03T22:39:34Z

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: c87de686-334e-466c-b132-030b34c711d4

📥 Commits

Reviewing files that changed from the base of the PR and between 663d94a and 5be2659.

📒 Files selected for processing (7)

tests/routes/desktop_browser/test_cookie_jar.py
tests/routes/desktop_browser/test_cookie_store.py
tests/routes/desktop_browser/test_proxy_fetch.py
tinyagentos/app.py
tinyagentos/routes/desktop_browser/cookie_jar.py
tinyagentos/routes/desktop_browser/proxy.py
tinyagentos/routes/desktop_browser/store.py

Walkthrough

This PR adds the lxml dependency and implements a complete browser proxy pipeline for the desktop application. It includes persistent browser storage initialization, cookie-jar management, HTML URL rewriting/injection, a copilot script asset, and comprehensive test coverage for all new functionality.

Changes

Browser Proxy Feature Implementation

| Layer / File(s) | Summary |
|---|---|
| Dependencies & Storage Setup `pyproject.toml`, `tinyagentos/app.py`, `tests/conftest.py` | Added `lxml>=5.0.0` dependency. App lifespan now initializes `BrowserStore` and `BrowserCookieStore` with proper teardown. Test `client` fixture depends on `tmp_data_dir` and instantiates browser stores for test isolation. |
| Core Proxy Components `tinyagentos/routes/desktop_browser/cookie_jar.py`, `tinyagentos/routes/desktop_browser/rewriter.py`, `tinyagentos/routes/desktop_browser/injector.py` | Cookie-jar adapter bridges `httpx.Cookies` with async `BrowserCookieStore` for per-request load/persist. HTML rewriter parses DOM and rewrites URL attributes, `srcset`, CSS `url(...)`, and `meta http-equiv="refresh"`. Injector adds copilot script and WebSocket meta to HTML head. |
| Proxy Route & Assets `tinyagentos/routes/desktop_browser/proxy.py`, `tinyagentos/routes/desktop_browser/copilot.js`, `tinyagentos/routes/desktop_browser/profile.py`, `tinyagentos/routes/desktop_browser/store.py` | Proxy route replaces 501 stub with full pipeline: profile bootstrap, SSRF validation, cookie loading, upstream fetch with manual redirect walking, cookie persistence, HTML rewrite/injection with strict CSP, and header stripping. Static endpoint serves copilot.js stub. Profile module ensures default profiles and validates existence. Store `add_profile` made idempotent with `INSERT OR IGNORE`. |
| Test Coverage `tests/routes/desktop_browser/test_cookie_jar.py`, `tests/routes/desktop_browser/test_injector.py`, `tests/routes/desktop_browser/test_profile.py`, `tests/routes/desktop_browser/test_proxy_fetch.py`, `tests/routes/desktop_browser/test_proxy_shell.py`, `tests/routes/desktop_browser/test_rewriter.py`, `tests/routes/desktop_browser/test_store_tenancy.py` | Comprehensive async pytest coverage for cookie-jar load/persist, HTML injection idempotency, profile CRUD, full proxy pipeline (HTML/non-HTML/cookies/redirects), SSRF gating, static assets, HTML URL rewriting (attributes, srcset, CSS, meta-refresh), and per-user store isolation. Updated store tenancy test from `IntegrityError` expectation to `INSERT OR IGNORE` behavior. |

Sequence Diagram

sequenceDiagram
    participant Client
    participant Proxy as Proxy Route<br/>(proxy.py)
    participant Jar as Cookie Jar<br/>(cookie_jar.py)
    participant Store as Browser Store<br/>(store.py)
    participant Upstream as Upstream Server
    participant Rewriter as HTML Rewriter<br/>(rewriter.py)
    participant Injector as Head Injector<br/>(injector.py)

    Client->>Proxy: GET /api/desktop/browser/proxy?url=X&profile_id=Y
    Proxy->>Store: ensure_default_profiles(user_id)
    Proxy->>Store: get_profile_or_404(user_id, profile_id)
    Proxy->>Proxy: SSRF validation on URL
    Proxy->>Jar: load_jar_for_request(user_id, profile_id, host)
    Jar->>Store: list_cookies(user_id, profile_id, host)
    Store-->>Jar: cookies[]
    Jar-->>Proxy: httpx.Cookies jar
    Proxy->>Upstream: fetch(url, cookies=jar)
    Upstream-->>Proxy: response (HTML/binary)
    Proxy->>Jar: persist_response_cookies(response)
    Jar->>Store: set_cookie(user_id, profile_id, cookie)
    Store-->>Jar: ✓
    alt response is HTML
        Proxy->>Rewriter: rewrite_html(html, base_url, proxy_fn)
        Rewriter-->>Proxy: rewritten_html
        Proxy->>Injector: inject_into_head(html, ws_url)
        Injector-->>Proxy: injected_html
        Proxy->>Proxy: Apply strict CSP header
    end
    Proxy->>Proxy: Strip set-cookie & CSP headers
    Proxy-->>Client: response (HTML/binary, no cookies)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

feat(browser): SSRF guard + proxy endpoint shell (PR 2/10) #301: Main PR extends the proxy endpoint structure and SSRF/CSP handling introduced in #301; directly related implementation at the proxy route level.
feat(projects): Beads bridge — A2A coordination + JSONL snapshot #267: Both PRs modify tinyagentos/app.py lifespan to initialize and tear down additional app.state services; shares the same lifecycle integration pattern.
feat(browser): backend skeleton + multi-user storage tenancy (PR 1/10) #300: Main PR builds on the desktop-browser backend framework from #300; adds storage, cookie handling, and proxy routing infrastructure to the same module.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 16.92% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: implementing the second generation of the browser proxy with real fetching, lxml-based HTML rewriting, and cookie jar functionality. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (2)

tests/routes/desktop_browser/test_proxy_fetch.py (1)
117-142: ⚡ Quick win

This doesn't verify persistence yet.

The test proves header stripping, but not that Set-Cookie was actually written to the server-side jar. A broken persist_response_cookies() path would still pass. Follow this with a second proxied request and assert the stored cookie is replayed upstream, or inspect app.state.browser_cookie_store directly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routes/desktop_browser/test_proxy_fetch.py` around lines 117 - 142, The
test only checks header stripping but not that persist_response_cookies actually
stored cookies; after the first proxied GET to "/api/desktop/browser/proxy" (in
test_persists_set_cookie_to_jar) either make a second proxied request to the
same upstream URL and assert the upstream mock receives a "Cookie:
session=abc123" header, or directly inspect app.state.browser_cookie_store (or
the server-side cookie jar object used by persist_response_cookies) to assert it
contains the "session=abc123" entry; update the test to call the proxy twice (or
examine app.state.browser_cookie_store) and add the appropriate assertion so a
broken persist_response_cookies path will fail.
tests/routes/desktop_browser/test_proxy_shell.py (1)
107-122: ⚡ Quick win

Make this assertion deterministic.

assert resp.status_code != 501 still passes on unrelated regressions like a profile/bootstrap failure or a route error. Mock the upstream fetch and assert a concrete proxy outcome here, so this test only passes when the request actually reaches the fetch pipeline.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routes/desktop_browser/test_proxy_shell.py` around lines 107 - 122, The
test currently only patches DNS and asserts resp.status_code != 501 which is
nondeterministic; instead mock the upstream fetch call used by the
desktop_browser proxy handler so the route actually reaches the fetch pipeline
and returns a predictable response. In test_valid_request_now_attempts_fetch
patch the real fetch function the route calls (e.g., the module-level fetch
helper or the HTTP client method used by tinyagentos.routes.desktop_browser —
for example patch tinyagentos.routes.desktop_browser.fetch_url or
httpx.AsyncClient.get if that’s used) to return a fixed successful Response
(status 200 and known body), keep the existing getaddrinfo patch, then assert
resp.status_code == 200 (and optionally assert the response body matches the
mocked body) so the test only passes when the proxy forwards to the fetch
pipeline.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tinyagentos/app.py`:
- Around line 317-323: The code uses a hardcoded SQLCipher key
(_placeholder_cookie_key) when constructing BrowserCookieStore, which makes all
installs share the same encryption key; instead, generate or retrieve a unique
per-installation key and pass that to BrowserCookieStore: e.g., on first run
create a strong random key (use secrets.token_bytes/ token_hex or a KDF) and
persist it in a restricted-permission local file or the OS keyring, on
subsequent runs read that stored key, and fall back to an environment variable
only for recoverability; update the initialization flow around
BrowserCookieStore.init() and app.state.browser_cookie_store to use the
retrieved/generated key rather than the hardcoded _placeholder_cookie_key.

In `@tinyagentos/routes/desktop_browser/cookie_jar.py`:
- Line 80: The current line using "if cookie.expires" treats expires==0 as falsy
and stores None; change the check to "cookie.expires is not None" when computing
expires_at so zero-expiry is preserved, and add logic in the cookie persistence
path (where BrowserCookieStore.set_cookie is called) to detect when expires_at
is in the past and call a new BrowserCookieStore.delete_cookie(user_id,
profile_id, host, path, name) instead of set_cookie; also modify
BrowserCookieStore.get_cookies to filter out expired rows (e.g. add "AND
(expires_at IS NULL OR expires_at > current_unix_time)" semantics) so expired
cookies are not returned.
- Around line 42-44: The SQL match in BrowserCookieStore.get_cookies currently
uses an exact equality (host = ?) so a cookie stored for "github.com" won't
match requests to "gist.github.com"; change the WHERE clause to include a
subdomain match instead of only equality, e.g. WHERE (host = :request_host OR
:request_host LIKE '%.' || host) so a stored host like "github.com" will match
request_host "gist.github.com" but not vice versa (the dot in '%.' keeps
look-alike hosts such as "notgithub.com" from matching); keep the existing
host.lstrip(".") handling in load_jar_for_request and ensure parameter ordering
matches the new predicate in get_cookies (update BrowserCookieStore.get_cookies
in store.py and any SQL-binding code accordingly), and add/adjust tests to cover
domain-scoped cookies matching subdomains.

In `@tinyagentos/routes/desktop_browser/proxy.py`:
- Around line 114-126: The code currently calls http.get(current_url) and
accesses response.content (e.g., variable _resp) which buffers the whole
upstream body; change to streaming requests using http.stream("GET",
current_url, cookies=jar) and inspect headers from the response context
(Content-Type) before consuming the body; for non-HTML responses forward the
stream to the client in chunks (using response.aiter_bytes()/iter_bytes())
without storing into memory, and for HTML responses read into a bounded
accumulator up to a defined MAX_HTML_BYTES (reject or error if exceeded) before
performing rewrite/injection; apply the same streaming-and-bounded-read pattern
to the other occurrences that use response.content (lines referenced around
192-194 and 211-217), and keep unique symbols in mind: httpx.AsyncClient,
http.stream, current_url, load_jar_for_request, _resp, and any downstream code
that expects the full body so it can work with the bounded HTML payload only.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 13054bc1-f2cd-4c72-b2b2-0bb8f4fb3bfe

📥 Commits

Reviewing files that changed from the base of the PR and between 74f44c8 and 663d94a.

📒 Files selected for processing (17)

pyproject.toml
tests/conftest.py
tests/routes/desktop_browser/test_cookie_jar.py
tests/routes/desktop_browser/test_injector.py
tests/routes/desktop_browser/test_profile.py
tests/routes/desktop_browser/test_proxy_fetch.py
tests/routes/desktop_browser/test_proxy_shell.py
tests/routes/desktop_browser/test_rewriter.py
tests/routes/desktop_browser/test_store_tenancy.py
tinyagentos/app.py
tinyagentos/routes/desktop_browser/cookie_jar.py
tinyagentos/routes/desktop_browser/copilot.js
tinyagentos/routes/desktop_browser/injector.py
tinyagentos/routes/desktop_browser/profile.py
tinyagentos/routes/desktop_browser/proxy.py
tinyagentos/routes/desktop_browser/rewriter.py
tinyagentos/routes/desktop_browser/store.py

coderabbitai · 2026-05-03T22:54:46Z

+    rows = await cookie_store.get_cookies(
+        user_id=user_id, profile_id=profile_id, host=host.lstrip("."),
+    )


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Domain-scoped cookies won't be sent to subdomains — breaks auth for many real sites.

get_cookies uses an exact SQL WHERE host = ? match. A cookie stored for github.com (after stripping the leading dot from .github.com) will never be found when the proxy fetches gist.github.com or api.github.com. Per RFC 6265, Domain=github.com must apply to all subdomains, which is standard practice for auth cookies on virtually every major site.

The fix belongs in BrowserCookieStore.get_cookies (store.py), but since load_jar_for_request is the call site establishing this pattern, tracking the gap here:

🐛 Proposed fix in `BrowserCookieStore.get_cookies`

-    "WHERE user_id = ? AND profile_id = ? AND host = ?",
-    (user_id, profile_id, host),
+    "WHERE user_id = ? AND profile_id = ? AND (host = ? OR ? LIKE '%.' || host)",
+    (user_id, profile_id, host, host),

'gist.github.com' LIKE '%.github.com' → True, so a stored github.com cookie is correctly included in requests to any subdomain. The reverse is not matched (a gist.github.com cookie is never sent to github.com), which is correct per spec.
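
The predicate can be checked against an in-memory SQLite database. The table layout below is a hypothetical simplification (the real BrowserCookieStore schema is not shown in this thread); only the WHERE clause mirrors the proposed fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema for illustration only.
conn.execute("CREATE TABLE cookies (user_id TEXT, profile_id TEXT, host TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO cookies VALUES (?, ?, ?, ?)",
    [
        ("u1", "p1", "github.com", "session"),
        ("u1", "p1", "gist.github.com", "gist_pref"),
        ("u1", "p1", "notgithub.com", "other"),
        ("u2", "p1", "github.com", "other_user"),
    ],
)


def get_cookies(user_id: str, profile_id: str, host: str):
    """Return (host, name) rows visible to a request for `host`."""
    return conn.execute(
        "SELECT host, name FROM cookies "
        "WHERE user_id = ? AND profile_id = ? "
        "AND (host = ? OR ? LIKE '%.' || host) "
        "ORDER BY host, name",
        (user_id, profile_id, host, host),
    ).fetchall()


print(get_cookies("u1", "p1", "gist.github.com"))
print(get_cookies("u1", "p1", "github.com"))
```

A request to the subdomain sees both its own cookie and the parent-domain cookie, while a request to the parent sees only its own. One caveat with this approach: `%` and `_` inside a stored host act as LIKE wildcards, so a store that can hold such hosts may want an explicit suffix comparison instead.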

coderabbitai · 2026-05-03T22:54:46Z

+        async with httpx.AsyncClient(
+            follow_redirects=False, timeout=_HOP_TIMEOUT,
+        ) as http:
+            for hop in range(_MAX_REDIRECTS + 1):
+                host = urlparse(current_url).hostname or ""
+
+                jar = await load_jar_for_request(
+                    cookie_store, user_id=user_id, profile_id=profile_id, host=host,
+                )
+
+                try:
+                    _resp = await http.get(current_url, cookies=jar)
+                except httpx.HTTPError as e:


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Avoid buffering arbitrary upstream bodies in memory.

http.get() reads the full upstream response before we know whether it's HTML, and both branches then use response.content. For a user-supplied proxy URL, a large download or media asset becomes an unbounded memory spike on the server. Non-HTML responses should be streamed, and HTML responses should have a size cap before rewrite/injection.

Also applies to: 192-194, 211-217
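
A bounded accumulator of the kind suggested here might look like the sketch below. `read_capped` is a hypothetical helper and the limit value is an assumption; with httpx, the chunk iterator would typically be `response.aiter_bytes()` inside a `client.stream(...)` block.

```python
import asyncio
from typing import AsyncIterator, Optional

MAX_BODY_BYTES = 10 * 1024 * 1024  # assumed cap, chosen for illustration


async def read_capped(
    chunks: AsyncIterator[bytes], limit: int = MAX_BODY_BYTES
) -> Optional[bytes]:
    """Accumulate chunks until the stream ends.

    Returns None as soon as the running total would exceed `limit`, so an
    oversized body is never held in memory in full.
    """
    buf = bytearray()
    async for chunk in chunks:
        if len(buf) + len(chunk) > limit:
            return None
        buf.extend(chunk)
    return bytes(buf)


async def _demo_chunks():
    # Stand-in for an upstream body arriving in pieces.
    for part in (b"aaaa", b"bbbb", b"cccc"):
        yield part


print(asyncio.run(read_capped(_demo_chunks(), limit=10)))
print(asyncio.run(read_capped(_demo_chunks(), limit=12)))
```

The caller decides what None means, for example an error response for HTML that is too large to rewrite.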

🧰 Tools

🪛 Ruff (0.15.12)

[warning] 117-117: Loop control variable hop not used within loop body

Rename unused hop to _hop

(B007)

coderabbitai · 2026-05-03T22:54:46Z

+                jar = await load_jar_for_request(
+                    cookie_store, user_id=user_id, profile_id=profile_id, host=host,
+                )
+
+                try:
+                    _resp = await http.get(current_url, cookies=jar)
+                except httpx.HTTPError as e:
+                    _logger.info("browser proxy fetch error: err=%s", e)
+                    return None
+
+                # Persist any cookies set by this hop
+                await persist_response_cookies(
+                    cookie_store, _resp.cookies,
+                    user_id=user_id, profile_id=profile_id,
+                )


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Host-only cookie loading loses real browser cookie semantics.

This call path only loads cookies by exact host, but the current jar helpers persist .example.com as example.com and then look up by exact host. That means cookies scoped to a parent domain will not survive redirects across subdomains, and the loader has no scheme input to enforce secure-only replay rules on later http:// hops. Common login flows like www.example.com -> app.example.com will break, and scheme downgrades can replay cookies too broadly. Please move this integration to full cookie matching semantics before merge. Cross-file evidence: tinyagentos/routes/desktop_browser/cookie_jar.py:29-54 and tinyagentos/routes/desktop_browser/cookie_jar.py:57-82.
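
As a rough picture of what fuller matching semantics involve, the sketch below combines a domain-match with the Secure rule. `StoredCookie` and `cookie_applies` are hypothetical names, and the logic is a simplification: path matching and the rule that IP-address hosts only match exactly are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredCookie:
    name: str
    domain: str       # as stored, possibly with a leading dot
    host_only: bool   # True when Set-Cookie carried no Domain attribute
    secure: bool


def cookie_applies(cookie: StoredCookie, request_host: str, scheme: str) -> bool:
    """Decide whether a stored cookie should be replayed on a request."""
    # Secure cookies are never replayed over plain http.
    if cookie.secure and scheme != "https":
        return False
    domain = cookie.domain.lstrip(".").lower()
    host = request_host.lower()
    if host == domain:
        return True
    if cookie.host_only:
        return False  # host-only cookies do not cascade to subdomains
    # Require a label boundary so "badexample.com" does not match "example.com".
    return host.endswith("." + domain)


parent = StoredCookie("session", ".example.com", host_only=False, secure=True)
print(cookie_applies(parent, "app.example.com", "https"))
print(cookie_applies(parent, "app.example.com", "http"))
```

Passing the scheme into the check is what lets a later http:// hop in a redirect chain skip cookies that were set as Secure.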

… cap

Five Major findings from CodeRabbit's PR 302 review:

- app.py: replace the universal "0"*64 SQLCipher key with a per-install random key persisted to data_dir/browser_cookie_key.hex (mode 0o600), generated via secrets.token_hex(32). Override via env var TAOS_BROWSER_COOKIE_KEY_HEX for recovery. Removes the at-rest protection regression where every install shared the same key.
- store.py: get_cookies now matches subdomains per RFC 6265 ((host = ? OR ? LIKE '%.' || host)), so Set-Cookie: Domain=github.com reaches gist.github.com / api.github.com etc., which fixes auth on sites that rely on domain-scoped login cookies. Also adds an expiry filter so expired rows aren't returned.
- store.py: new delete_cookie method for explicit cookie removal.
- cookie_jar.py: `if cookie.expires` → `cookie.expires is not None` so a server-sent expires=0 (cookie deletion) is preserved instead of being silently treated as a session cookie. When persisting a past-dated cookie, route to delete_cookie instead of set_cookie so logout actually invalidates the session.
- proxy.py: 10 MB response size cap before HTML rewrite / non-HTML pass-through. Without it, a user-supplied URL pointing at a large download could OOM the proxy.

+6 regression tests covering subdomain cascade (both directions), expired-cookie filtering, explicit delete_cookie, server-driven cookie deletion via past expiry, and the response size cap.
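
The expiry handling described for cookie_jar.py boils down to a three-way decision, sketched below. `plan_cookie_write` is a hypothetical helper, not a function from the PR; it only illustrates why `is not None` matters when expires is 0.

```python
import time
from typing import Optional, Tuple


def plan_cookie_write(
    expires: Optional[int], now: Optional[float] = None
) -> Tuple[str, Optional[int]]:
    """Decide what to do with a cookie received in a response.

    Returns ("set", expires_at) or ("delete", None).
    """
    now = time.time() if now is None else now
    if expires is None:
        return ("set", None)      # no expiry given: session cookie
    if expires <= now:
        return ("delete", None)   # past-dated, including expires=0
    return ("set", int(expires))


print(plan_cookie_write(None, now=1000))
print(plan_cookie_write(0, now=1000))
print(plan_cookie_write(2000, now=1000))
```

A truthiness test (`if expires`) would send the `0` case down the session-cookie branch, which is the logout bug the commit describes.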

jaylfc added 8 commits May 3, 2026 22:56

build: add lxml for browser proxy DOM rewriter

ef86c71

feat(browser): profile CRUD + default Personal/Work bootstrap

9166be3

feat(browser): httpx cookie-jar adapter wrapping BrowserCookieStore

ee305a0

feat(browser): lxml-based DOM URL rewriter (href/src/srcset/style/meta)

ac3f1ee

feat(browser): head injector for copilot.js + WS meta tag

46057a9

feat(browser): copilot.js stub (full impl in PR 6)

e809823

feat(browser): proxy v2 — real fetch + lxml rewrite + injection + coo…

cebc8f2

…kie jar

coderabbitai Bot reviewed May 3, 2026

jaylfc merged commit 13f42d6 into master May 3, 2026
8 checks passed

jaylfc merged 9 commits into master from feat/browser-pr-3-rewriter-cookie-jar-profile
