Merged
1 change: 1 addition & 0 deletions .gitignore
@@ -28,6 +28,7 @@ standalone/src-tauri/binaries/
standalone/src-tauri/gen/
standalone/dist/
standalone/sidecar/dor-cli/
standalone/sidecar/iframe-proxy.cjs
standalone/sidecar/node_modules/
standalone/node_modules/

81 changes: 48 additions & 33 deletions docs/specs/dor-iframe.md
@@ -17,12 +17,13 @@ Dormouse-served content, which is the one capability the raw iframe lacked — a
from it the surface gains a keyboard side-channel for its global leader chord, an
accurate focus model, and real error pages.

> Status: **works for loopback dev servers** (implemented on the VS Code host).
> Arbitrary web browsing is still better served by the **agent-browser** surface
> (`dor ab`, see [dor-agent-browser.md](dor-agent-browser.md)): the iframe surface
> proxies `http://` upstreams (loopback dev servers are overwhelmingly plain
> http), defers `https://`, and routes a remote that refuses framing to an error
> page pointing at `dor ab`.
> Status: **works for loopback dev servers** in hosts that can run the shared
> Node proxy (VS Code extension host and standalone/Tauri sidecar). Arbitrary
> web browsing is still better served by the **agent-browser** surface (`dor ab`,
> see [dor-agent-browser.md](dor-agent-browser.md)): the iframe surface proxies
> `http://` upstreams (loopback dev servers are overwhelmingly plain http),
> defers `https://`, and routes a remote that refuses framing to an error page
> pointing at `dor ab`.

## The CLI → surface

@@ -42,9 +43,9 @@ isn't proxyable (e.g. an `https://` URL) it shows an actionable message instead.

This is the **substrate** the surface is built on. Instead of pointing the
`<iframe>` at the target, Dormouse points it at a loopback proxy
(`vscode-ext/src/iframe-proxy-host.ts`) that fetches the target and serves it
back. The moment Dormouse serves the bytes, two things become possible that the
raw iframe cannot do:
(`lib/src/host/iframe-proxy.ts`) that fetches the target and serves it back. The
moment Dormouse serves the bytes, two things become possible that the raw iframe
cannot do:

1. **Inject a keyboard side-channel** so Dormouse's global leader chord keeps
working inside the frame (the technique VS Code uses for its own webviews).
@@ -85,6 +86,10 @@ forwarder. The difference: this proxy speaks **HTTP** (it parses and rewrites
responses) and passes through **WebSocket upgrades** (dev-server HMR,
openvscode-server's connection).

Source of truth: `lib/src/host/iframe-proxy.ts` owns the shared HTTP/WebSocket
server, while `lib/src/host/iframe-proxy-rewrite.ts` owns the dependency-free
policy, HTML instrumentation, framing checks, and served error pages.

- **Per-grant dedicated loopback server.** Each grant gets its own ephemeral
`http.Server` bound to `127.0.0.1:0`, fronting exactly **one** fixed upstream.
The grant's *origin* is the grant. Two consequences, and they are deliberate
@@ -109,30 +114,38 @@ openvscode-server's connection).
off-proxy. Non-HTML passes through (framing/hop-by-hop headers still stripped).
The initial framed proxy URL preserves the target's path, query, and fragment;
the fragment remains browser-only and is not sent on upstream HTTP requests.
HTML injection streams: the proxy buffers only until `</head>`, `<body>`, or a
bounded prefix cap, instruments that prefix, then pipes the rest of the
upstream response through without waiting for the full document.
- **WebSocket passthrough.** Upgrades are forwarded as a raw byte pipe once the
upgrade head is rewritten (`Host`/`Origin` → upstream), exactly like the stream
relay.
- **Anti-framebust.** The `<iframe>` uses a `sandbox` without
`allow-top-navigation`, so a tool's `if (top !== self) top.location = …` cannot
navigate the Wall away.
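
The streaming injection described in the list above can be sketched roughly as follows. This is a minimal illustration and not the code in `lib/src/host/iframe-proxy-rewrite.ts`: `createInjector`, `PREFIX_CAP`, and the 64 KiB value are hypothetical, and the sketch works on decoded strings where a real proxy would handle bytes and encodings.

```typescript
// Hypothetical cap: the spec only says the prefix buffer is "bounded".
const PREFIX_CAP = 64 * 1024;

// Hypothetical helper: buffers the start of an HTML response until a marker
// (`</head>` or `<body>`) or the cap is reached, injects the shim once, then
// passes every later chunk through untouched.
function createInjector(shimTag: string, cap: number = PREFIX_CAP) {
  let buffered = '';
  let done = false;

  // Place the shim before `</head>` if present, else before `<body>`,
  // else at the very start of the buffered prefix.
  const instrument = (prefix: string): string => {
    const head = prefix.search(/<\/head\s*>/i);
    if (head !== -1) return prefix.slice(0, head) + shimTag + prefix.slice(head);
    const body = prefix.search(/<body[\s>]/i);
    if (body !== -1) return prefix.slice(0, body) + shimTag + prefix.slice(body);
    return shimTag + prefix;
  };

  return {
    // Returns the text to forward downstream for this chunk ('' while buffering).
    write(chunk: string): string {
      if (done) return chunk;
      buffered += chunk;
      const hasMarker = /<\/head\s*>|<body[\s>]/i.test(buffered);
      if (!hasMarker && buffered.length < cap) return '';
      done = true;
      const out = instrument(buffered);
      buffered = '';
      return out;
    },
    // Flushes a short document that ended before any marker or the cap.
    end(): string {
      if (done) return '';
      done = true;
      const out = buffered.length > 0 ? instrument(buffered) : '';
      buffered = '';
      return out;
    },
  };
}
```

Because the marker test runs against the accumulated buffer, a `</head>` split across two upstream chunks is still found, and nothing after the injection point waits on the rest of the document.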

### The keyboard side-channel (resolves #1)
### The iframe shim message channel (resolves #1 and proxied click adoption)

A fixed, Dormouse-owned script — like agent-browser's `EDIT_SCRIPTS`, never
user-supplied, so it is not an eval vector — injected inline before `</head>`. It
reclaims **only** the reserved leader chord (dual-tap ⌘ / ⇧, the same detection as
`handle-dual-tap.ts`) and `postMessage`s it to the parent; **every other keystroke
flows to the tool untouched**. The embedded tool (a code editor, a VS-Code-web
workbench) keeps full keyboard interactivity; Dormouse keeps its one global chord.
user-supplied, so it is not an eval vector — injected inline before `</head>`.
It posts only Dormouse-owned control messages to the parent:

- `leader`: the reserved leader chord (dual-tap ⌘ / ⇧, the same detection as
`handle-dual-tap.ts`).
- `pointerdown`: genuine user pointerdown inside the cross-origin frame, so the
panel can adopt the click as pane selection + passthrough entry.

Every other keystroke and pointer event flows to the tool untouched. The
embedded tool (a code editor, a VS-Code-web workbench) keeps full keyboard
interactivity; Dormouse keeps its one global chord and a click-adoption signal.

The Wall already owns a capturing `window` keydown listener
(`use-wall-keyboard.ts`); it gains a `message` listener that validates
`event.origin` against the live proxy grants (`lib/src/lib/iframe-proxy-registry.ts`)
and feeds the forwarded chord into the same dispatch the in-document dual-tap
would (`exitTerminalMode`) — no synthesized `KeyboardEvent` round-trip.

> Deviation from the original sketch: the shim forwards **only** the leader, not
> `focus`/`blur`. The focus model below needs no message channel.
would (`exitTerminalMode`) — no synthesized `KeyboardEvent` round-trip. The
iframe panel separately listens for the same validated proxy origin and treats a
`pointerdown` message as `onClickPanel(api.id)`.
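
A rough sketch of the validation step described above, under stated assumptions: `parseShimMessage` and the `liveOrigins` set are hypothetical stand-ins for whatever `lib/src/lib/iframe-proxy-registry.ts` exposes, and the `leader` message is assumed to use the same `__dormouse` tag field that `IframePanel.tsx` checks for `pointerdown`.

```typescript
// The two control messages the shim is described as posting.
type ShimMessage = 'leader' | 'pointerdown';

// Hypothetical helper: accept a message only if it comes from a live proxy
// origin and carries a known Dormouse tag. Everything else yields null.
function parseShimMessage(
  origin: string,
  data: unknown,
  liveOrigins: ReadonlySet<string>,
): ShimMessage | null {
  // Origin check first: a framed page cannot forge `event.origin`.
  if (!liveOrigins.has(origin)) return null;
  if (typeof data !== 'object' || data === null) return null;
  // Assumption: both message kinds share the `__dormouse` tag field.
  const tag = (data as { __dormouse?: unknown }).__dormouse;
  return tag === 'leader' || tag === 'pointerdown' ? tag : null;
}
```

In a listener this would be called as `parseShimMessage(event.origin, event.data, origins)`, with `leader` routed to the Wall's chord dispatch and `pointerdown` to `onClickPanel(api.id)`. Returning `null` for unknown tags keeps the channel closed to anything the shim was not written to send.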

### Accurate focus model (resolves #2 and #3)

@@ -143,11 +156,12 @@ would (`exitTerminalMode`) — no synthesized `KeyboardEvent` round-trip.
inactive, so headers/attention stay live when an iframe takes focus.
- **#3** — `IframePanel` registers a focus handle (`registerSurfaceFocusHandle` in
`terminal-lifecycle.ts`) so `focusSession` focuses the frame element like any
other surface. And because clicking *into* a cross-origin frame doesn't bubble a
`mousedown` to the pane, `IframePanel` adopts "the frame took focus" (window
`blur` while our iframe is `document.activeElement` and the app still has focus)
as entering the pane — so mode/selection stay consistent and the leader chord
round-trips back out.
other surface. Because clicking *into* a cross-origin frame doesn't bubble a
`mousedown` to the pane, proxied frames adopt the shim's validated
`pointerdown` message as entering the pane. The raw fallback has no shim, so it
preserves the older focus heuristic: window `blur` while our iframe is
`document.activeElement` and the app still has focus. Both paths keep
mode/selection consistent when the frame owns focus.

### Real error signals (resolves #4)

@@ -186,13 +200,16 @@ createIframeProxyUrl?(targetUrl: string): Promise<
>;
```

VS Code implements it in the extension host (`iframe-proxy-host.ts`), routed via
`message-router.ts` / `message-types.ts` and the `vscode-adapter.ts` request/
response pair. **The proxy is needed on every host** — even where a Tauri webview
could frame `http://127.0.0.1` directly for origin reasons, injection still
requires controlling the bytes — unlike the agent-browser relay, which was a
VS-Code-only origin fix. Hosts with no process to run one (the web host) omit the
method and the panel falls back to a raw, uninstrumented `<iframe>`.
VS Code implements it in the extension host (`vscode-ext/src/iframe-proxy-host.ts`),
routed via `message-router.ts` / `message-types.ts` and the `vscode-adapter.ts`
request/response pair. Standalone implements the same adapter method through
`standalone/src/tauri-adapter.ts` → `iframe_create_proxy_url` in
`standalone/src-tauri/src/lib.rs` → the sidecar's `iframe:createProxyUrl` command,
which loads the bundled shared proxy. **The proxy is needed on every host** —
even where a Tauri webview could frame `http://127.0.0.1` directly for origin
reasons, injection still requires controlling the bytes. Hosts with no process
to run one (the web host) omit the method and the panel falls back to a raw,
uninstrumented `<iframe>`.
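
The optional-method fallback can be pictured with the sketch below. It is illustrative only: the real return type of `createIframeProxyUrl` is elided in this diff, so `ProxyResult` is a stand-in, and `HostAdapter`, `resolveIframe`, and the `error` kind are hypothetical. Only the `proxied` and `raw` kinds with a `src` field appear in `IframePanel.tsx`.

```typescript
// Stand-in for the elided return type; the actual shape is not shown in this diff.
type ProxyResult = { ok: true; url: string } | { ok: false; message: string };

// Hypothetical adapter surface. Hosts with no process to run a proxy
// (the web host) simply leave the method undefined.
interface HostAdapter {
  createIframeProxyUrl?(targetUrl: string): Promise<ProxyResult>;
}

type Resolution =
  | { kind: 'proxied'; src: string }
  | { kind: 'raw'; src: string }
  | { kind: 'error'; message: string }; // hypothetical kind for the sketch

// Hypothetical helper: prefer the proxy when the host offers one,
// otherwise fall back to framing the target directly, uninstrumented.
async function resolveIframe(adapter: HostAdapter, targetUrl: string): Promise<Resolution> {
  if (!adapter.createIframeProxyUrl) return { kind: 'raw', src: targetUrl };
  const result = await adapter.createIframeProxyUrl(targetUrl);
  return result.ok
    ? { kind: 'proxied', src: result.url }
    : { kind: 'error', message: result.message };
}
```

Keeping the method optional means the panel needs no host detection: the absence of the method is itself the signal to use the raw fallback.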

With the proxy, the VS Code webview CSP (`vscode-ext/src/webview-html.ts`) narrows
from the old broad `frame-src http: https:` to the loopback proxy origin only:
@@ -233,8 +250,6 @@ user's own `dor iframe <url>`.
absolute `http://localhost:5173/…` (notably Vite's HMR `ws://localhost:5173/…`)
connects straight to the upstream — uninstrumented, though harmless for loopback
since the browser can reach it.
- **Streaming SSR is buffered.** The proxy buffers an HTML response fully before
injecting the shim, adding latency for streamed responses.
- **No teardown-on-kill hook yet.** A killed iframe surface's proxy server is
reaped by the idle sweep, not immediately on kill. (The shared teardown hook is
tracked under Path 2 below.)
5 changes: 5 additions & 0 deletions lib/src/components/Wall.tsx
@@ -928,6 +928,10 @@ export function Wall({
tabComponent: 'surface',
title,
params,
// Keep iframes mounted across (de)activation — dockview's default
// onlyWhenVisible renderer detaches/reattaches panel DOM, and moving an
// <iframe> in the DOM reloads it (docs/specs/dor-iframe.md).
renderer: component === 'iframe' ? 'always' : undefined,
position: { referencePanel: referencePanel.id, direction: 'within' },
});
disposeSession(reference.id);
@@ -945,6 +949,7 @@ export function Wall({
tabComponent: 'surface',
title,
params,
renderer: component === 'iframe' ? 'always' : undefined,
position: { referencePanel: referencePanel.id, direction: dockDirection },
});
selectPane(newId);
99 changes: 99 additions & 0 deletions lib/src/components/wall/IframePanel.test.tsx
@@ -0,0 +1,99 @@
/**
* @vitest-environment jsdom
*/
import { act } from 'react';
import { createRoot, type Root } from 'react-dom/client';
import type { IDockviewPanelProps } from 'dockview-react';
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import { FakePtyAdapter, setPlatform } from '../../lib/platform';
import { IframePanel } from './IframePanel';
import { WallActionsContext, type WallActions } from './wall-context';

globalThis.IS_REACT_ACT_ENVIRONMENT = true;

function stubActions(overrides: Partial<WallActions> = {}): WallActions {
return {
onKill: vi.fn(),
onMinimize: vi.fn(),
onAlertButton: vi.fn(() => 'noop'),
onToggleTodo: vi.fn(),
onSplitH: vi.fn(),
onSplitV: vi.fn(),
onZoom: vi.fn(),
onClickPanel: vi.fn(),
onFocusPane: vi.fn(),
onStartRename: vi.fn(),
onFinishRename: vi.fn(() => ({ accepted: true })),
onCancelRename: vi.fn(),
...overrides,
};
}

function panelProps(id: string): IDockviewPanelProps<{ url: string }> {
return {
api: { id, title: 'Raw iframe' },
params: { url: 'http://example.test/app' },
} as unknown as IDockviewPanelProps<{ url: string }>;
}

let container: HTMLDivElement;
let root: Root;

beforeEach(() => {
setPlatform(new FakePtyAdapter());
container = document.createElement('div');
document.body.appendChild(container);
root = createRoot(container);
});

afterEach(() => {
act(() => root.unmount());
container.remove();
vi.restoreAllMocks();
});

async function renderPanel(actions: WallActions): Promise<HTMLIFrameElement> {
await act(async () => {
root.render(
<WallActionsContext.Provider value={actions}>
<IframePanel {...panelProps('iframe-raw')} />
</WallActionsContext.Provider>,
);
});

const iframe = container.querySelector('iframe');
if (!iframe) throw new Error('missing iframe');
return iframe;
}

describe('IframePanel', () => {
it('adopts clicks into the raw iframe fallback via window blur focus', async () => {
const onClickPanel = vi.fn();
const actions = stubActions({ onClickPanel });
const iframe = await renderPanel(actions);

vi.spyOn(document, 'hasFocus').mockReturnValue(true);
vi.spyOn(document, 'activeElement', 'get').mockReturnValue(iframe);

act(() => {
window.dispatchEvent(new Event('blur'));
});

expect(onClickPanel).toHaveBeenCalledWith('iframe-raw');
});

it('does not adopt a raw iframe blur when the app itself lost focus', async () => {
const onClickPanel = vi.fn();
const actions = stubActions({ onClickPanel });
const iframe = await renderPanel(actions);

vi.spyOn(document, 'hasFocus').mockReturnValue(false);
vi.spyOn(document, 'activeElement', 'get').mockReturnValue(iframe);

act(() => {
window.dispatchEvent(new Event('blur'));
});

expect(onClickPanel).not.toHaveBeenCalled();
});
});
63 changes: 47 additions & 16 deletions lib/src/components/wall/IframePanel.tsx
@@ -84,24 +84,31 @@ export function IframePanel({ api, params }: IDockviewPanelProps<IframePanelPara
return registerProxyOrigin(proxyOrigin);
}, [proxyOrigin]);

// Register a focus handle so onClickPanel → enterTerminalMode can focus the
// frame like any other surface (spec → "#3"). Focusing the element moves
// keyboard focus into the frame; the shim then reports focus back to the Wall.
// A cross-origin click reaches only the frame, so the Wall never sees the
// mousedown — and on WebKit the iframe element's own `focus` event doesn't
// fire for it either. The shim posts `pointerdown` from inside the frame;
// adopt it as entering the pane (select + passthrough), exactly like clicking
// any other pane. Only genuine clicks emit `pointerdown`, so command-mode
// arrow navigation never triggers it, and onClickPanel is idempotent for
// repeat clicks. (We can't gate on dockview's `api.isActive`: in a split each
// sole panel is always "active" within its own group.)
useEffect(() => {
if (resolution.kind !== 'proxied' && resolution.kind !== 'raw') return;
return registerSurfaceFocusHandle(api.id, {
focus: () => iframeRef.current?.focus(),
blur: () => iframeRef.current?.blur(),
});
}, [api.id, resolution.kind]);
if (!proxyOrigin) return;
const onMessage = (e: MessageEvent) => {
if (e.origin !== proxyOrigin) return;
if ((e.data as { __dormouse?: unknown } | null)?.__dormouse !== 'pointerdown') return;
actions.onClickPanel(api.id);
};
window.addEventListener('message', onMessage);
return () => window.removeEventListener('message', onMessage);
}, [api, proxyOrigin, actions]);

// Clicking *into* a cross-origin frame doesn't bubble a mousedown to the pane,
// so the onMouseDown below never fires and the surface never enters
// passthrough. Detect the frame taking focus (window blurs while our iframe
// becomes activeElement, app still focused) and adopt it as entering the pane,
// so mode/selection stay consistent and the leader chord can round-trip out.
// Raw fallback frames have no injected shim, but focusing a cross-origin
// iframe still blurs the parent window while the document itself remains
// focused. Adopt that as entering the pane so hosts without a proxy keep the
// same click/focus behavior, albeit without the proxied leader side-channel.
useEffect(() => {
if (resolution.kind !== 'proxied' && resolution.kind !== 'raw') return;
if (resolution.kind !== 'raw') return;
const onWindowBlur = () => {
if (document.hasFocus() && document.activeElement === iframeRef.current) {
actions.onClickPanel(api.id);
@@ -111,12 +118,36 @@ export function IframePanel({ api, params }: IDockviewPanelProps<IframePanelPara
return () => window.removeEventListener('blur', onWindowBlur);
}, [api.id, resolution.kind, actions]);

// Register a focus handle so onClickPanel → enterTerminalMode can focus the
// frame like any other surface (spec → "#3"), and exitTerminalMode can hand
// focus back. Focusing the element moves keyboard focus into the frame.
useEffect(() => {
if (resolution.kind !== 'proxied' && resolution.kind !== 'raw') return;
return registerSurfaceFocusHandle(api.id, {
// Skip if the frame already holds focus: re-focusing a cross-origin frame
// on WebKit can blank it (the frame is already focused after a click).
focus: () => {
if (document.activeElement !== iframeRef.current) iframeRef.current?.focus();
},
// Pull focus back into the top document so the Wall's window keydown
// listener receives command-mode keys after the leader exits passthrough —
// blurring a cross-origin frame doesn't reliably hand focus back on WebKit.
blur: () => {
iframeRef.current?.blur();
elRef.current?.focus();
},
});
}, [api.id, resolution.kind]);

const src = resolution.kind === 'proxied' || resolution.kind === 'raw' ? resolution.src : '';

return (
<div
ref={elRef}
className={`relative h-full w-full overflow-hidden bg-terminal-bg ${TERMINAL_BOTTOM_RADIUS_CLASS}`}
// tabIndex makes this focusable so the focus handle can park focus here
// (in the top document) when the frame blurs; outline-none hides the ring.
tabIndex={-1}
className={`relative h-full w-full overflow-hidden bg-terminal-bg outline-none ${TERMINAL_BOTTOM_RADIUS_CLASS}`}
// A cross-origin iframe is an out-of-process frame; Chromium maps pointer
// events to it relative to its nearest compositing/containing ancestor.
// Dockview's root (.dv-dockview) sets `contain: layout`, so without this