skill(ob-routines): record → compile → replay browser routines#62
Merged
softpudding merged 10 commits into feat/image-input-and-file-upload on Apr 18, 2026
The skill is now symlinked into ~/.claude/skills/open-browser/ for global use. Update every `python3 skill/claude/open-browser/...` reference to `python3 ~/.claude/skills/open-browser/...` so the same command works from any project's CWD (including inside the OpenBrowser repo, where the symlink still resolves back here).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…411883e78527b1915fa8c4 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
…play) Introduces the ob-routines skill (alias for openbrowser-routines) for capturing, compiling, and replaying named Chrome workflows. Previously lived only in ~/.claude/skills/routines/; now versioned under skill/claude/ob-routines/ so it can be installed via symlink alongside open-browser.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
In routine-replay mode, where the compiled SOP gives the agent precise element keywords, the 2-phase click/select/keyboard_input confirmation round-trip and the 3-frame screenshot history both pay for ambiguity that does not exist.

- BrowserExecutor now tracks the most recent highlight result per conversation. When the agent targets the unique element that highlight just returned, click/select/keyboard_input skip the pending-confirmation round-trip and execute directly. Falls back to 2PC in any other case.
- get_context_image_window(routine_replay=True) returns 1, overriding the default of 3 for replay conversations only.
- ob-routines SKILL.md: tighten /ob-routines new to ask only for the one-line goal and defer URL/site/parameter questions to the compiler.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
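The two replay-mode shortcuts described in this commit can be sketched roughly as follows. This is an illustrative Python sketch, not the code in the PR: `HighlightTracker`, its method names, and the constant names are hypothetical, and only the `routine_replay` flag and the 3 vs. 1 window sizes come from the commit message.

```python
from dataclasses import dataclass, field
from typing import Dict, List

DEFAULT_IMAGE_WINDOW = 3  # default screenshot history, per the commit message
REPLAY_IMAGE_WINDOW = 1   # replay conversations keep only the latest frame


def get_context_image_window(routine_replay: bool = False) -> int:
    """Return how many screenshots to keep in context (simplified signature)."""
    return REPLAY_IMAGE_WINDOW if routine_replay else DEFAULT_IMAGE_WINDOW


@dataclass
class HighlightTracker:
    """Hypothetical helper: remembers the last highlight result per conversation."""

    _last: Dict[str, List[str]] = field(default_factory=dict)

    def record(self, conversation_id: str, element_ids: List[str]) -> None:
        # Overwrite any earlier result; only the most recent highlight counts.
        self._last[conversation_id] = list(element_ids)

    def can_skip_confirmation(self, conversation_id: str, target_id: str) -> bool:
        # Direct execution is allowed only when the last highlight returned
        # exactly one element and the action targets that same element.
        last = self._last.get(conversation_id)
        return last is not None and len(last) == 1 and last[0] == target_id
```

Any case where `can_skip_confirmation` returns `False` (no prior highlight, several matches, or a different target) would take the existing two-phase confirmation path.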
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tab init's two heaviest phases share the same shape: per-element loops that re-do work the previous step already paid for. Cut both.

Scanner (highlight-detection.injected.js): wrap collectHighlightCandidates in withScanLayoutCache, which monkey-patches Element.prototype.getBoundingClientRect, SVGGraphicsElement.prototype.getBoundingClientRect, window.getComputedStyle, and Document.prototype.elementsFromPoint with per-scan WeakMap/Map caches. The scan runs in one synchronous Runtime.evaluate, so layout cannot change mid-task and caching is safe; originals are restored in finally. Also skip inert tags (script/style/meta/...) before the first layout read.

Pagination (collision-detection.ts): SelectedSpatialIndex (96px grid) keyed on union(bbox, labelBBox) of placed elements. isPlacementFeasible now iterates only nearby placed elements via nearbySelectedFor, which queries by inflate(union(candidate.bbox, candidate.labelBBox), CLEARANCE), covering all four collision tests. chooseLeastBlockingPlacement also uses an "influence rect" to skip re-evaluating spatially-far future candidates when a hypothetical placement cannot affect them.

Measured (best run, fresh tab init):
- finviz.com (349 elements): 17.8s -> 13.7s (-23%)
- bluebook mock (50): 6.3s -> 5.4s (-14%)
- techforum mock (34): 4.3s -> 3.9s (-11%)
- 16 mock sites aggregate: -4% to -14%

Correctness:
- 181/181 extension unit tests pass.
- Strict integration check (selector + type + labelPosition + bbox + element ORDER) passes on all 16 deterministic mock sites.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
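To illustrate the pagination change, here is a minimal uniform-grid index in Python. It is a sketch of the general technique only: the real SelectedSpatialIndex lives in collision-detection.ts, and the `Rect` type, the helper functions, and the `SpatialIndex` class below are hypothetical names. Only the 96px cell size and the idea of querying with an inflated union rectangle come from the commit message.

```python
from typing import Dict, Iterator, NamedTuple, Set, Tuple

CELL = 96  # grid cell size in pixels, as stated in the commit message


class Rect(NamedTuple):
    x0: float
    y0: float
    x1: float
    y1: float


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle containing both a and b."""
    return Rect(min(a.x0, b.x0), min(a.y0, b.y0), max(a.x1, b.x1), max(a.y1, b.y1))


def inflate(r: Rect, margin: float) -> Rect:
    """Grow r by margin on every side."""
    return Rect(r.x0 - margin, r.y0 - margin, r.x1 + margin, r.y1 + margin)


class SpatialIndex:
    """Hypothetical grid index: maps each cell to the keys of rects touching it."""

    def __init__(self, cell: int = CELL) -> None:
        self.cell = cell
        self.cells: Dict[Tuple[int, int], Set[str]] = {}

    def _cells_for(self, r: Rect) -> Iterator[Tuple[int, int]]:
        # Floor division keeps negative coordinates in the correct cell.
        cx0, cx1 = int(r.x0 // self.cell), int(r.x1 // self.cell)
        cy0, cy1 = int(r.y0 // self.cell), int(r.y1 // self.cell)
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                yield (cx, cy)

    def add(self, key: str, rect: Rect) -> None:
        for c in self._cells_for(rect):
            self.cells.setdefault(c, set()).add(key)

    def nearby(self, query: Rect) -> Set[str]:
        # Returns a superset of the true overlaps; exact collision tests
        # still run, but only against these candidates.
        found: Set[str] = set()
        for c in self._cells_for(query):
            found |= self.cells.get(c, set())
        return found
```

With an index like this, a feasibility check would look at `index.nearby(inflate(union(bbox, label_bbox), clearance))` instead of every placed element, which is where the savings on dense pages would come from.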
The pagination win revealed that scan-phase resolve was the new bottleneck: on finviz the resolve phase alone was 6.1s of a 6.5s scan. Per candidate, resolveClickableCandidate walks up to 5 ancestors, each calling isClickableCandidate, which calls hasExplicitClickableAncestor, which walks ALL ancestors back to body, calling getSemanticClickableSignal at each. For deep DOM (finviz tables) the same elements were classified dozens of times per scan.

Add per-scan WeakMap memoization (cleared by withScanLayoutCache) for the classifiers that are pure functions of element + DOM state:
- getSemanticClickableSignal
- isClickableCandidate
- getBaseClickableSignal
- hasExplicitClickableAncestor
- getElementTextForDetection (textContent walk)
- getElementSearchText

Also add scan_stats / scan_times to the response payload so harness/tooling can attribute time per phase without parsing console output.

Measured (best run, finviz.com/screener.ashx, ~349 candidates):
- in-page scan: 6537ms -> 585ms (-91%, ~11x)
- pagination: 397ms -> 300ms (already optimized in prior commit)
- end-to-end: 17787ms -> 4975ms (-72%, ~3.6x)

Resolve-phase breakdown after caching: 6121ms -> 51ms.

Correctness: 181/181 unit tests pass. Strict integration check (selector, type, labelPosition, bbox, element ORDER) passes on all 16 deterministic mock sites: same elements, same labels, same order. finviz_real returns identical 336/6/138 element/page/page1 counts.

Caching is safe because the scan runs in one synchronous Runtime.evaluate call and these classifiers depend only on DOM state that cannot mutate during the scan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
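The effect of per-scan memoization on repeated ancestor walks can be shown with a small Python model. Everything here is a stand-in: `Node`, `scan_cache`, `has_clickable_ancestor`, and the `stats` counter are hypothetical, the clickable-tag rule is deliberately simplistic, and a plain dict replaces the WeakMap used in the extension.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List, Optional

CLICKABLE_TAGS = {"a", "button"}  # simplistic stand-in for the real signals


class Node:
    def __init__(self, tag: str, parent: Optional["Node"] = None) -> None:
        self.tag = tag
        self.parent = parent


_scan_cache: Optional[Dict[Node, bool]] = None
stats = {"computed": 0}  # counts cache misses, i.e. real classifications


@contextmanager
def scan_cache() -> Iterator[None]:
    """Enable memoization for one scan and discard the cache afterwards."""
    global _scan_cache
    _scan_cache = {}
    try:
        yield
    finally:
        _scan_cache = None


def has_clickable_ancestor(node: Node) -> bool:
    """True if any strict ancestor of node has a clickable tag."""
    cache = _scan_cache
    if cache is not None and node in cache:
        return cache[node]
    stats["computed"] += 1
    parent = node.parent
    result = parent is not None and (
        parent.tag in CLICKABLE_TAGS or has_clickable_ancestor(parent)
    )
    if cache is not None:
        cache[node] = result
    return result


def make_chain(depth: int) -> List[Node]:
    """Build a single-branch tree of <div> nodes, root first."""
    nodes = [Node("div")]
    for _ in range(depth - 1):
        nodes.append(Node("div", nodes[-1]))
    return nodes
```

Classifying every node of a 50-deep chain without the cache re-walks each ancestor path, so the work grows quadratically with depth; inside `with scan_cache():` each node is classified once. Discarding the cache when the scan ends mirrors the commit's argument that cached answers are only valid while the DOM cannot change.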
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
45a103e to
a9c0c7c
perf(highlight): cut tab init from ~20s to ~5s on heavy pages
1a8aa5c
into
feat/image-input-and-file-upload
4 checks passed
Summary
New `ob-routines` skill: record a browser session, compile it into a routine SOP, and replay it efficiently. Also bumps the openhands-sdk dependency required by the replay path and includes a small open-browser skill doc fix.
Commits (oldest → newest):
What's in the skill
Test plan
Stack
This is PR 2 of 3:
After #61 merges to main, retarget this PR to `main`.
🤖 Generated with Claude Code