perf(ios): make get text ~80x faster for non-editable elements #632
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 393a490ddc
if (fallbackText && !prefersValueForReadableText(node.type ?? '')) {
  return fallbackText;
Preserve macOS helper value-first reads
For macOS desktop/frontmost-app reads, this early return changes the observable text when an accessibility element has both a title/description and an AXValue. The snapshot stores label as title ?? description ?? value while retaining value, but the macOS helper read path returns AXValue before title/description. In that environment a non-editable control, such as an AX checkbox or slider with a label and a current value, will now return the label from the snapshot instead of the value that get text returned before. The cause is that the helper dispatch is skipped for all platforms rather than only for the iOS XCUITest path this optimization targets.
Addressed: the fast-path early return is gated to device.platform === 'ios' on the current branch tip, so the macOS helper, Linux, and Android keep their value-first backend reads. Added test cases asserting that the backend read is still dispatched on android/macos/linux. (The reviewed commit predated the gate.)
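To make the divergence discussed in this thread concrete, here is a minimal TypeScript sketch. The AxElement shape and both function names are hypothetical; only the two precedence orders (title ?? description ?? value for the snapshot label, AXValue before title/description for the macOS helper) are taken from the review comment.

```typescript
// Hypothetical element shape, for illustration only.
interface AxElement {
  title?: string;
  description?: string;
  value?: string;
}

// Label-first: the order the review gives for the snapshot's stored label.
function snapshotLabel(el: AxElement): string | undefined {
  return el.title ?? el.description ?? el.value;
}

// Value-first: the order the review gives for the macOS helper read path.
function helperRead(el: AxElement): string | undefined {
  return el.value ?? el.title ?? el.description;
}

// A non-editable control that has both a label and a current value.
const checkbox: AxElement = { title: "Wi-Fi", value: "1" };

const fromSnapshot = snapshotLabel(checkbox); // "Wi-Fi"
const fromHelper = helperRead(checkbox); // "1"
```

The two reads agree whenever an element has no value or no label, which is why the difference only shows up on controls such as checkboxes and sliders.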
7cf6751 to 70dac1f
70dac1f to ecfe239
…ration

textInputAt used app.descendants(.any).allElementsBoundByIndex (snapshots EVERY element) to find the text input at a point. fill drove this repeatedly: once it has coordinates, resolveTextEntryElement re-runs textInputAt on every verify/repair poll iteration whenever the focused-field reference goes stale (e.g. the Settings search bar repositioning bottom->top), so the full-tree enum dominated fill latency.

Query the text-input element types directly (app.textFields/secureTextFields/searchFields/textViews) instead. Same matches, but XCUITest resolves typed queries without snapshotting the whole tree.

Measured (iPhone 17 sim, warm runner): fill 25 chars ~14.5s -> ~4.5s (3.2x), 6/6 exact. Same primitive #632 killed for get text.
d066130 to 9ea185d
readTextForNode dispatched a coordinate 'read' to the iOS XCUITest runner for every get text, where readTextAt() enumerates the full element tree (allElementsBoundByIndex), ~20x slower than the snapshot already captured to resolve the node. That re-read only recovers fuller text for editable/expandable inputs (textField/searchField/textView/…); for all other element types the freshly-captured snapshot node text is authoritative.

Return the snapshot node text directly for non-editable nodes with non-empty readable text, skipping the round-trip. Measured on iPhone 17 sim: get text on a labeled control drops from ~25s to ~0.3s steady-state. Editable inputs keep the backend re-read (live value can exceed the snapshot).

Review (P1): the fast-path skipped the backend read on Android/Linux/macOS too, but those backends read value-first (macOS helper: AXValue→title→description; Linux similar) whereas snapshot readable text is label-first for non-editables, so skipping their read changed get text output. Restrict the optimization to the iOS XCUITest path (the slow allElementsBoundByIndex re-read it targets). Adds a test asserting non-iOS platforms still dispatch the backend read.
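The gated fast path described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's code: SnapshotNode, isEditableType, BackendRead and readTextForNodeSketch are hypothetical names, the type comparison is assumed to be case-insensitive, and the backend read is modeled as synchronous to keep the example short.

```typescript
type Platform = "ios" | "android" | "macos" | "linux";

// Hypothetical snapshot node shape.
interface SnapshotNode {
  type?: string;
  label?: string;
  value?: string;
}

// Editable/expandable types named in the PR description.
// Lowercase matching is an assumption of this sketch.
const EDITABLE_TYPES = new Set([
  "textfield", "securetextfield", "searchfield", "textview", "edittext", "textarea",
]);

function isEditableType(type: string): boolean {
  return EDITABLE_TYPES.has(type.toLowerCase());
}

// Stand-in for the slow coordinate read dispatched to the backend.
type BackendRead = (node: SnapshotNode) => string;

function readTextForNodeSketch(
  platform: Platform,
  node: SnapshotNode,
  backendRead: BackendRead,
): string {
  const snapshotText = (node.label ?? node.value ?? "").trim();
  // Fast path: iOS only, non-editable node, non-empty snapshot text.
  const canSkipBackend =
    platform === "ios" && snapshotText !== "" && !isEditableType(node.type ?? "");
  if (canSkipBackend) {
    return snapshotText; // no backend round-trip
  }
  // Every other case keeps the backend read, so value-first backends are unchanged.
  return backendRead(node);
}
```

Keeping the platform check inside the same condition as the editable check means a non-iOS platform can never reach the early return, which is the property the added tests assert.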
9ea185d to 6f7104e
…ve) + fix fill mis-navigation (#633)

* perf(ios): early-exit text-entry readiness when the keyboard is visible

The XCUITest text-entry focus/readiness loops keyed their fast-exit on focusedTextInput(), which is intentionally hardcoded to return nil on iOS (focus predicates are stale there). As a result stabilizeTextInputBeforeTyping always burned its full focusTimeout (0.4s) and waitForTextEntryReadiness burned its full readinessTimeout (2.0s) in the normal case where the software keyboard appears: ~2.4s of dead wait before a single keystroke on every type/fill.

The software keyboard becoming visible is the reliable iOS readiness signal, so both loops now return as soon as isKeyboardVisible() is true. The warmup-first-char echo check and post-type verify/repair remain as drop safety nets.

Measured on iPhone 17 sim (Settings search field), median type time:
  25 chars: 3342ms -> 1379ms (2.4x)
  52 chars: 3969ms -> 2190ms
  313 chars: 10.3s -> 8.6s (remainder is genuine per-char XCUITest typing)

Reliability unchanged: 64/65 trials exact (incl. a 50-word lorem ipsum, verified by read-back + screenshot); the lone miss triggered the existing verify/repair.

* fix(ios): don't clear an already-empty text field (fixes fill mis-navigation)

clearTextInput unconditionally ran moveCaretToEnd (an edge-tap computed from the element frame) + a 24-key delete burst, even when the field was empty. On a field that repositions on focus (e.g. the Settings search bar jumping bottom->top and revealing a 'Suggestions' list), that edge-tap used a stale frame and landed on an adjacent row (Developer), navigating away instead of clearing. fill (replace) into the search field went to the Developer pane (0/3 correct).

Skip the clear entirely when the field's value is already empty (placeholder treated as empty): replacing into an empty field is a no-op, and skipping avoids the stray edge-tap. fill into the Settings search now types correctly and stays put: 5/5 exact (read-back + screenshot).

* perf(ios): resolve text fields via typed queries, not full-tree enumeration

textInputAt used app.descendants(.any).allElementsBoundByIndex (snapshots EVERY element) to find the text input at a point. fill drove this repeatedly: once it has coordinates, resolveTextEntryElement re-runs textInputAt on every verify/repair poll iteration whenever the focused-field reference goes stale (e.g. the Settings search bar repositioning bottom->top), so the full-tree enum dominated fill latency.

Query the text-input element types directly (app.textFields/secureTextFields/searchFields/textViews) instead. Same matches, but XCUITest resolves typed queries without snapshotting the whole tree.

Measured (iPhone 17 sim, warm runner): fill 25 chars ~14.5s -> ~4.5s (3.2x), 6/6 exact. Same primitive #632 killed for get text.

* fix(ios): address review - focus-change gate + secure-field clear

Review P1 (focus race): the isKeyboardVisible early-exit in stabilizeTextInputBeforeTyping and waitForTextEntryReadiness fired the instant the keyboard was visible. When it was ALREADY up from a previous field (back-to-back fills), that is before first-responder moves to the newly-tapped field, so app.typeText could target the old field. Gate the fast-path on a keyboard hidden->visible TRANSITION via a shared keyboardBecameVisible(wasVisibleAtEntry:) helper; when the keyboard was already up, fall back to the settle/timeout (the prior, correct behavior) instead of the ~2.4s dead wait the fresh case avoids.

Review P1 (F2): clearTextInput used editableTextValue(...) ?? "" and skipped clearing on empty, but editableTextValue returns nil for secure (and unknown) fields, so secure fields were NEVER cleared and replace concatenated stale+new. Distinguish nil (clear) from "" (skip).

Device-validated: fresh fill fast-path preserved + exact; a second fill with the keyboard already up still types into the correct field and replaces (not concatenates).
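The two review fixes in the last commit above are small pieces of decision logic, sketched here in TypeScript for consistency with the rest of the page (the runner itself is Swift). Everything below is illustrative: the function signatures are assumptions, the isVisibleNow parameter stands in for a call to isKeyboardVisible(), and null models editableTextValue returning nil.

```typescript
// Fast-exit only on a hidden -> visible transition. A keyboard that was
// already up at entry says nothing about which field is first responder.
function keyboardBecameVisible(wasVisibleAtEntry: boolean, isVisibleNow: boolean): boolean {
  return !wasVisibleAtEntry && isVisibleNow;
}

type ClearDecision = "skip" | "clear";

// Fixed logic: null (value unreadable, e.g. a secure field) must still be
// cleared; only a known-empty string can safely skip the clear.
function decideClear(value: string | null): ClearDecision {
  if (value === null) {
    return "clear";
  }
  return value === "" ? "skip" : "clear";
}

// Pre-fix logic as described in the commit message: collapsing null into ""
// made secure fields look empty, so they were never cleared.
function decideClearBeforeFix(value: string | null): ClearDecision {
  return (value ?? "") === "" ? "skip" : "clear";
}

const secureFieldAfterFix = decideClear(null); // "clear"
const secureFieldBeforeFix = decideClearBeforeFix(null); // "skip"
```

In both cases the fix replaces a two-state check with one that keeps the ambiguous state separate: "keyboard already visible" and "value unreadable" each fall back to the slower but safe behavior.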
What
Speeds up get text on iOS by ~80× for the common case (labeled controls, static text, cells, etc.). Found via the new perf benchmark (#630), where iOS get text was the standout outlier: ~20× slower than find/is against the same element.
Root cause
get text resolves the target from a snapshot (the same one find/is use), then readTextForNode dispatches a coordinate read to the XCUITest runner. On the Swift side, readTextAt() enumerates allElementsBoundByIndex over .any, which forces a full per-element snapshot of the entire tree, the slowest XCUITest path. find/is avoid it: find uses a predicate .firstMatch, and is evaluates against the already-captured snapshot. So get text paid for a second, full-tree round-trip the others never made.
That coordinate re-read only recovers fuller text for editable/expandable inputs (textField/secureTextField/searchField/textView/editText/textArea), where the live on-screen value can exceed the captured snapshot. For every other element type the freshly-captured snapshot node text is authoritative.
Fix
In readTextForNode, return the snapshot node's readable text directly for non-editable nodes with non-empty text, skipping the backend round-trip. Editable inputs keep the re-read. Mirrors the runner's own prefersExpandedTextRead set, so behavior for editable fields is unchanged.
Measured (iPhone 17 simulator, iOS 26.2)
get text on a labeled control (get text 'label="General" enabled'), steady-state: drops from ~25s to ~0.3s. (Numbers are inflated by a co-resident simulator under load; the relative ~80× drop is the signal. Controlled stash/rebuild before/after on the same machine.)
Scope / notes
- interaction-read.test.ts asserts non-editable nodes return snapshot text with no backend dispatch, while editable nodes and text-less nodes still dispatch.
- get text <selector> on an ambiguous selector (every iOS Settings cell is a label/button/text triplet) can take ~13s to return AMBIGUOUS_MATCH even though find disambiguates the same selector in ~1s. Worth a follow-up: get text sets disambiguateAmbiguous but the runner-side query still rejects first.
- Fixing the underlying enumeration (allElementsBoundByIndex in readTextAt) would also help the editable case and other coordinate reads; deferred to keep this change small and rebuild-free.
Stacking
Branched off perf-harness (#630) so the perf benchmark could verify it; PR base is perf-harness. Rebase onto main once #630 merges.