
perf(ios): make get text ~80x faster for non-editable elements #632

Merged
thymikee merged 2 commits into main from ios-faster-get-text on May 31, 2026

@thymikee (Member)

What

Speeds up get text on iOS by ~80× for the common case (labeled controls, static text, cells, etc.). Found via the new perf benchmark (#630), where iOS get text was the standout outlier — ~20× slower than find/is against the same element.

Root cause

get text resolves the target from a snapshot (the same one find/is use), then readTextForNode dispatches a coordinate read to the XCUITest runner. On the Swift side, readTextAt() does:

```swift
let candidates = app.descendants(matching: .any).allElementsBoundByIndex
  .filter { $0.exists && $0.frame.contains(point) }
  .sorted { /* … */ }  // by area, then position, then type
```

allElementsBoundByIndex over .any forces a full per-element snapshot of the entire tree, which is the slowest XCUITest path. find/is avoid it: find resolves a predicate query with .firstMatch, and is evaluates against the already-captured snapshot. So get text paid for a second, full-tree round-trip that the others never made.
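A toy TypeScript model of the cost difference described above. It is an illustration under stated assumptions, not XCUITest's implementation: the element shape, the function names, and the visits counter (a stand-in for per-element snapshot work) are all hypothetical, and the sort keeps only the area key of the real area/position/type ordering. fullTreeRead materializes every element before filtering, like allElementsBoundByIndex over .any; firstMatch stops at the first element that satisfies the predicate.

```typescript
// Toy model: "visits" stands in for per-element snapshot work.
// It illustrates the cost shape only; it is not XCUITest's implementation.
interface Frame { x: number; y: number; width: number; height: number; }
interface UIElement { type: string; label: string; frame: Frame; }
interface Point { x: number; y: number; }
interface ReadResult { hit?: UIElement; visits: number; }

function contains(frame: Frame, p: Point): boolean {
  return p.x >= frame.x && p.x <= frame.x + frame.width &&
    p.y >= frame.y && p.y <= frame.y + frame.height;
}

const area = (el: UIElement): number => el.frame.width * el.frame.height;

// Like descendants(matching: .any).allElementsBoundByIndex + filter + sort:
// every element is materialized up front, wherever the point is.
function fullTreeRead(tree: UIElement[], p: Point): ReadResult {
  const snapshot = [...tree]; // one visit per element
  const hits = snapshot
    .filter((el) => contains(el.frame, p))
    .sort((a, b) => area(a) - area(b)); // smallest element first (area key only)
  return { hit: hits[0], visits: snapshot.length };
}

// Like a predicate query resolved with .firstMatch: stop at the first match.
function firstMatch(tree: UIElement[], pred: (el: UIElement) => boolean): ReadResult {
  let visits = 0;
  for (const el of tree) {
    visits++;
    if (pred(el)) return { hit: el, visits };
  }
  return { visits };
}
```

In this model, fullTreeRead pays for every element no matter where the target sits, while firstMatch's cost depends only on how early the match appears.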

That coordinate re-read only recovers fuller text for editable/expandable inputs (textField/secureTextField/searchField/textView/editText/textArea), where the live on-screen value can exceed the captured snapshot. For every other element type the freshly-captured snapshot node text is authoritative.

Fix

In readTextForNode, return the snapshot node's readable text directly for non-editable nodes with non-empty text, skipping the backend round-trip. Editable inputs keep the re-read. Mirrors the runner's own prefersExpandedTextRead set, so behavior for editable fields is unchanged.

```typescript
if (fallbackText && !prefersValueForReadableText(node.type ?? '')) {
  return fallbackText; // snapshot node text is authoritative; skip the ~20x slower re-read
}
```
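A self-contained sketch of the fast path, including the iOS-only gate that the author describes adding after the Codex review below. Only the editable type list and the three conditions (iOS, non-empty snapshot text, non-editable type) come from this PR; the SnapshotNode shape, the readableText helper, the BackendRead callback, and falling back to snapshot text when the live read is empty are assumptions for illustration, and prefersValueForReadableText here is a stand-in, not the real implementation.

```typescript
// Hypothetical shapes: the real node/device/backend types are not shown in the PR.
type Platform = "ios" | "android" | "macos" | "linux";

interface SnapshotNode {
  type?: string;
  label?: string;
  value?: string;
}

type BackendRead = (node: SnapshotNode) => Promise<string>;

// Editable/expandable input types named in the PR description.
const EDITABLE_TYPES = new Set([
  "textField", "secureTextField", "searchField", "textView", "editText", "textArea",
]);

// Assumed stand-in for prefersValueForReadableText().
function prefersValueForReadableText(type: string): boolean {
  return EDITABLE_TYPES.has(type);
}

// Assumed stand-in for "the snapshot node's readable text".
function readableText(node: SnapshotNode): string {
  return (node.label ?? node.value ?? "").trim();
}

async function readTextForNode(
  platform: Platform,
  node: SnapshotNode,
  backendRead: BackendRead,
): Promise<string> {
  const fallbackText = readableText(node);
  // Fast path: iOS only (other backends read value-first), the snapshot has
  // text, and the node is not an editable input.
  if (platform === "ios" && fallbackText && !prefersValueForReadableText(node.type ?? "")) {
    return fallbackText;
  }
  // Editable inputs, text-less nodes, and non-iOS platforms keep the backend read.
  const live = await backendRead(node);
  return live || fallbackText; // assumed: fall back if the live read is empty
}
```

Injecting the backend makes the unit-test contract easy to check: count calls and assert that none happen for a non-editable iOS node, while editable inputs, text-less nodes, and other platforms still dispatch.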

Measured (iPhone 17 simulator, iOS 26.2)

get text on a labeled control (get text 'label="General" enabled'), steady-state:

| | before | after |
|---|---|---|
| get text (non-editable) | ~25s (read path; up to 91s under load) | ~0.3s |

(Numbers are inflated by a co-resident simulator under load; the relative ~80× drop is the signal. Before and after were measured on the same machine with a controlled stash/rebuild.)
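The headline factor is the ratio of the two steady-state timings in the table:

```latex
\frac{t_{\text{before}}}{t_{\text{after}}} \approx \frac{25\ \text{s}}{0.3\ \text{s}} \approx 83 \quad (\text{reported as } {\sim}80\times)
```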

Scope / notes

  • Editable text inputs are unchanged (still re-read; live value can differ from the snapshot).
  • Unit test added: interaction-read.test.ts asserts non-editable nodes return snapshot text with no backend dispatch, while editable nodes and text-less nodes still dispatch.
  • Separate iOS issue, not addressed here: get text <selector> on an ambiguous selector (every iOS Settings cell is a label/button/text triplet) can take ~13s to return AMBIGUOUS_MATCH even though find disambiguates the same selector in ~1s. Worth a follow-up — get text sets disambiguateAmbiguous but the runner-side query still rejects first.
  • A deeper Swift-side fix (avoid allElementsBoundByIndex in readTextAt) would also help the editable case and other coordinate reads; deferred to keep this change small and rebuild-free.

Stacking

Branched off perf-harness (#630) so the perf benchmark could verify it; PR base is perf-harness. Rebase onto main once #630 merges.


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 393a490ddc


Comment on src/daemon/handlers/interaction-read.ts, lines +32 to +33 (outdated):

```typescript
if (fallbackText && !prefersValueForReadableText(node.type ?? '')) {
  return fallbackText;
```

P2: Preserve macOS helper value-first reads

For macOS desktop/frontmost-app reads this early return changes the observable text when an accessibility element has both a title/description and an AXValue: the snapshot stores label as title ?? description ?? value while retaining value, but the macOS helper read path returns AXValue before title/description. In that environment a non-editable control such as an AX checkbox/slider with a label and current value will now return the label from the snapshot instead of the value that get text returned before, because the helper dispatch is skipped for all platforms rather than only the iOS XCUITest path this optimization targets.
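A hypothetical model of the divergence the review describes. The field names echo the AX attributes and the checkbox values are invented for illustration; neither function is the macOS helper's real code. The snapshot picks title ?? description ?? value, while the helper read returns AXValue first, so any element carrying both a title and a value reads differently depending on which path answers.

```typescript
// Hypothetical model of the two macOS read orders described in the review.
interface AXElementInfo {
  title?: string;
  description?: string;
  value?: string;
}

// Snapshot readable text: label-first (title ?? description ?? value).
function snapshotLabel(el: AXElementInfo): string | undefined {
  return el.title ?? el.description ?? el.value;
}

// macOS helper read path: AXValue first, then title, then description.
function helperRead(el: AXElementInfo): string | undefined {
  return el.value ?? el.title ?? el.description;
}

// Invented example: a checkbox with both a title and a current value.
const checkbox: AXElementInfo = { title: "Show Hidden Files", value: "1" };
console.log(snapshotLabel(checkbox)); // → Show Hidden Files
console.log(helperRead(checkbox));    // → 1
```

When an element has only a value, both orders agree, which is why the divergence only shows up on labeled controls that also carry a value.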


@thymikee (Member, Author)


Addressed — the fast-path early-return is gated to device.platform === 'ios' on the current branch tip, so macOS helper and Linux keep their value-first backend reads (and Android too). Added test cases asserting the backend read IS still dispatched on android/macos/linux. (The reviewed commit predated the gate.)

@github-actions (bot) commented on May 31, 2026

Size Report

| Metric | Base | Current | Diff |
|---|---|---|---|
| JS raw | 1.1 MB | 1.1 MB | +138 B |
| JS gzip | 355.9 kB | 356.0 kB | +64 B |
| npm tarball | 452.9 kB | 453.0 kB | +63 B |
| npm unpacked | 1.5 MB | 1.5 MB | +138 B |

Startup median (7 runs, lower is better):

| Scenario | Base | Current | Diff |
|---|---|---|---|
| CLI --version | 26.9 ms | 27.0 ms | +0.1 ms |
| CLI --help | 42.3 ms | 42.8 ms | +0.5 ms |

Top changed chunks:

| Chunk | Raw diff | Gzip diff |
|---|---|---|
| dist/src/selector-runtime.js | +76 B | +34 B |
| dist/src/940.js | +62 B | +30 B |

thymikee force-pushed the ios-faster-get-text branch from 70dac1f to ecfe239 on May 31, 2026 11:33
thymikee added a commit that referenced this pull request May 31, 2026
…ration

textInputAt used app.descendants(.any).allElementsBoundByIndex (snapshots EVERY
element) to find the text input at a point. fill drove this repeatedly: once it has
coordinates, resolveTextEntryElement re-runs textInputAt on every verify/repair poll
iteration whenever the focused-field reference goes stale (e.g. the Settings search
bar repositioning bottom->top), so the full-tree enum dominated fill latency.

Query the text-input element types directly (app.textFields/secureTextFields/
searchFields/textViews) instead. Same matches, but XCUITest resolves typed queries
without snapshotting the whole tree. Measured (iPhone 17 sim, warm runner): fill 25
chars ~14.5s -> ~4.5s (3.2x), 6/6 exact. Same primitive #632 killed for get text.
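A toy model of why the typed queries are cheaper. The snapshotted counter stands for elements materialized to answer the query; allElements stands in for app.descendants(matching: .any) and byType for XCUITest's typed collections such as app.textFields. All names are hypothetical, and the Map-building helper exists only to simulate the type index, so its cost is deliberately not counted.

```typescript
// Toy model; neither lookup is the runner's actual Swift code.
interface Rect { x: number; y: number; width: number; height: number; }
interface El { type: string; frame: Rect; }
interface Lookup { match: El | undefined; snapshotted: number; }

const TEXT_INPUT_TYPES = ["textField", "secureTextField", "searchField", "textView"];

function containsPoint(r: Rect, x: number, y: number): boolean {
  return x >= r.x && x <= r.x + r.width && y >= r.y && y <= r.y + r.height;
}

// Before: snapshot every element, then keep a text input under the point.
function textInputAtFullScan(allElements: El[], x: number, y: number): Lookup {
  const match = allElements.find(
    (el) => TEXT_INPUT_TYPES.includes(el.type) && containsPoint(el.frame, x, y),
  );
  return { match, snapshotted: allElements.length };
}

// After: only the four text-input collections are materialized.
function textInputAtTyped(byType: Map<string, El[]>, x: number, y: number): Lookup {
  let snapshotted = 0;
  for (const type of TEXT_INPUT_TYPES) {
    const candidates = byType.get(type) ?? [];
    snapshotted += candidates.length;
    const match = candidates.find((el) => containsPoint(el.frame, x, y));
    if (match) return { match, snapshotted };
  }
  return { match: undefined, snapshotted };
}

// Helper simulating XCUITest's type index (its cost is not modeled).
function indexByType(allElements: El[]): Map<string, El[]> {
  const byType = new Map<string, El[]>();
  for (const el of allElements) {
    const list = byType.get(el.type) ?? [];
    list.push(el);
    byType.set(el.type, list);
  }
  return byType;
}
```

On a screen of 500 cells plus one search field, both lookups return the same element, but only the full scan pays for every cell.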
thymikee force-pushed the ios-faster-get-text branch 2 times, most recently from d066130 to 9ea185d, on May 31, 2026 12:22
Base automatically changed from perf-harness to main May 31, 2026 12:38
thymikee added 2 commits May 31, 2026 12:39
readTextForNode dispatched a coordinate 'read' to the iOS XCUITest runner for
every get text, where readTextAt() enumerates the full element tree
(allElementsBoundByIndex) — ~20x slower than the snapshot already captured to
resolve the node. That re-read only recovers fuller text for editable/expandable
inputs (textField/searchField/textView/…); for all other element types the
freshly-captured snapshot node text is authoritative.

Return the snapshot node text directly for non-editable nodes with non-empty
readable text, skipping the round-trip. Measured on iPhone 17 sim: get text on a
labeled control drops from ~25s to ~0.3s steady-state. Editable inputs keep the
backend re-read (live value can exceed the snapshot).

Review (P1): the fast-path skipped the backend read on Android/Linux/macOS too,
but those backends read value-first (macOS helper: AXValue→title→description;
Linux similar) whereas snapshot readable text is label-first for non-editables —
so skipping their read changed get text output. Restrict the optimization to the
iOS XCUITest path (the slow allElementsBoundByIndex re-read it targets). Adds a
test asserting non-iOS platforms still dispatch the backend read.
thymikee force-pushed the ios-faster-get-text branch from 9ea185d to 6f7104e on May 31, 2026 12:39
thymikee merged commit a8bec05 into main on May 31, 2026
18 checks passed
thymikee deleted the ios-faster-get-text branch on May 31, 2026 13:07

thymikee added a commit that referenced this pull request May 31, 2026
…ve) + fix fill mis-navigation (#633)

* perf(ios): early-exit text-entry readiness when the keyboard is visible

The XCUITest text-entry focus/readiness loops keyed their fast-exit on
focusedTextInput(), which is intentionally hardcoded to return nil on iOS (focus
predicates are stale there). As a result stabilizeTextInputBeforeTyping always
burned its full focusTimeout (0.4s) and waitForTextEntryReadiness burned its full
readinessTimeout (2.0s) in the normal case where the software keyboard appears —
~2.4s of dead wait before a single keystroke on every type/fill.

The software keyboard becoming visible is the reliable iOS readiness signal, so
both loops now return as soon as isKeyboardVisible() is true. The warmup-first-char
echo check and post-type verify/repair remain as drop safety nets.

Measured on iPhone 17 sim (Settings search field), median type time:
  25 chars:  3342ms -> 1379ms  (2.4x)
  52 chars:  3969ms -> 2190ms
  313 chars: 10.3s   -> 8.6s    (remainder is genuine per-char XCUITest typing)
Reliability unchanged: 64/65 trials exact (incl. a 50-word lorem ipsum, verified
by read-back + screenshot); the lone miss triggered the existing verify/repair.

* fix(ios): don't clear an already-empty text field (fixes fill mis-navigation)

clearTextInput unconditionally ran moveCaretToEnd (an edge-tap computed from the
element frame) + a 24-key delete burst, even when the field was empty. On a field
that repositions on focus — e.g. the Settings search bar jumping bottom->top and
revealing a 'Suggestions' list — that edge-tap used a stale frame and landed on an
adjacent row (Developer), navigating away instead of clearing. fill (replace) into
the search field went to the Developer pane (0/3 correct).

Skip the clear entirely when the field's value is already empty (placeholder
treated as empty): replacing into an empty field is a no-op, and skipping avoids
the stray edge-tap. fill into the Settings search now types correctly and stays
put: 5/5 exact (read-back + screenshot).

* perf(ios): resolve text fields via typed queries, not full-tree enumeration

textInputAt used app.descendants(.any).allElementsBoundByIndex (snapshots EVERY
element) to find the text input at a point. fill drove this repeatedly: once it has
coordinates, resolveTextEntryElement re-runs textInputAt on every verify/repair poll
iteration whenever the focused-field reference goes stale (e.g. the Settings search
bar repositioning bottom->top), so the full-tree enum dominated fill latency.

Query the text-input element types directly (app.textFields/secureTextFields/
searchFields/textViews) instead. Same matches, but XCUITest resolves typed queries
without snapshotting the whole tree. Measured (iPhone 17 sim, warm runner): fill 25
chars ~14.5s -> ~4.5s (3.2x), 6/6 exact. Same primitive #632 killed for get text.

* fix(ios): address review — focus-change gate + secure-field clear

Review P1 (focus race): the isKeyboardVisible early-exit in stabilizeTextInputBeforeTyping
and waitForTextEntryReadiness fired the instant the keyboard was visible — but when it was
ALREADY up from a previous field (back-to-back fills), that is before first-responder moves
to the newly-tapped field, so app.typeText could target the old field. Gate the fast-path on
a keyboard hidden->visible TRANSITION via a shared keyboardBecameVisible(wasVisibleAtEntry:)
helper; when the keyboard was already up, fall back to the settle/timeout (the prior, correct
behavior) instead of the ~2.4s dead wait the fresh case avoids.

Review P1 (F2): clearTextInput used editableTextValue(...) ?? "" and skipped clearing on
empty — but editableTextValue returns nil for secure (and unknown) fields, so secure fields
were NEVER cleared and replace concatenated stale+new. Distinguish nil (clear) from "" (skip).

Device-validated: fresh fill fast-path preserved + exact; a second fill with the keyboard
already up still types into the correct field and replaces (not concatenates).
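The two review fixes above reduce to two small decisions, sketched here in TypeScript. The real logic is Swift in the XCUITest runner and is not shown in the PR; keyboardBecameVisible borrows its name from the commit message, while waitForReadiness, shouldClear, the simulated millisecond clock, and the 50 ms poll step are hypothetical. The 2000 ms timeout used in the checks mirrors the readinessTimeout (2.0s) quoted above, and null stands in for Swift's nil.

```typescript
// Hypothetical sketch; not the runner's Swift implementation.

// Fast path only on a hidden -> visible transition. If the keyboard was already
// up at entry, visibility says nothing about which field is first responder.
function keyboardBecameVisible(wasVisibleAtEntry: boolean, isVisibleNow: boolean): boolean {
  return !wasVisibleAtEntry && isVisibleNow;
}

// Polls a simulated clock until the fast path fires or the timeout elapses.
// Returns the elapsed milliseconds at which the wait ended.
function waitForReadiness(
  wasVisibleAtEntry: boolean,
  visibleAt: (tMs: number) => boolean,
  timeoutMs: number,
  stepMs = 50,
): number {
  for (let t = 0; t < timeoutMs; t += stepMs) {
    if (keyboardBecameVisible(wasVisibleAtEntry, visibleAt(t))) return t;
  }
  return timeoutMs; // fell through to the settle/timeout (modeled as the full timeout)
}

// Clear decision for replace: null/undefined means "value unknown" (e.g. secure
// fields) and must still be cleared; only a known-empty value or the
// placeholder lets replace skip the clear and its stray edge-tap.
function shouldClear(value: string | null | undefined, placeholder?: string): boolean {
  if (value == null) return true;
  return value !== "" && value !== placeholder;
}
```

Both functions share the same asymmetry: keyboard visibility only counts as readiness when it changes during the wait, and an unknown value is treated as "must clear" rather than "empty".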