From 02b726851425864253be72f92e6ec88303366ef3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micha=C5=82=20Pierzcha=C5=82a?= <thymikee@gmail.com>
Date: Sat, 30 May 2026 13:19:48 +0200
Subject: [PATCH] docs: map Maestro compatibility debt

---
 docs/maestro-compat-debt-map.md | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)
 create mode 100644 docs/maestro-compat-debt-map.md

diff --git a/docs/maestro-compat-debt-map.md b/docs/maestro-compat-debt-map.md
new file mode 100644
index 000000000..097bf142e
--- /dev/null
+++ b/docs/maestro-compat-debt-map.md
@@ -0,0 +1,22 @@
+# Maestro Compatibility Debt Map
+
+This map summarizes the Maestro compatibility surface after the Lane 6 audit. LOC values are approximate and come from `wc -l` on the owning files; rows that share a file use an estimated slice. The current compat implementation is about 4.2k LOC under `src/compat/maestro`, with about 1.8k LOC of focused compat unit tests plus daemon/provider integration guards.
+
+| Area | Owning files | Approx LOC/code size | Custom handling summary | Native overlap and dependencies | Reliability or faster convergence opportunity | Risk | Test guard status | Recommendation | Suggested PR lane |
+| --- | --- | ---: | --- | --- | --- | --- | --- | --- | --- |
+| Parser and command mapping | `src/compat/maestro/replay-flow.ts`, `command-mapper.ts`, `support.ts`, `types.ts`, plus parse-side slices of `interactions.ts`, `device-actions.ts`, `flow-control.ts`, `run-script.ts` | ~1.5k LOC | Parses multi-document YAML, splits config/commands, resolves env, tracks rough top-level line numbers, flattens hooks, maps commands to `SessionAction`, rejects unsupported commands/fields loudly, and emits `__maestro*` runtime commands for semantics that native replay does not model. | Depends on `yaml`, replay `SessionAction`, `AppError`, and native commands such as `open`, `click`, `fill`, `type`, `wait`, `is`, `scroll`, `swipe`, `keyboard`, and `screenshot`. | Common replay parsing helpers could own env precedence, source-path handling, and line diagnostics. Native command builders could reduce stringly typed `SessionAction` construction for commands that already match `.ad`. | Line mapping is intentionally approximate for nested lists; parse-time expansion can hide runtime structure; unsupported syntax must keep failing loudly. | Strong: `replay-flow.test.ts` covers supported subset, env, hooks, runFlow, repeat, retry, launch reset, unsupported fields, fixture parsing; `replay-input.test.ts` covers backend routing/env precedence. | Keep Maestro YAML grammar local. Converge shared replay env/source/line utilities and typed action construction where native commands already exist. | Lane 7: parser contract hardening |
+| Runtime target matching | `src/compat/maestro/runtime-targets.ts` | 926 LOC | Implements Maestro-specific selector resolution over raw snapshots: id/label/text/value equality-or-regex matching, visible text fuzzy fallback, `childOf`, `index`, visible-only filtering, React Native overlay blocking, foreground duplicate preference, Android rectless hidden navigation handling, tab-strip slot inference, localized breadcrumb selection, and tap-target ancestor promotion. | Uses native selector parsing/matching (`parseSelectorChain`, `matchesSelector`), native visible predicates, snapshot text normalization, and React Native overlay detection. | Extract a reusable snapshot target resolver with policy hooks for ranking, visibility, overlay filtering, and promotion. Native `click`/`wait` could then opt into the generic parts without inheriting Maestro ranking. | Highest debt concentration. Heuristics encode real platform quirks and can regress seemingly unrelated flows, especially Android duplicate/hidden nodes and broad containers. | Strong: `runtime-targets.test.ts` is large and focused; PR #620 adds a provider-level Android fresh-snapshot guard through Maestro replay. | Split generic snapshot traversal/filtering from Maestro ranking policy. Keep fuzzy/regex/read-order compatibility local. | Lane 7: target resolver extraction |
+| Input and focus | `replay-flow.ts` input coalescing, `interactions.ts` input/erase/paste/pressKey slices, `runtime.ts`, daemon Maestro fallback in `interaction-touch.ts` | ~350 LOC plus shared daemon flags | Coalesces `tapOn` + `inputText` + `pressKey Enter` into `wait`/`fill`/enter for likely text-entry selectors, leaves focused-field input as `type`, maps `eraseText` to backspaces, falls back from keyboard enter to newline typing, and allows Maestro non-hittable coordinate fallback for tap selectors. | Uses native `fill`, `type`, `keyboard`, `click`, replay variable substitution, and daemon touch fallback flags. | A native "focus then fill" replay primitive and focused-field clear/erase operation would remove most parser heuristics while improving `.ad` flows too. | Text-entry detection is name-based; focused-field commands depend on existing device focus; non-hittable coordinate fallback is intentionally compatibility-only and can mask poor selectors. | Moderate: parser coalescing tests in `replay-flow.test.ts`; daemon fallback covered in `interaction.test.ts`; no broad provider guard for focused erase/paste. | Converge on native focus/fill/clear semantics. Keep Maestro's optional/non-hittable quirks as compat flags. | Lane 8: native input primitive |
+| Assertions and waits | `runtime-assertions.ts`, wait/assert slices of `interactions.ts`, `runtime-flow.ts` visible condition path | ~450 LOC | Polls raw snapshots for `assertVisible`, adds a terminal grace capture, treats `assertNotVisible` as stable hidden after repeated samples or timeout, computes animation stability from snapshot signatures, maps `extendedWaitUntil`, and waits briefly for `runFlow.when.visible` while keeping `notVisible` point-in-time. | Uses native `snapshot`, `wait`, `is`, visible predicates, reference frames, replay blocks, and timeout helpers. | A shared waiter/poller utility with explicit grace, stable-hidden, raw-snapshot, and timeout policies would let native `wait` and Maestro share mechanics without sharing defaults. | Timeout and stability semantics are subtle; raw full snapshots are expensive; making `notVisible` wait would change cleanup-flow behavior. | Good: `runtime-assertions.test.ts` covers deadline/hidden edge cases; `runtime-flow.test.ts` covers visible wait and immediate notVisible. | Extract generic polling/stability helpers; keep Maestro timing constants and condition semantics local. | Lane 8: waiter convergence |
+| Scroll, swipe, and geometry | `points.ts`, `runtime-geometry.ts`, geometry/scroll/swipe slices of `interactions.ts` and `runtime-interactions.ts` | ~500 LOC | Parses absolute and percentage points, converts percentage taps/swipes through raw snapshot reference frames, caches reference frame in replay scope, biases tap points for large text containers and bottom tabs, loops `scrollUntilVisible` with `wait`/`find` probes, maps `swipe.label`, and uses an Android horizontal content-lane adjustment. | Uses native `click`, `scroll`, `swipe`, `find`, `wait`, touch reference frames, and raw snapshots. | A native gesture planner for percent coordinates, target-derived swipes, and frame caching would speed up convergence. Platform-specific lane policies can remain pluggable. | Geometry heuristics are device/frame sensitive; stale or missing raw snapshot frames break percent gestures; Android horizontal swipes depend on app layout assumptions. | Good: `runtime-geometry.test.ts`, `runtime-interactions.test.ts`; PR #620 provider guard asserts fresh snapshots and Android content-lane swipe coordinates. | Extract generic percent/target geometry and frame caching. Keep Maestro Android content-lane and tap bias policies local until broader demand exists. | Lane 8 or 9: gesture planner |
+| Flow control, `runFlow`, retry, and hooks | `flow-control.ts`, `runtime-flow.ts`, hook flattening in `replay-flow.ts` | ~700 LOC | Handles `onFlowStart`/`onFlowComplete`, file and inline `runFlow`, per-block env, static platform gates, limited `when.true` boolean/platform expressions, runtime visible/notVisible gates via batch steps, parse-expanded `repeat.times`, and runtime `retry` via replay retry blocks. | Depends on replay block runtime (`invokeReplayActionBlock`, `invokeReplayRetryBlock`), replay vars/env, batch step projection, and native snapshot visibility resolution. | A native replay block AST for conditional blocks, deterministic repeats, and retry would remove parse-time flattening and make line/step reporting more faithful. | `repeat.times` materializes actions with a guardrail; nested line numbers are lossy; expression support is intentionally tiny; visible conditions are snapshot-dependent. | Strong: `replay-flow.test.ts` covers hooks/runFlow/env/repeat/retry/platform gates/expressions; `runtime-flow.test.ts` covers runtime condition behavior; PR #620 covers provider retry/runFlow path. | Converge on native replay control-flow primitives, but keep Maestro expression grammar and unsupported `repeat.while` local until native runtime has loop semantics. | Lane 7 or 9: replay block AST |
+| `runScript` | `run-script.ts`, `runtime.ts` runScript branch | 229 LOC plus router branch | Executes trusted flow-local JavaScript with `node:vm`, exposes env values, `output`, `json`, and synchronous-looking `http.post` through a timeout-bounded child Node process; exports output as `output.<key>` replay variables. | Uses replay variable scope, `runCmdSync`, and `AppError`; there is no native `.ad` command equivalent. | Shared env/output variable plumbing could converge, but script execution should not become native without a separate security model and product decision. | High security and determinism risk: `node:vm` is not a sandbox, scripts can make network requests, async support is intentionally narrow, and output key rules are compatibility-specific. | Partial: parser/order/env/path behavior covered in `replay-flow.test.ts`; docs describe trust/security limits; no focused execution tests for `http.post`, `json`, or output validation. | Keep compat-local. Add focused execution tests before expanding helpers; do not expose as native command until sandboxing and trust model are explicit. | Lane 9: runScript guard tests |
+| Suite discovery and test artifacts | `src/daemon/handlers/session-test-discovery.ts`, `session-test-suite.ts`, `session-test-artifacts.ts`, replay grammar/backend plumbing | ~120 Maestro-specific LOC in daemon/test path | When `--maestro`/backend is selected, discovers `.ad`, `.yaml`, and `.yml`, allows untyped YAML through platform filtering, runs through the native replay test suite, and preserves original Maestro flow filenames in artifacts. | Native test suite runner, replay backend selection, session lifecycle, artifact materialization, and CLI `--maestro` flag. | Mostly converged already. The remaining improvement is making replay backend extension/filter policy data-driven so future backends avoid daemon conditionals. | Low. Main risk is accidentally including non-Maestro YAML when backend is set, or changing platform-filter behavior for untyped YAML. | Good: `session-test-discovery.test.ts`, `session-test-suite.test.ts`, `session-test-artifacts.test.ts`; PR #620 provider suite runs a Maestro YAML test. | Keep the small daemon hook for now; extract backend discovery policy only if another replay backend appears. | Lane 6 complete / no follow-up unless backend count grows |
+| Docs and support matrix | `support-matrix.ts`, `website/docs/docs/replay-e2e.md`, CLI flag/help tests, issue tracker references | 42 LOC source matrix plus docs | Maintains supported/unsupported capability lists, formats CLI help copy, links tracker/new issue URLs, and keeps replay docs synced with the source matrix. | CLI flag definitions/help rendering and website docs. | Generate or embed the support list from one source in docs/help to remove manual prose drift. | Medium user-facing risk: stale docs can imply unsupported parity or hide known gaps. | Good: `support-matrix.test.ts` asserts CLI help and docs stay synced with the shared matrix. | Keep `support-matrix.ts` as source of truth; update it with every behavior change and mirror only explanatory context in docs. | Lane 6 complete / ongoing docs hygiene |
+
+## Convergence Priority
+
+1. Target matching is the largest and riskiest debt. Extract reusable snapshot traversal/filtering first, then keep Maestro ranking rules as a policy.
+2. Replay control flow should converge before adding more Maestro flow syntax. A native block AST would make `runFlow`, `retry`, and future loop support less brittle.
+3. Input/focus and wait semantics are good candidates for native primitives because they improve `.ad` replay as well as Maestro compatibility.
+4. `runScript` should remain compatibility-local unless a separate secure native script story is designed.