
Skip launches against URLs a prior act proved unreachable #9

Open
wittjeff wants to merge 1 commit into jrpool:main from wittjeff:skip-launches-on-known-bad-urls



@wittjeff
Contributor

Fixes #8.

Problem

When a test act's launch + navigation fails, procs/launch.js makes up to 3 attempts with backoff. If all 3 attempts fail, the act exits with data.prevented: true, data.error: 'No page'. The job is not aborted, so the next test act in the same job forks a fresh child process and runs the same 3-attempt loop against the same URL.

For an 8-engine job against an unreachable target, that's up to 24 launch attempts when 3 would have been enough to establish unreachability. The log of a real run looks like:

ERROR launching or navigating (...)
WARNING: Waiting 1 sec. before retrying (retries left: 2)
ERROR launching or navigating (...)
WARNING: Waiting 1 sec. before retrying (retries left: 1)
ERROR launching or navigating (...)
[next act spawns its own child; same retry triplet repeats]
[and again, and again, …]

See #8 for the full diagnosis.
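To make the 8 × 3 arithmetic concrete, here is a toy model of the behavior described above. It is not the actual procs/launch.js code: `tryLaunch`, `runAct`, `MAX_ATTEMPTS`, and the `.example` URL are hypothetical, and the wait between attempts is omitted. The point is only that each act owns its own retry budget, so nothing one act learns is shared with the next.

```javascript
// Toy model, NOT Testaro code: each act has its own retry budget.
const MAX_ATTEMPTS = 3; // 1 attempt + 2 retries ("retries left: 2", then "1")
const TARGET = 'https://unreachable.example/'; // hypothetical unreachable URL

// Simulated launch + navigation: fails whenever the URL is TARGET.
const tryLaunch = (url, stats) => {
  stats.attempts++;
  return url !== TARGET;
};

// One test act: up to MAX_ATTEMPTS tries, then give up with 'No page'.
// (The real code waits between tries; the wait is omitted here.)
const runAct = (url, stats) => {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    if (tryLaunch(url, stats)) {
      return {prevented: false};
    }
  }
  return {prevented: true, error: 'No page'};
};

// An 8-engine job with no job-level memory: every act repeats the loop.
const stats = {attempts: 0};
const results = [];
for (let engine = 0; engine < 8; engine++) {
  results.push(runAct(TARGET, stats));
}
console.log(stats.attempts); // 24
```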

Fix

Track failed URLs at the job level. Subsequent acts targeting the same URL skip the fork and report the prior failure.

Changes

  • run.js: initialize jobData.unreachableURLs = {} (object keyed by URL → error message).
  • procs/doActs.js:
    • After a test act ends with data.prevented: true and data.error set to one of two canonical launch-failure error strings ('No page' set by doTestAct.js, or 'ERROR: No retries left…' set by launch.js's addError), record the act's effective target URL in unreachableURLs.
    • Before forking a child for a test act, check whether its effective target URL (act.launch?.target?.url || report.target.url, the same precedence the child uses) is already a key in unreachableURLs. If so, mark the act prevented: true inline with a descriptive error message that carries through the original failure reason, increment actCount, and continue: no child fork, no fresh retry loop.
  • README.md: document the new jobData.unreachableURLs field.
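The doActs.js bookkeeping above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this commit: the helper names `effectiveTargetURL`, `isLaunchFailure`, `recordOutcome`, and `skipIfKnownUnreachable`, and the exact wording of the skip message, are hypothetical. Only `jobData.unreachableURLs`, the URL precedence, and the two error strings come from the description.

```javascript
// Sketch only: helper names and the skip-message wording are hypothetical.

// Same precedence the child process uses: act-level override first.
const effectiveTargetURL = (act, report) =>
  act.launch?.target?.url || report.target.url;

// True only for the canonical launch-failure signals, so tool-internal
// preventions are never recorded.
const isLaunchFailure = data =>
  Boolean(data?.prevented) &&
  typeof data.error === 'string' &&
  (data.error === 'No page' || data.error.startsWith('ERROR: No retries left'));

// After a test act ends: remember its URL if the launch failed.
const recordOutcome = (act, report, jobData) => {
  if (isLaunchFailure(act.data)) {
    jobData.unreachableURLs[effectiveTargetURL(act, report)] = act.data.error;
  }
};

// Before forking a child: if the URL already failed in this job, mark the
// act prevented inline and return true; the caller would then increment
// actCount and continue without forking.
const skipIfKnownUnreachable = (act, report, jobData) => {
  const url = effectiveTargetURL(act, report);
  if (!Object.hasOwn(jobData.unreachableURLs, url)) {
    return false;
  }
  act.data = {
    prevented: true,
    // Hypothetical format; it carries the original failure reason through.
    error: `Skipped: ${url} was unreachable in a prior act (${jobData.unreachableURLs[url]})`
  };
  return true;
};

// Usage: the first act's launch failed, so the second act is skipped.
const report = {target: {url: 'https://unreachable.example/'}};
const jobData = {unreachableURLs: {}}; // as initialized in run.js
const first = {type: 'test', data: {prevented: true, error: 'No page'}};
recordOutcome(first, report, jobData);
const second = {type: 'test'};
console.log(skipIfKnownUnreachable(second, report, jobData)); // true
console.log(second.data.error);
```

Using `Object.hasOwn` rather than `in` keeps inherited properties of `{}` (for example a URL string that happened to equal `toString`) from ever matching.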

Tool-prevention vs launch-failure

The skip set is populated only when act.data.error is 'No page' or starts with 'ERROR: No retries left'. Tool-internal preventions (axe choking on malformed markup, qualWeb rejecting unsupported content, etc.) use different error strings, so they don't poison the set — a later engine that handles the page differently still gets its turn.

Per-URL granularity

Acts with an explicit act.launch.target.url override pointing somewhere reachable still launch normally. The skip set is keyed by the exact URL that failed; mixed-URL jobs are unaffected.
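A small illustration of per-URL granularity, assuming the same precedence rule as above. The act `id` fields and the `.example` URLs are hypothetical and exist only to label the output. Because keys are exact URL strings, only acts whose effective target matches a recorded failure are skipped.

```javascript
// Illustrative only: ids and URLs are hypothetical.
const effectiveTargetURL = (act, report) =>
  act.launch?.target?.url || report.target.url;

const report = {target: {url: 'https://down.example/'}};
// State after an earlier act failed to launch against the job's target.
const unreachableURLs = {'https://down.example/': 'No page'};

const acts = [
  {id: 'a'}, // no override: inherits report.target.url
  {id: 'b', launch: {target: {url: 'https://up.example/'}}}, // reachable override
  {id: 'c', launch: {target: {url: 'https://down.example/'}}} // override to the failed URL
];

const decisions = acts.map(act => ({
  id: act.id,
  skip: Object.hasOwn(unreachableURLs, effectiveTargetURL(act, report))
}));
console.log(decisions);
```

One consequence of exact-string keys is that `https://down.example` and `https://down.example/` count as different URLs. If that matters, normalizing keys with `new URL(url).href` before recording and lookup would be one option.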

Backwards compatibility

  • New field jobData.unreachableURLs. Empty {} when no launches fail.
  • Job-input fields unchanged.
  • Per-act semantics (3 retries with backoff) unchanged for the first act to encounter a given URL.

Test notes

node --check is clean on the modified JavaScript files. End-to-end verification used the same scenario that motivated #7 (WebKit launches failing because of the stealth-arg incompatibility): with #7 applied, WebKit launches succeed; with this PR (the fix for #8) applied on top, an intentionally unreachable target produces one act's worth of retries plus 7 fast-skip messages, instead of 8 × 3 launch attempts.
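The expected counts in that test can be reproduced with a toy model of an 8-act job that keeps job-level memory of failed URLs. This is not Testaro code: `tryLaunch`, `MAX_ATTEMPTS`, and the URL are hypothetical, and forking and waiting are not modeled.

```javascript
// Toy model, NOT Testaro code: 8 test acts against one unreachable URL,
// with job-level memory of failed URLs.
const MAX_ATTEMPTS = 3;
const TARGET = 'https://unreachable.example/'; // hypothetical
const tryLaunch = url => url !== TARGET; // simulated: TARGET never loads

const unreachableURLs = {};
let attempts = 0;
let skips = 0;

for (let act = 0; act < 8; act++) {
  if (Object.hasOwn(unreachableURLs, TARGET)) {
    skips++; // fast skip: no fork, no retry loop
    continue;
  }
  let launched = false;
  for (let i = 0; i < MAX_ATTEMPTS && !launched; i++) {
    attempts++;
    launched = tryLaunch(TARGET);
  }
  if (!launched) {
    unreachableURLs[TARGET] = 'No page'; // recorded for later acts
  }
}
console.log({attempts, skips}); // { attempts: 3, skips: 7 }
```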

Happy to adjust the error-string heuristic if you'd prefer a more specific signal (e.g. a structured act.data.cause: 'launch' field) or a more conservative one.
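For comparison, the structured alternative could look roughly like this. It is a sketch under assumptions: `act.data.cause` is hypothetical (it is not part of this PR), as is the `'tool'` value used in the example. Acts that set `cause` are classified by it alone; acts that don't fall back to the current error-string heuristic.

```javascript
// Sketch: act.data.cause is a hypothetical field, not part of this PR.
const isLaunchFailure = data => {
  if (!data?.prevented) {
    return false;
  }
  // A structured signal, when present, overrides message matching.
  if (data.cause !== undefined) {
    return data.cause === 'launch';
  }
  // Fallback: the error-string heuristic described in this PR.
  return data.error === 'No page' ||
    (typeof data.error === 'string' && data.error.startsWith('ERROR: No retries left'));
};

console.log(isLaunchFailure({prevented: true, cause: 'launch', error: 'any text'})); // true
console.log(isLaunchFailure({prevented: true, cause: 'tool', error: 'No page'})); // false
console.log(isLaunchFailure({prevented: true, error: 'No page'})); // true
console.log(isLaunchFailure({prevented: false, cause: 'launch'})); // false
```

The trade-off is that a structured field decouples classification from message wording, at the cost of every launch-failure path having to set it.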

Fixes jrpool#8.

When a test act's launch + navigation fails, procs/launch.js makes up
to 3 attempts with backoff. If all 3 attempts fail, the act exits with
`data.prevented: true, data.error: 'No page'`. Today, the next test act
in the job forks a fresh child process that runs the same 3-attempt loop
against the same URL, so an unreachable target costs (3 × N engines)
launch attempts when 3 would have been enough to establish unreachability.

Track failed URLs at the job level and short-circuit subsequent acts
that target them:

- `run.js`: initialize `jobData.unreachableURLs = {}` (object keyed by
  URL → error message).
- `procs/doActs.js`: after a test act ends with `data.prevented` and
  one of the two canonical launch-failure error strings (`'No page'` or
  `'ERROR: No retries left*'`), record the act's effective target URL
  in `unreachableURLs`. Before forking a child for the next test act,
  check whether its effective target URL is already in the set; if so,
  mark the act prevented inline (with a descriptive error message and
  the original failure reason carried through) and skip the fork.
- `README.md`: document the new `jobData.unreachableURLs` field.

Tool-internal preventions (e.g. axe choking on malformed markup) use
different error strings, so they don't poison the skip set. Acts with
an explicit `act.launch.target.url` override pointing somewhere
reachable still launch normally — per-URL granularity preserves
mixed-URL jobs.

Backwards-compatible: jobs that never hit a launch failure get `jobData.unreachableURLs: {}` and otherwise behave exactly as before.
@jrpool self-requested a review May 24, 2026 21:51
Owner

@jrpool left a comment


I have merged several commits ending with 06ea31d and attached a message to that commit documenting you as co-author. Those commits are intended to achieve the purpose of this PR by adding an environment variable that can make the decision whether to abort a job more assertive. When that variable is set to 'true', jobs are aborted in some cases where they otherwise would not be. When a job is aborted, the records of skipped acts are not revised, but the reason for the abort is recorded, so a consumer of a report can use that reason to judge whether the target page is responsible and, if so, treat the page as having prevented the skipped acts. The new env.example file documents the environment variable; searching for its name shows where the code decides whether to abort.

Does this revision achieve the purpose of this PR?



Successfully merging this pull request may close these issues.

Test acts retry browser launch independently even after first act exhausted retries against the same URL
