Explore TCP-like command lifecycle protocol for iOS runner #656

## What to explore

Investigate a more TCP-like command protocol between the daemon and the iOS XCTest runner so mutating command failures can be classified more precisely than today's preflight/send split.

Today the daemon sends an `uptime` preflight before most mutating commands to distinguish "runner was dead before command send" from "transport failed after command send". This approach puts reliability first, but it adds a round trip and still leaves uncertainty when the actual command send fails. In that case the runner may not have received the command, may have partially executed it, may have completed it but lost the response, or may have crashed mid-command.
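To make the current split concrete, here is a minimal sketch of the classification described above. All names (`classifyAttempt`, `POST_SEND_POSSIBILITIES`, `isRetrySafeToday`) are hypothetical and not taken from the daemon code. It assumes a retry is only safe when the preflight failed, because nothing was sent in that case.

```typescript
// Hypothetical model of today's preflight/send split; not the daemon's real types.
type Classification =
  | "ok"
  | "runner_dead_before_send" // uptime preflight failed, command never sent
  | "transport_failed_after_send"; // command was sent, outcome is ambiguous

// What the runner might have done when the send itself failed.
const POST_SEND_POSSIBILITIES = [
  "not_received",
  "partially_executed",
  "completed_response_lost",
  "crashed_mid_command",
] as const;

function classifyAttempt(preflightOk: boolean, sendOk: boolean): Classification {
  if (!preflightOk) return "runner_dead_before_send";
  return sendOk ? "ok" : "transport_failed_after_send";
}

// Assumption: only a pre-send failure is provably free of side effects.
function isRetrySafeToday(c: Classification): boolean {
  return c === "runner_dead_before_send";
}
```

The four entries in `POST_SEND_POSSIBILITIES` all collapse into one classification, which is the gap this issue wants to narrow.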

Explore whether command ids and runner-side command lifecycle acknowledgements can improve this without weakening the no-replay guarantee for mutating commands.
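One possible shape for this, sketched with hypothetical message types (`CommandEnvelope`, `AckMessage`) that do not exist in the current protocol: the daemon tags each command with an id, the runner emits acknowledgements carrying the same id, and the daemon keeps the furthest lifecycle point it has seen for that id.

```typescript
// Hypothetical wire shapes; field names are assumptions for illustration.
type LifecycleAck = "accepted" | "started" | "completed" | "failed";

interface CommandEnvelope {
  commandId: string;
  name: string;
  payload: unknown;
}

interface AckMessage {
  commandId: string;
  ack: LifecycleAck;
}

// Relative progress of each ack; "completed" and "failed" are both terminal.
const ORDER: Record<LifecycleAck, number> = {
  accepted: 0,
  started: 1,
  completed: 2,
  failed: 2,
};

// Returns the furthest ack seen for one command, ignoring other ids and
// late-arriving acks that would move the state backwards.
function lastKnownAck(commandId: string, acks: AckMessage[]): LifecycleAck | "none" {
  let best: LifecycleAck | "none" = "none";
  for (const m of acks) {
    if (m.commandId !== commandId) continue;
    if (best === "none" || ORDER[m.ack] > ORDER[best]) best = m.ack;
  }
  return best;
}
```

A result of `"none"` only means the daemon saw no acknowledgement. It does not prove the runner never accepted the command, so it cannot justify a retry on its own.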

## Acceptance criteria

- [ ] Document the current failure modes around mutating iOS runner commands, including why post-send failures are not retried today.
- [ ] Prototype or design a command lifecycle model with at least `accepted`, `started`, `completed`, `failed`, and `unknown` states.
- [ ] Decide whether lifecycle state should be exposed through `uptime`, a dedicated `status(commandId)` command, or another runner endpoint.
- [ ] Define daemon retry behavior for `not accepted`, `accepted`, `started`, `completed`, and unknown/crashed states.
- [ ] Evaluate whether the model can safely remove or reduce eager `uptime` preflights for non-tap mutating commands.
- [ ] Include perf expectations: likely saved round trips vs XCTest-dominated command costs.
- [ ] Include reliability risks and limits, especially runner crashes after UI mutation but before status is recorded.
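As a starting point for the lifecycle model and retry criteria, the states could be written down as a small transition table plus a retry policy. This is a sketch under assumptions: the state names follow the criteria above, but the transition edges and the choice to treat `accepted` as non-retryable are conservative guesses that the exploration should confirm or change.

```typescript
// Hypothetical state model for a single mutating command.
type CommandState =
  | "not_accepted"
  | "accepted"
  | "started"
  | "completed"
  | "failed"
  | "unknown";

// Assumed forward-only transitions; terminal and unknown states have none.
const NEXT_STATES: Record<CommandState, CommandState[]> = {
  not_accepted: ["accepted"],
  accepted: ["started", "failed"],
  started: ["completed", "failed"],
  completed: [],
  failed: [],
  unknown: [],
};

function canTransition(from: CommandState, to: CommandState): boolean {
  return NEXT_STATES[from].includes(to);
}

// Retry policy for mutating commands. Only a provable "not_accepted" allows
// a retry; "accepted" and "failed" are kept non-retryable here as an assumption.
const RETRY_ALLOWED: Record<CommandState, boolean> = {
  not_accepted: true,
  accepted: false,
  started: false,
  completed: false,
  failed: false,
  unknown: false,
};

function mayRetryMutating(state: CommandState): boolean {
  return RETRY_ALLOWED[state];
}
```

Keeping the policy as a total `Record` means adding a new state fails type checking until its retry behavior is decided explicitly.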

## Notes

The likely direction is:

- daemon assigns a stable `commandId` to each runner command
- runner keeps a small in-memory journal of the current and recent command ids
- daemon can reconnect and query command status before deciding whether a retry is safe
- retry is allowed only when the runner can prove the command was not accepted
- started/completed/unknown mutating commands remain non-retryable by default
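The journal and status query could look roughly like the sketch below. It is written in TypeScript for illustration only; the real runner side would live in the XCTest runner. Every name here (`CommandJournal`, `StatusReply`, `sessionId`, `isRetrySafe`) is hypothetical. Two assumptions carry the "can prove" requirement: the runner reports a per-process `sessionId` so a restarted runner with an empty journal is not mistaken for proof, and once the bounded journal has evicted anything, a missing id is reported as `unknown` rather than `not_accepted`.

```typescript
type JournalState = "accepted" | "started" | "completed" | "failed";

type StatusReply = {
  sessionId: string;
  state: JournalState | "not_accepted" | "unknown";
};

// Bounded in-memory journal of current and recent command ids.
class CommandJournal {
  private readonly entries = new Map<string, JournalState>();
  private evicted = false;
  readonly sessionId: string;
  private readonly capacity: number;

  constructor(sessionId: string, capacity: number) {
    this.sessionId = sessionId;
    this.capacity = capacity;
  }

  record(commandId: string, state: JournalState): void {
    // Re-insert so the most recently updated id is the last one evicted.
    this.entries.delete(commandId);
    this.entries.set(commandId, state);
    while (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value;
      if (oldest === undefined) break;
      this.entries.delete(oldest);
      this.evicted = true;
    }
  }

  status(commandId: string): StatusReply {
    const state = this.entries.get(commandId);
    if (state !== undefined) return { sessionId: this.sessionId, state };
    // After any eviction, absence no longer proves the id was never accepted.
    return { sessionId: this.sessionId, state: this.evicted ? "unknown" : "not_accepted" };
  }
}

// Daemon side: retry only if the same runner process proves non-acceptance.
function isRetrySafe(sessionAtSend: string, reply: StatusReply): boolean {
  return reply.sessionId === sessionAtSend && reply.state === "not_accepted";
}
```

This sketch does not cover a runner that crashes after mutating the UI but before recording `started`; the journal dies with the process, and the session mismatch forces the daemon to treat the command as unknown.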

This should be treated as protocol exploration, not a quick perf optimization. Reliability should continue to trump latency.

## Blocked by

None - can start immediately.

