[aw-failures] [aw] Daily BYOK Ollama Test fails with 0 turns — api-proxy /v1/models returns HTTP 503 during model discovery

Parent: #39883 (6h failure investigation report).

### Problem statement

`Daily BYOK Ollama Test` fails at the `agent` job (`Execute GitHub Copilot CLI`) with **0 turns** — the agent never starts a turn. The Copilot harness aborts during model discovery (`awf-reflect`) because the BYOK API proxy returns HTTP `503` on its `/v1/models` endpoint, exhausting all retries.

### Affected workflow and run IDs

- **Workflow:** `Daily BYOK Ollama Test` (`.github/workflows/daily-byok-ollama-test.lock.yml`), engine `copilot` (GitHub Copilot CLI `1.0.63`), model `claude-sonnet-4.6`, trigger `schedule`.
- **Failed run:** [§27851504668](https://github.com/github/gh-aw/actions/runs/27851504668) — 2026-06-19 22:36 UTC, `main`. `turns=0`, `agentic_fraction=0`, duration 4.3m, exit code 1.
- **Baseline:** audit reports `baseline_found=false` (no successful cohort run for comparison).

### Evidence (agent-stdio.log)

```
[warning] awf-reflect models fetch for (apiproxy/redacted) failed (attempt 1/5): models fetch returned 503
... retries 2/5 .. 5/5 ...
[warning] awf-reflect models fetch ... failed after 4 retry attempts: models fetch returned 503 for (apiproxy/redacted)
[copilot-harness] awf-reflect: models fetch returned 503 for (apiproxy/redacted)
[copilot-harness] done: exitCode=1 totalDuration=52s
```

Firewall audit: 22 requests, 0 blocked — the network reached the proxy; the proxy itself answered 503, so this is not a firewall/policy denial.

### Probable root cause

The BYOK backend behind `awf-api-proxy` (`api-proxy:10002`) did not serve `/v1/models` (Ollama model not pulled / server not yet ready / upstream 503). Model discovery is treated as fatal, so the harness exits before any agent turn. The retry budget (5 attempts over ~3.75s) is too short to ride out a cold Ollama start.

### Proposed remediation

1. Add a readiness gate on `/v1/models` (expect 200) before the agent job starts, and pre-pull/warm the target Ollama model.
2. Increase the `awf-reflect` model-fetch retry budget/backoff on the BYOK path so a transient/cold-start 503 does not hard-fail the run.
3. Emit a classified error (e.g. `isBYOKModelDiscoveryError`) instead of a bare `exitCode=1`, so this is distinguishable from agent-side failures in audit.

### Success criteria / verification

- Daily BYOK Ollama Test reaches at least 1 agent turn on scheduled runs, or fails with a classified BYOK model-discovery error rather than `turns=0` exit 1.
- A readiness check confirms `/v1/models` returns 200 before the agent starts.
- Add a regression test covering the cold-start 503 model-fetch path.
Related to #39883







> Generated by [🔍 [aw] Failure Investigator (6h)](https://github.com/github/gh-aw/actions/runs/27856105592) · 172.3 AIC · ⌖ 11.7 AIC · ⊞ 4.9K · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Faw-failure-investigator%22&type=issues)
> - [x] expires  on Jun 26, 2026, 5:40 PM UTC-08:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[aw-failures] [aw] Daily BYOK Ollama Test fails with 0 turns — api-proxy /v1/models returns HTTP 503 during model discovery #40417

Problem statement

Affected workflow and run IDs

Evidence (agent-stdio.log)

Probable root cause

Proposed remediation

Success criteria / verification

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[aw-failures] [aw] Daily BYOK Ollama Test fails with 0 turns — api-proxy /v1/models returns HTTP 503 during model discovery #40417

Description

Problem statement

Affected workflow and run IDs

Evidence (agent-stdio.log)

Probable root cause

Proposed remediation

Success criteria / verification

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions