Skip to content

[aw-failures] [aw] Daily BYOK Ollama Test fails with 0 turns — api-proxy /v1/models returns HTTP 503 during model discovery #40417

@github-actions

Description

@github-actions

Parent: #39883 (6h failure investigation report).

Problem statement

Daily BYOK Ollama Test fails at the agent job (Execute GitHub Copilot CLI) with 0 turns — the agent never starts a turn. The Copilot harness aborts during model discovery (awf-reflect) because the BYOK API proxy returns HTTP 503 on its /v1/models endpoint, exhausting all retries.

Affected workflow and run IDs

  • Workflow: Daily BYOK Ollama Test (.github/workflows/daily-byok-ollama-test.lock.yml), engine copilot (GitHub Copilot CLI 1.0.63), model claude-sonnet-4.6, trigger schedule.
  • Failed run: §27851504668 — 2026-06-19 22:36 UTC, main. turns=0, agentic_fraction=0, duration 4.3m, exit code 1.
  • Baseline: audit reports baseline_found=false (no successful cohort run for comparison).

Evidence (agent-stdio.log)

[warning] awf-reflect models fetch for (apiproxy/redacted) failed (attempt 1/5): models fetch returned 503
... retries 2/5 .. 5/5 ...
[warning] awf-reflect models fetch ... failed after 4 retry attempts: models fetch returned 503 for (apiproxy/redacted)
[copilot-harness] awf-reflect: models fetch returned 503 for (apiproxy/redacted)
[copilot-harness] done: exitCode=1 totalDuration=52s

Firewall audit: 22 requests, 0 blocked — the network reached the proxy; the proxy itself answered 503, so this is not a firewall/policy denial.

Probable root cause

The BYOK backend behind awf-api-proxy (api-proxy:10002) did not serve /v1/models (Ollama model not pulled / server not yet ready / upstream 503). Model discovery is treated as fatal, so the harness exits before any agent turn. The retry budget (5 attempts over ~3.75s) is too short to ride out a cold Ollama start.

Proposed remediation

  1. Add a readiness gate on /v1/models (expect 200) before the agent job starts, and pre-pull/warm the target Ollama model.
  2. Increase the awf-reflect model-fetch retry budget/backoff on the BYOK path so a transient/cold-start 503 does not hard-fail the run.
  3. Emit a classified error (e.g. isBYOKModelDiscoveryError) instead of a bare exitCode=1, so this is distinguishable from agent-side failures in audit.

Success criteria / verification

Generated by 🔍 [aw] Failure Investigator (6h) · 172.3 AIC · ⌖ 11.7 AIC · ⊞ 4.9K ·

  • expires on Jun 26, 2026, 5:40 PM UTC-08:00

Metadata

Metadata

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions