fix(cli-proxy): replace gh api rate_limit liveness probe with TCP connectivity check#4677

Closed
Copilot wants to merge 2 commits into main from copilot/resolve-awf-cli-proxy-exit-issue

Conversation

Copilot AI commented Jun 10, 2026

awf-cli-proxy was exiting with status 1 at startup because of a fragile liveness probe introduced in #4486 and a race condition worsened by the IPv6 tunnel change in #4626. This blocked all copilot engine workflows.

Root causes

  • Race condition: gh api rate_limit ran immediately after node tcp-tunnel.js &, before the tunnel had bound localhost:${DIFC_PORT}, so the probe failed at once with connection refused
  • Fragile probe: gh api rate_limit requires a valid GH_TOKEN, depends on gh CLI routing API paths correctly for a custom GH_HOST (changed between versions), and only retried twice (2 × 6s window)

Changes

  • TCP tunnel readiness wait: polls 127.0.0.1:${DIFC_PORT} (up to 10 × 0.5s) before the liveness probe, eliminating the race condition
  • Probe replaced with raw TCP check: bash /dev/tcp/${DIFC_HOST}/${DIFC_PORT} — same pattern as healthcheck.sh; no token, no gh CLI, no API path dependency
  • More resilient defaults: MAX_LIVENESS_ATTEMPTS 2→5, sleep 1s→2s
  • Clearer error messages: both tunnel-bind and proxy-unreachable failures identify the failing component explicitly
# Before — fails on token absence, gh CLI version changes, or tunnel not yet bound
gh api rate_limit  # with GH_HOST=localhost:18443, 2 attempts

# After — checks only that the DIFC proxy port is open
timeout 5 bash -c "cat < /dev/null > /dev/tcp/${DIFC_HOST}/${DIFC_PORT}"  # 5 attempts
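
The tunnel readiness wait described under Changes can be sketched roughly as follows. This is a minimal illustration, not the code from entrypoint.sh: the function name wait_for_port and its argument order are assumptions, and only the polling budget (10 attempts, 0.5s apart) and the target 127.0.0.1:${DIFC_PORT} come from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name and signature are assumptions, not from entrypoint.sh).
# Usage: wait_for_port HOST PORT [ATTEMPTS] [INTERVAL_SECONDS]
wait_for_port() {
  local host="$1" port="$2" attempts="${3:-10}" interval="${4:-0.5}"
  local i
  for ((i = 1; i <= attempts; i++)); do
    # bash opens a TCP connection when a redirection targets /dev/tcp/HOST/PORT.
    # Running it in a subshell closes the socket again as soon as the check ends.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0   # something is listening on the port
    fi
    sleep "$interval"
  done
  return 1       # port never opened within attempts * interval
}

# Intended use after starting the tunnel in the background (not executed here):
#   node tcp-tunnel.js &
#   wait_for_port 127.0.0.1 "${DIFC_PORT}" 10 0.5 \
#     || { echo "TCP tunnel did not bind 127.0.0.1:${DIFC_PORT}" >&2; exit 1; }
```

Polling the local port first separates "the tunnel never came up" from "the DIFC proxy is unreachable", which is what lets the two failure modes produce distinct error messages.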

Replace gh api rate_limit with a raw TCP check (consistent with
healthcheck.sh). The old probe failed if: the TCP tunnel was not
yet bound when it ran (race condition), GH_TOKEN was absent, or the
gh CLI version changed how it routes API paths for a custom GH_HOST.

Changes:
- Add a tunnel-readiness loop that polls 127.0.0.1:DIFC_PORT before
  running the liveness probe, eliminating the startup race condition
- Replace gh api rate_limit with a bash /dev/tcp connectivity check
  directly to DIFC_HOST:DIFC_PORT - no token needed, version-agnostic
- Raise default MAX_LIVENESS_ATTEMPTS from 2 to 5 and sleep from
  1s to 2s for resilience against transient startup delays
- Improve error messages to identify the failing component clearly
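
A sketch of how the retry defaults and the raw TCP check might fit together is shown below. It assumes a function named probe_difc and a sleep variable named LIVENESS_SLEEP_SECONDS, both hypothetical. MAX_LIVENESS_ATTEMPTS, the 5-attempt and 2s defaults, and the timeout 5 / /dev/tcp check are taken from the description above. The actual entrypoint.sh may be structured differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the liveness probe loop; not copied from entrypoint.sh.
# Usage: probe_difc HOST PORT
probe_difc() {
  local host="$1" port="$2"
  local max_attempts="${MAX_LIVENESS_ATTEMPTS:-5}"
  local delay="${LIVENESS_SLEEP_SECONDS:-2}"   # variable name is an assumption
  local attempt
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    # Succeeds only if the port accepts a TCP connection within 5 seconds.
    # Host and port are passed as arguments ($0, $1) rather than interpolated
    # into the script string.
    if timeout 5 bash -c 'cat < /dev/null > "/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
      return 0
    fi
    echo "DIFC proxy ${host}:${port} not reachable (attempt ${attempt}/${max_attempts})" >&2
    if ((attempt < max_attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Intended use (not executed here):
#   probe_difc "${DIFC_HOST}" "${DIFC_PORT}" || exit 1
```

Because the check needs neither a token nor the gh CLI, a failure here points at network reachability of the proxy rather than at authentication or CLI routing.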
Copilot AI changed the title from "[WIP] Fix awf-cli-proxy container exit issue affecting workflows" to "fix(cli-proxy): replace gh api rate_limit liveness probe with TCP connectivity check" Jun 10, 2026
Copilot finished work on behalf of lpcox June 10, 2026 13:52
Copilot AI requested a review from lpcox June 10, 2026 13:52
@lpcox lpcox marked this pull request as ready for review June 10, 2026 15:45
Copilot AI review requested due to automatic review settings June 10, 2026 15:45
@github-actions

✅ Coverage Check Passed

Overall Coverage

| Metric | Base | PR | Delta |
| --- | --- | --- | --- |
| Lines | 96.43% | 96.47% | 📈 +0.04% |
| Statements | 96.35% | 96.39% | 📈 +0.04% |
| Functions | 98.76% | 98.76% | ➡️ +0.00% |
| Branches | 90.72% | 90.75% | 📈 +0.03% |

📁 Per-file Coverage Changes (1 files)

| File | Lines (Before → After) | Statements (Before → After) |
| --- | --- | --- |
| src/config-writer.ts | 89.3% → 90.9% (+1.65%) | 89.3% → 90.9% (+1.65%) |

Coverage comparison generated by scripts/ci/compare-coverage.ts

Copilot AI left a comment

Pull request overview

This PR hardens the awf-cli-proxy sidecar startup sequence by removing a brittle gh api rate_limit liveness probe and replacing it with TCP-based readiness/liveness checks, preventing early exit(1) failures that were blocking Copilot engine workflows.

Changes:

  • Adds a bounded wait loop to ensure the local TCP tunnel is listening on 127.0.0.1:${DIFC_PORT} before any external connectivity probing.
  • Replaces the external DIFC proxy liveness probe with a raw TCP connectivity check to ${DIFC_HOST}:${DIFC_PORT} (no GH_TOKEN/gh CLI dependency).
  • Increases default retry/backoff parameters and improves startup error messages for clearer diagnosis.
Show a summary per file
File Description
containers/cli-proxy/entrypoint.sh Adds tunnel readiness polling and switches DIFC liveness probing from gh api to TCP checks with more resilient retry defaults and clearer errors.

Copilot's findings

  • Files reviewed: 1/1 changed files
  • Comments generated: 0

@github-actions

fix(cli-proxy): replace gh api rate_limit liveness probe with TCP connectivity check
fix: propagate config fields to all layers
ci: remove dind-ubuntu image from release workflow
GitHub title check: ✅
File write: ✅
Build: ✅
Discussion: ✅
Overall status: PASS

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • registry.npmjs.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "registry.npmjs.org"

See Network Configuration for more information.

🔮 The oracle has spoken through Smoke Codex

@github-actions

🏗️ Build Test Suite Results

Ecosystem Project Build/Install Tests Status
Bun elysia 1/1 passed ✅ PASS
Bun hono 1/1 passed ✅ PASS
C++ fmt N/A ✅ PASS
C++ json N/A ✅ PASS
Deno oak N/A 1/1 passed ✅ PASS
Deno std N/A 1/1 passed ✅ PASS
.NET hello-world N/A ✅ PASS
.NET json-parse N/A ✅ PASS
Go color 1/1 passed ✅ PASS
Go env 1/1 passed ✅ PASS
Go uuid 1/1 passed ✅ PASS
Java gson 1/1 passed ✅ PASS
Java caffeine 1/1 passed ✅ PASS
Node.js clsx All passed ✅ PASS
Node.js execa All passed ✅ PASS
Node.js p-limit All passed ✅ PASS
Rust fd 1/1 passed ✅ PASS
Rust zoxide 1/1 passed ✅ PASS

Overall: 8/8 ecosystems passed — ✅ PASS

Generated by Build Test Suite for issue #4677 · 131.7 AIC · ⊞ 35.5K ·

@github-actions

Smoke Test Results: FAIL ❌

  • Redis PING: ❌ Connection timeout (port 6379 unreachable)
  • PostgreSQL pg_isready: ❌ No response (port 5432 unreachable)
  • PostgreSQL SELECT 1: ❌ Connection timeout

host.docker.internal resolves to 172.17.0.1 but neither service port is reachable from this runner environment. Service containers may not be running or are not bound to the expected ports.

🔌 Service connectivity validated by Smoke Services

@github-actions

🔬 Smoke Test Results

Test Status
GitHub MCP connectivity
GitHub.com HTTP connectivity ❌ (pre-step data unavailable — template not substituted)
File write/read ❌ (pre-step data unavailable — template not substituted)

Overall: FAIL

PR: fix(cli-proxy): replace gh api rate_limit liveness probe with TCP connectivity check
Author: @Copilot · Assignees: @lpcox, @Copilot

📰 BREAKING: Report filed by Smoke Copilot

@github-actions

Smoke Test Results — Auth mode: PAT (COPILOT_GITHUB_TOKEN)

Test Result
GitHub MCP connectivity
GitHub.com HTTP ✅ 200
File write/read ❓ pre-step vars not substituted

PR: fix(cli-proxy): replace gh api rate_limit liveness probe with TCP connectivity check
Author: @Copilot | Assignees: @lpcox, @Copilot

Overall: PARTIAL — MCP and HTTP passed; file test unverifiable (template vars not substituted by runner).

🔑 PAT report filed by Smoke Copilot PAT

@github-actions

🔑 Copilot BYOK Smoke Test Results

Test 1: GitHub MCP Connectivity — Verified last 2 merged PRs
Test 2: GitHub.com Connectivity — HTTP 200 OK
Test 3: File Write/Read Test — File verified at /tmp/gh-aw/agent/smoke-test-copilot-byok.txt
Test 4: BYOK Inference — Agent responding in direct BYOK mode

Running in direct BYOK mode (COPILOT_PROVIDER_API_KEY) via api-proxy → api.githubcopilot.com

Overall: PASS

cc @lpcox @Copilot

🔑 BYOK report filed by Smoke Copilot BYOK

@github github deleted a comment from manamansor Jun 10, 2026
@github github deleted a comment from manamansor Jun 10, 2026
@github github locked and limited conversation to collaborators Jun 10, 2026
@lpcox lpcox closed this Jun 10, 2026

Development

Successfully merging this pull request may close these issues.

[awf] cli-proxy: awf-cli-proxy container exits(1), blocking AIC Usage Optimizer workflow

3 participants