
fix(api-proxy): stop double-counting cached tokens in AI credits#4760

Merged · lpcox merged 1 commit into main from fix-ai-credits-cache-double-count · Jun 11, 2026


@lpcox commented Jun 11, 2026 · Collaborator

Problem

AI credits are significantly overcharged when prompt caching is active. Both Anthropic and OpenAI report input_tokens as the total input including cached tokens, but the credit calculation was charging:

  1. The full input_tokens at the full input rate ($3.00/Mtok for claude-sonnet-4-6)
  2. Plus cache_read_tokens at the cache rate ($0.30/Mtok)

This double-counts the cached portion — charging it at full price (included in input_tokens) and again at cache price.

Example (claude-sonnet-4-6, 3M total input, 2.9M from cache)

| Calculation | Formula | Result |
|---|---|---|
| ❌ Before (bug) | 3.0M × $3/Mtok + 2.9M × $0.30/Mtok + output | ~1017 AIC |
| ✅ After (correct) | 0.1M × $3/Mtok + 2.9M × $0.30/Mtok + output | ~192 AIC |

The cached tokens were being charged at $3.30/Mtok (full + cache rate) instead of just $0.30/Mtok.
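The input-side arithmetic of the example can be reproduced with a small sketch. It assumes, as stated above, that the reported input total already includes the 2.9M cached tokens, and it uses only the two rates quoted for claude-sonnet-4-6 ($3.00/Mtok input, $0.30/Mtok cache read). Output tokens and the dollar-to-AIC conversion are left out because the PR does not state them, so the figures are USD for the input side only. The function names are illustrative, not the api-proxy's.

```typescript
// Rates quoted in the PR for claude-sonnet-4-6, in USD per million tokens.
const INPUT_RATE = 3.0;
const CACHE_READ_RATE = 0.3;
const MTOK = 1_000_000;

// Buggy: the whole input total at the input rate, plus the cache reads billed again.
function inputCostBuggy(totalInput: number, cacheRead: number): number {
  return (totalInput * INPUT_RATE + cacheRead * CACHE_READ_RATE) / MTOK;
}

// Fixed: only the non-cached remainder is billed at the input rate.
function inputCostFixed(totalInput: number, cacheRead: number): number {
  const nonCached = Math.max(0, totalInput - cacheRead);
  return (nonCached * INPUT_RATE + cacheRead * CACHE_READ_RATE) / MTOK;
}

const total = 3_000_000;
const cached = 2_900_000;
const buggy = inputCostBuggy(total, cached); // input side before the fix
const fixed = inputCostFixed(total, cached); // input side after the fix
// The overcharge is exactly the cached tokens billed a second time at the input rate.
const overcharge = buggy - fixed;
// Under the bug, each cached token effectively cost the input rate plus the cache rate.
const effectiveCachedRateBuggy = INPUT_RATE + CACHE_READ_RATE;

console.log(buggy.toFixed(2), fixed.toFixed(2), overcharge.toFixed(2), effectiveCachedRateBuggy.toFixed(2));
// prints: 9.87 1.17 8.70 3.30
```

The $8.70 gap is 2.9M × $3/Mtok, which is why the effective rate on a cached token came out at $3.30/Mtok rather than $0.30/Mtok.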

Fix

Subtract cache_read_tokens and cache_write_tokens from input_tokens before applying the full input rate:

```typescript
// input_tokens / prompt_tokens already include cached tokens, so carve them out first.
const nonCachedInput = Math.max(0, totalInput - cacheReadTokens - cacheWriteTokens);
// Each portion is then billed once, at its own rate.
const inputCredits = (nonCachedInput * pricing.input) / CREDIT_DENOMINATOR;
const cachedInputCredits = (cacheReadTokens * pricing.cachedInput) / CREDIT_DENOMINATOR;
const cacheWriteCredits = (cacheWriteTokens * pricing.cacheWrite) / CREDIT_DENOMINATOR;
```
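Put together, the fixed calculation could look like the following self-contained sketch. The `Pricing` field names mirror the snippet in this PR, but the `TokenUsage` type, the `computeCredits` function, the output term, and the `CREDIT_DENOMINATOR` value are placeholders: the PR does not show how the denominator is defined, so with the value assumed here (prices in USD per Mtok divided by 1,000,000) the result is in USD rather than AIC.

```typescript
// Per-token-class prices in USD per million tokens; field names follow the PR snippet.
interface Pricing {
  input: number;
  cachedInput: number;
  cacheWrite: number;
  output: number;
}

// Hypothetical normalized usage: totalInput INCLUDES cache reads and cache writes.
interface TokenUsage {
  totalInput: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
  outputTokens: number;
}

// Placeholder denominator (assumption); the real value is not shown in the PR.
const CREDIT_DENOMINATOR = 1_000_000;

function computeCredits(usage: TokenUsage, pricing: Pricing): number {
  const { totalInput, cacheReadTokens, cacheWriteTokens, outputTokens } = usage;
  // Remove the cached portions so they are billed only at their own rates.
  // Clamp at zero in case the reported cache counts exceed the reported total.
  const nonCachedInput = Math.max(0, totalInput - cacheReadTokens - cacheWriteTokens);
  const inputCredits = (nonCachedInput * pricing.input) / CREDIT_DENOMINATOR;
  const cachedInputCredits = (cacheReadTokens * pricing.cachedInput) / CREDIT_DENOMINATOR;
  const cacheWriteCredits = (cacheWriteTokens * pricing.cacheWrite) / CREDIT_DENOMINATOR;
  const outputCredits = (outputTokens * pricing.output) / CREDIT_DENOMINATOR;
  return inputCredits + cachedInputCredits + cacheWriteCredits + outputCredits;
}

// Only the two rates quoted in the PR are filled in; cache-write and output prices
// are set to 0 because the example does not quote them.
const sonnetInputOnly: Pricing = { input: 3.0, cachedInput: 0.3, cacheWrite: 0, output: 0 };
const credits = computeCredits(
  { totalInput: 3_000_000, cacheReadTokens: 2_900_000, cacheWriteTokens: 0, outputTokens: 0 },
  sonnetInputOnly,
);
console.log(credits.toFixed(2)); // prints: 1.17
```

Keeping the subtraction in one place, ahead of the per-class rates, means a provider that reports cache tokens inside the input total is billed once per token regardless of how many cache classes it reports.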

Provider semantics confirmed

  • Anthropic: input_tokens includes cache_read_input_tokens + cache_creation_input_tokens (docs)
  • OpenAI: prompt_tokens includes prompt_tokens_details.cached_tokens (docs)
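For the OpenAI side, mapping the Chat Completions `usage` object onto the total/cached split might look like this sketch. `prompt_tokens`, `completion_tokens`, and `prompt_tokens_details.cached_tokens` are OpenAI's field names; the `NormalizedUsage` type and `normalizeOpenAIUsage` function are hypothetical. OpenAI reports no separate cache-write count, so it is set to 0 here, and the Anthropic mapping is omitted for brevity.

```typescript
// Subset of the OpenAI Chat Completions `usage` object relevant to caching.
interface OpenAIUsage {
  prompt_tokens: number;
  completion_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number };
}

// Hypothetical normalized shape consumed by the credit calculation.
interface NormalizedUsage {
  totalInput: number; // includes cached tokens
  cacheReadTokens: number;
  cacheWriteTokens: number;
  outputTokens: number;
}

function normalizeOpenAIUsage(u: OpenAIUsage): NormalizedUsage {
  // prompt_tokens already counts the cached prefix, so it is used as the input total.
  const cached = u.prompt_tokens_details?.cached_tokens ?? 0;
  return {
    totalInput: u.prompt_tokens,
    cacheReadTokens: cached,
    cacheWriteTokens: 0, // not reported separately by OpenAI
    outputTokens: u.completion_tokens,
  };
}

const n = normalizeOpenAIUsage({
  prompt_tokens: 2006,
  completion_tokens: 300,
  prompt_tokens_details: { cached_tokens: 1920 },
});
// Only the non-cached remainder should be billed at the full input rate.
console.log(n.totalInput - n.cacheReadTokens); // prints: 86
```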

Testing

  • Updated existing credit calculation tests with correct expected values
  • Added new test: 3M input / 2.9M cached scenario asserting < 250 AIC (was ~1017)
  • All 999 api-proxy tests pass
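The regression test above covers one scenario. A complementary check, sketched here under the same assumption that the input total includes cache reads and writes, is that the fixed split places every input token in exactly one billing bucket. `splitInput` is a hypothetical helper, and the random cases come from `Math.random` rather than a property-testing library.

```typescript
interface InputSplit {
  nonCached: number;
  cacheRead: number;
  cacheWrite: number;
}

// Hypothetical helper mirroring the fix: cached portions are carved out of the total.
function splitInput(total: number, cacheRead: number, cacheWrite: number): InputSplit {
  return { nonCached: Math.max(0, total - cacheRead - cacheWrite), cacheRead, cacheWrite };
}

// Random integer usage where cache reads + writes never exceed the reported total.
for (let i = 0; i < 1000; i++) {
  const total = Math.floor(Math.random() * 5_000_000);
  const cacheRead = Math.floor(Math.random() * (total + 1));
  const cacheWrite = Math.floor(Math.random() * (total - cacheRead + 1));
  const s = splitInput(total, cacheRead, cacheWrite);
  // Every input token lands in exactly one bucket: none counted twice, none lost.
  if (s.nonCached + s.cacheRead + s.cacheWrite !== total) {
    throw new Error(`split invariant violated for total=${total}`);
  }
}
console.log("split invariant holds");
```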

Both Anthropic and OpenAI report input_tokens as the TOTAL input
including cache_read and cache_creation tokens. The AI credits
calculation was charging the full input_tokens at the full input rate,
then ALSO charging cache_read_tokens at the cache rate — effectively
double-counting cached tokens.

Example (claude-sonnet-4-6, 3M input, 2.9M cached):
  Before: 3M × $3/Mtok + 2.9M × $0.30/Mtok ≈ 1017 AIC
  After:  0.1M × $3/Mtok + 2.9M × $0.30/Mtok ≈ 192 AIC (correct)

The fix subtracts cache_read_tokens and cache_write_tokens from
input_tokens before applying the full input rate, since those portions
are already accounted for at their respective discounted rates.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 11, 2026 15:31
@github-actions

Contributor

✅ Coverage Check Passed

Overall Coverage

| Metric | Base | PR | Delta |
|---|---|---|---|
| Lines | 96.42% | 96.46% | 📈 +0.04% |
| Statements | 96.34% | 96.38% | 📈 +0.04% |
| Functions | 98.77% | 98.77% | ➡️ +0.00% |
| Branches | 90.74% | 90.78% | 📈 +0.04% |
📁 Per-file Coverage Changes (1 file)

| File | Lines (Before → After) | Statements (Before → After) |
|---|---|---|
| src/config-writer.ts | 89.3% → 90.9% (+1.65%) | 89.3% → 90.9% (+1.65%) |

Coverage comparison generated by scripts/ci/compare-coverage.ts

@github-actions

Contributor

GitHub API: ✅ PASS
GitHub check: ✅ PASS
File verify: ✅ PASS

Total: PASS

💥 [THE END] — Illustrated by Smoke Claude

@github-actions

Contributor

🔥 Smoke Test: Copilot PAT Auth — FAIL

| Test | Result |
|---|---|
| GitHub MCP connectivity | ✅ PASS |
| GitHub.com HTTP connectivity | ❌ Pre-step data unavailable (${{ steps.smoke-data.outputs.SMOKE_HTTP_CODE }} not substituted) |
| File write/read | ❌ Pre-step data unavailable (path template not substituted) |

Overall: FAIL — pre-step outputs were not evaluated; tests 2 & 3 could not be verified.

PR: fix(api-proxy): stop double-counting cached tokens in AI credits · Author: @lpcox · Reviewer: @Copilot

Auth mode: PAT (COPILOT_GITHUB_TOKEN)

🔑 PAT report filed by Smoke Copilot PAT

@github-actions

Contributor

Smoke Test: Copilot BYOK (Direct) Mode ✅

| Test | Result |
|---|---|
| GitHub MCP connectivity | ✅ |
| GitHub.com HTTP | ✅ (HTTP 200) |
| File write/read | ✅ |
| BYOK inference path | ✅ (agent → api-proxy → api.githubcopilot.com) |

Mode: Direct BYOK (COPILOT_PROVIDER_API_KEY via api-proxy sidecar)
Status: PASS — all tests passed

Author: @lpcox | Reviewers: @Copilot

🔑 BYOK report filed by Smoke Copilot BYOK

@github-actions

Contributor

@lpcox

  • GitHub MCP connectivity: ✅
  • GitHub.com connectivity (HTTP 200/301): ✅
  • File write/read in sandbox: ✅
  • Running in direct BYOK mode via api-proxy → Azure OpenAI (Foundry, o4-mini-aw): ✅

Overall: PASS

🔑 BYOK (AOAI api-key) report filed by Smoke Copilot BYOK AOAI (api-key)

@github-actions

Contributor

@lpcox

  • GitHub MCP connectivity: ❌
  • GitHub.com connectivity: ✅
  • File write/read test: ❌
  • BYOK inference path: ✅

Running in direct BYOK mode (AWF_AUTH_TYPE=github-oidc + AWF_AUTH_AZURE_* + COPILOT_PROVIDER_BASE_URL) via api-proxy → Azure OpenAI (Foundry, o4-mini-aw) authenticated via Microsoft Entra

Overall: FAIL

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • api.openai.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

```yaml
network:
  allowed:
    - defaults
    - "api.openai.com"
```

See Network Configuration for more information.

🪪 BYOK (AOAI Entra) report filed by Smoke Copilot BYOK AOAI (Entra)

@github-actions

Contributor

Reviewed PRs:

Checks:

  • GitHub title check: ✅
  • Temp file write/read: ✅
  • Discussion lookup/comment: ✅
  • Build (npm ci && npm run build): ✅

Overall status: PASS

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • registry.npmjs.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

```yaml
network:
  allowed:
    - defaults
    - "registry.npmjs.org"
```

See Network Configuration for more information.

🔮 The oracle has spoken through Smoke Codex


@github-actions

Contributor

🔭 Smoke Test: API Proxy OpenTelemetry Tracing

| Scenario | Result | Details |
|---|---|---|
| 1. Module Loading | ✅ | otel.js loads cleanly; exports: startRequestSpan, setTokenAttributes, setBudgetAttributes, endSpan, endSpanError, shutdown, isEnabled |
| 2. Test Suite | ✅ | 39/39 tests pass across span creation, token attributes, parent context propagation, OTLP export, file export, and shutdown |
| 3. Env Var Forwarding | ✅ | api-proxy-service-config.ts forwards OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_HEADERS, GITHUB_AW_OTEL_TRACE_ID, GITHUB_AW_OTEL_PARENT_SPAN_ID, OTEL_SERVICE_NAME |
| 4. Token Tracker Integration | ✅ | onUsage callback present in token-tracker-http.js (lines 62/66/269); invoked after normalized usage extraction |
| 5. OTEL Diagnostics | ✅ | No OTEL_EXPORTER_OTLP_ENDPOINT set → graceful degradation to local file fallback (/var/log/api-proxy/otel.jsonl); isEnabled() returns true |

All scenarios pass. OTEL tracing integration is functioning correctly on this PR.

📡 OTel tracing validated by Smoke OTel Tracing

@github-actions

Contributor

🔍 Chroot Version Comparison

| Runtime | Host Version | Chroot Version | Match? |
|---|---|---|---|
| Python | Python 3.12.13 | Python 3.12.3 | ❌ |
| Node.js | v24.16.0 | v22.22.3 | ❌ |
| Go | go1.22.12 | go1.22.12 | ✅ |

Result: Not all tests passed. Python and Node.js versions differ between host and chroot environments.

Tested by Smoke Chroot

@github-actions

Contributor

Smoke Test Results

| Check | Result |
|---|---|
| Redis PING | ❌ No response |
| PostgreSQL pg_isready | ❌ No response |
| PostgreSQL SELECT 1 | ❌ No response |

Overall: FAIL. host.docker.internal is unreachable from this runner, so the service containers are not accessible.

🔌 Service connectivity validated by Smoke Services

@lpcox lpcox merged commit ab519be into main Jun 11, 2026
84 of 85 checks passed
@lpcox lpcox deleted the fix-ai-credits-cache-double-count branch June 11, 2026 15:49
@lpcox lpcox removed the request for review from Copilot June 11, 2026 15:52
@github-actions

Contributor

🤖 Smoke Test: Copilot Engine — PASS

| Test | Result |
|---|---|
| GitHub MCP | ✅ PR listed: fix(api-proxy): stop double-counting cached tokens in AI credits |
| GitHub.com connectivity | ✅ HTTP 200 |
| File write/read | ✅ smoke-test-copilot-27359745034.txt verified |

Overall: PASS · @lpcox (no assignees)

📰 BREAKING: Report filed by Smoke Copilot
