fix(api-proxy): stop double-counting cached tokens in AI credits #4760
Both Anthropic and OpenAI report input_tokens as the TOTAL input including cache_read and cache_creation tokens. The AI credits calculation was charging the full input_tokens at the full input rate, then ALSO charging cache_read_tokens at the cache rate, effectively double-counting cached tokens.

Example (claude-sonnet-4-6, 3M input, 2.9M cached):
  Before: 3M × $3/Mtok + 2.9M × $0.30/Mtok ≈ 1017 AIC
  After: 0.1M × $3/Mtok + 2.9M × $0.30/Mtok ≈ 192 AIC (correct)

The fix subtracts cache_read_tokens and cache_write_tokens from input_tokens before applying the full input rate, since those portions are already accounted for at their respective discounted rates.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
✅ Coverage Check Passed
Overall Coverage
📁 Per-file Coverage Changes (1 files)
GitHub API: ✅ PASS Total: PASS
🔥 Smoke Test: Copilot PAT Auth — FAIL
Overall: FAIL. Pre-step outputs were not evaluated; tests 2 & 3 could not be verified.
PR: fix(api-proxy): stop double-counting cached tokens in AI credits
Author: @lpcox
Reviewer:
Auth mode: PAT (COPILOT_GITHUB_TOKEN)
Smoke Test: Copilot BYOK (Direct) Mode ✅
Mode: Direct BYOK ( Author: @lpcox | Reviewers:
Overall: PASS
Running in direct BYOK mode (AWF_AUTH_TYPE=github-oidc + AWF_AUTH_AZURE_* + COPILOT_PROVIDER_BASE_URL) via api-proxy → Azure OpenAI (Foundry, o4-mini-aw) authenticated via Microsoft Entra
Overall: FAIL
Warning
Firewall blocked 1 domain
The following domain was blocked by the firewall during workflow execution:
network:
  allowed:
    - defaults
    - "api.openai.com"
See Network Configuration for more information.
Reviewed PRs:
Checks:
Overall status: PASS
Warning
Firewall blocked 1 domain
The following domain was blocked by the firewall during workflow execution:
network:
  allowed:
    - defaults
    - "registry.npmjs.org"
See Network Configuration for more information.
🔭 Smoke Test: API Proxy OpenTelemetry Tracing
All scenarios pass. OTEL tracing integration is functioning correctly on this PR.
🔍 Chroot Version Comparison
Result: Not all tests passed. Python and Node.js versions differ between host and chroot environments.
Smoke Test Results
Overall: FAIL —
🤖 Smoke Test: Copilot Engine — PASS
Overall: PASS — @lpcox (no assignees)
Problem
AI credits are significantly overcharged when prompt caching is active. Both Anthropic and OpenAI report input_tokens as the total input including cached tokens, but the credit calculation was charging:
- input_tokens at the full input rate ($3.00/Mtok for claude-sonnet-4-6)
- cache_read_tokens at the cache rate ($0.30/Mtok)
This double-counts the cached portion: it is charged at full price (as part of input_tokens) and again at the cache price.
Example (claude-sonnet-4-6, 3M total input, 2.9M from cache)
The cached tokens were being charged at $3.30/Mtok (full + cache rate) instead of just $0.30/Mtok.
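The double count can be reproduced with a short sketch. This is not the api-proxy code: the `Usage` record and both function names are hypothetical, and the sketch only computes the input-side dollar cost from the two rates quoted above. It does not model output tokens, cache writes, or the dollar-to-AIC conversion, so its figures are not directly comparable to the AIC totals in the commit message.

```python
from dataclasses import dataclass

# Rates quoted in this PR for claude-sonnet-4-6, in USD per million tokens.
INPUT_RATE_USD_PER_MTOK = 3.00
CACHE_READ_RATE_USD_PER_MTOK = 0.30
TOKENS_PER_MTOK = 1_000_000


@dataclass(frozen=True)
class Usage:
    """Hypothetical usage record; field names follow the PR text."""
    input_tokens: int       # total input, assumed to INCLUDE cached tokens
    cache_read_tokens: int  # portion of the input served from cache


def input_cost_double_counted(u: Usage) -> float:
    """Described old behaviour: full rate on all input, plus cache rate on cached."""
    total = (u.input_tokens * INPUT_RATE_USD_PER_MTOK
             + u.cache_read_tokens * CACHE_READ_RATE_USD_PER_MTOK)
    return total / TOKENS_PER_MTOK


def input_cost_split(u: Usage) -> float:
    """Described fix: full rate only on the uncached remainder."""
    # Clamp at zero in case a provider reports inconsistent counts.
    uncached = max(u.input_tokens - u.cache_read_tokens, 0)
    total = (uncached * INPUT_RATE_USD_PER_MTOK
             + u.cache_read_tokens * CACHE_READ_RATE_USD_PER_MTOK)
    return total / TOKENS_PER_MTOK


example = Usage(input_tokens=3_000_000, cache_read_tokens=2_900_000)
print(f"double-counted: ${input_cost_double_counted(example):.2f}")  # $9.87
print(f"split:          ${input_cost_split(example):.2f}")           # $1.17
```

The gap between the two results is exactly the cached tokens billed a second time at the full input rate (2.9M × $3.00/Mtok = $8.70), which is where the effective $3.30/Mtok figure comes from. With no cached tokens the two functions agree.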
Fix
Subtract cache_read_tokens and cache_write_tokens from input_tokens before applying the full input rate.
Provider semantics confirmed
- Anthropic: input_tokens includes cache_read_input_tokens + cache_creation_input_tokens (docs)
- OpenAI: prompt_tokens includes prompt_tokens_details.cached_tokens (docs)
Testing