[token-consumption] Daily AIC Consumption Report - 2026-06-22 #40784

@github-actions

Executive Summary

In the last 24h, 48,888.5 AIC were consumed across 117 unique agentic workflows in github/gh-aw, spread over 746 AI-credit-bearing gen_ai spans (avg 65.5 AIC/event, p95 253.8, single-event max 662.1). The top 10 workflows account for ~50% of all AIC (~24,588). Two PR-review workflows alone — PR Code Quality Reviewer (5,661) and Matt Pocock Skills Reviewer (5,405) — consume ~23% combined.

  • Sentry AIC: queryable, but only via an explicit numeric cast. The bare attribute gh-aw.aic resolves to an empty string-typed EAP column (returns null; sum() is rejected as "string type"). Real values are only reachable through tags[gh-aw.aic,number]. Any dashboard/saved query using the bare name will show AIC as missing — a silent observability trap.
  • Grafana AIC: not verifiable this run. The Tempo query MCP tools (tempo_traceql-search, tempo_get-attribute-names, tempo_get-trace) are absent from this MCP build; only datasource discovery is available. Treated as an observability gap (unknown), not zero.
  • No production failures: errors and logs datasets both returned 0 events in the window.

Key Metrics

| Metric | Value |
| --- | --- |
| Events analyzed (gen_ai spans) | 15,495 |
| All spans (gen_ai + http.server + default) | 25,206 |
| Events with AIC data | 746 |
| Events with AIC data (Sentry) | 746 |
| Events with AIC data (Grafana) | Unknown (Tempo query tools unavailable) |
| Total AIC | 48,888.5 |
| Unique workflows (AIC > 0) | 117 |
| Avg AIC/event | 65.5 |
| P95 AIC/event | 253.8 |
| Max single-event AIC | 662.1 |
| Events missing workflow name (gen_ai) | 11,443 (73.8%) |
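
The derived figures in this table can be re-checked from the raw counts. The following is a minimal sketch that uses only numbers reported above; the variable names are illustrative and not part of the report pipeline.

```python
# Raw counts copied from the Key Metrics table.
total_aic = 48_888.5
aic_events = 746
gen_ai_spans = 15_495
missing_workflow_name = 11_443

# Derived ratios, rounded the same way the report presents them.
avg_aic_per_event = total_aic / aic_events
missing_name_pct = 100 * missing_workflow_name / gen_ai_spans
aic_bearing_pct = 100 * aic_events / gen_ai_spans

print(f"avg AIC/event: {avg_aic_per_event:.1f}")
print(f"gen_ai spans missing workflow name: {missing_name_pct:.1f}%")
print(f"gen_ai spans carrying AIC: {aic_bearing_pct:.1f}%")
```

The last ratio makes the attribution picture concrete: only about 1 in 20 gen_ai spans carries AIC at all, which is why span counts and AIC-bearing event counts differ so widely.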

Top 10 Workflows by AIC Consumption

| Workflow | Events | Total AIC | Avg AIC/Event |
| --- | --- | --- | --- |
| PR Code Quality Reviewer | 77 | 5,661.4 | 73.5 |
| Matt Pocock Skills Reviewer | 81 | 5,405.1 | 66.7 |
| Test Quality Sentinel | 72 | 3,397.7 | 47.2 |
| Design Decision Gate 🏗️ | 81 | 2,341.4 | 28.9 |
| PR Sous Chef | 65 | 2,060.0 | 31.7 |
| Contribution Check | 15 | 1,374.1 | 91.6 |
| Semantic Function Refactoring | 3 | 1,335.3 | 445.1 |
| [aw] Failure Investigator (6h) | 7 | 1,101.5 | 157.4 |
| Daily Rendering Scripts Verifier | 2 | 1,004.1 | 502.0 |
| Daily AgentRx Trace Optimizer | 2 | 907.3 | 453.6 |

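Single-event consumers (run id): PR Code Quality Reviewer run 27933865956 emitted 123.2 AIC on a single claude-sonnet-4.6 span. This is below the 662.1 single-event maximum in Key Metrics, which is not tied to a run id here. Three low-volume workflows have the highest per-event averages in the top 10: Daily Rendering Scripts Verifier (502/event), Daily AgentRx Trace Optimizer (454/event), and Semantic Function Refactoring (445/event). They are expensive per event and worth watching.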
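
One way to surface expensive-per-event workflows automatically is to compare each workflow's average against the fleet average. The sketch below uses the rows from the top-10 table; the 5× threshold (`FACTOR`) is a hypothetical alerting choice, not something the report defines.

```python
# (workflow, events, total AIC) rows copied from the top-10 table.
ROWS = [
    ("PR Code Quality Reviewer", 77, 5661.4),
    ("Matt Pocock Skills Reviewer", 81, 5405.1),
    ("Test Quality Sentinel", 72, 3397.7),
    ("Design Decision Gate", 81, 2341.4),
    ("PR Sous Chef", 65, 2060.0),
    ("Contribution Check", 15, 1374.1),
    ("Semantic Function Refactoring", 3, 1335.3),
    ("[aw] Failure Investigator (6h)", 7, 1101.5),
    ("Daily Rendering Scripts Verifier", 2, 1004.1),
    ("Daily AgentRx Trace Optimizer", 2, 907.3),
]
FLEET_AVG = 65.5  # Avg AIC/event from Key Metrics
FACTOR = 5        # hypothetical threshold, chosen for illustration only


def per_event_outliers(rows, fleet_avg, factor):
    """Return (workflow, avg AIC/event) pairs above factor * fleet_avg, highest first."""
    flagged = [
        (name, total / events)
        for name, events, total in rows
        if total / events > factor * fleet_avg
    ]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


for name, avg in per_event_outliers(ROWS, FLEET_AVG, FACTOR):
    print(f"{name}: {avg:.1f} AIC/event")
```

With these inputs the three 2-to-3-event workflows are flagged, while the high-volume PR reviewers are not, since their cost comes from volume rather than per-event size.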

Ranking note: positions 1–8 (all ≥1,101 AIC) are firm. Positions 9–10 sit in a band where ~3,200 AIC across 17 low-volume (≤2-event) workflows could not be individually ranked because the MCP list_events tool cannot sort by a casted aggregate, so groups were pulled sorted by -count() and capped at 100 of 117 workflows. Captured ranking covers 45,686 / 48,888 AIC (93.4%).
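
The client-side ranking described in this note can be sketched as follows: groups arrive ordered by event count, are re-sorted locally by total AIC, and the captured total is compared with the window total. The five rows are taken from the table above; the function names are illustrative.

```python
# Groups as a count-sorted fetch would return them: (workflow, events, total AIC).
GROUPS_BY_COUNT = [
    ("Matt Pocock Skills Reviewer", 81, 5405.1),
    ("Design Decision Gate", 81, 2341.4),
    ("PR Code Quality Reviewer", 77, 5661.4),
    ("Test Quality Sentinel", 72, 3397.7),
    ("PR Sous Chef", 65, 2060.0),
]


def rank_by_aic(groups):
    """Re-sort fetched groups by total AIC, highest first."""
    return sorted(groups, key=lambda group: group[2], reverse=True)


def coverage_pct(captured_aic, window_total_aic):
    """Share of the window's AIC that the fetched groups account for."""
    return 100 * captured_aic / window_total_aic


ranking = [name for name, _, _ in rank_by_aic(GROUPS_BY_COUNT)]
print(ranking)
print(f"coverage: {coverage_pct(45_686, 48_888.5):.1f}%")
```

Because the fetch is capped by count rather than by AIC, any workflow with few events but high total AIC can fall outside the cap, which is the reason positions 9 and 10 are reported as uncertain.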

Grafana AIC Findings

Grafana AIC: not queryable this run (observability gap, cause = tooling, not data).

  • The Tempo datasource exists and is healthy: grafanacloud-ghaw-traces (uid grafanacloud-traces, Tempo, region prod-eu-west-2).
  • The Grafana MCP build exposed only list_datasources / get_datasource. The trace-query tools required to inspect AIC on spans — tempo_get-attribute-names, tempo_traceql-search, tempo_get-attribute-values, tempo_get-trace — returned No such tool available. AIC presence/typing in Tempo therefore could not be confirmed.
  • Evidence (emit-side, not query-side): spans are exported to Sentry and Tempo from the same OTLP payload, and actions/setup/js/send_otlp_span.cjs:2129 encodes gh-aw.aic as doubleValue; the inline comment (:2120) asserts Tempo indexes it so { span."gh-aw.aic" > 0 } is queryable. This is plausible but unverified in this run. Suggested manual TraceQL: { span."gh-aw.aic" > 0 } over now-24h on datasource grafanacloud-traces.
  • events_with_aic_data_grafana is reported as Unknown (not 0) per the unknown-vs-zero rule.
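
For a manual check, the suggested TraceQL can be turned into a Tempo search request. The sketch below only builds the request path and performs no network call. It assumes Tempo's `/api/search` endpoint (with `q`, `start`, `end`, `limit`) reached through Grafana's datasource proxy route; whether that route is enabled for this stack is an assumption, and the function names are hypothetical.

```python
from urllib.parse import urlencode

TRACEQL = '{ span."gh-aw.aic" > 0 }'    # query suggested in the findings above
DATASOURCE_UID = "grafanacloud-traces"  # uid reported in the findings above


def tempo_search_query(now_epoch_s: int, window_s: int = 24 * 3600) -> str:
    """URL-encode a TraceQL search over the trailing window (epoch seconds)."""
    params = {
        "q": TRACEQL,
        "start": now_epoch_s - window_s,
        "end": now_epoch_s,
        "limit": 20,
    }
    return urlencode(params)


def tempo_search_path(now_epoch_s: int) -> str:
    # Assumption: the Grafana datasource proxy fronts Tempo's search API.
    return (
        f"/api/datasources/proxy/uid/{DATASOURCE_UID}/api/search"
        f"?{tempo_search_query(now_epoch_s)}"
    )


print(tempo_search_path(1_782_100_000))
```

A non-empty result would confirm that `gh-aw.aic` is indexed as a number in Tempo; an empty result would still need to be distinguished from a typing problem like the one seen in Sentry.
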

Data Quality and Gaps

  • Events missing workflow identifiers: 11,443 of 15,495 gen_ai spans (73.8%) have gh-aw.workflow.name = null. These are the per-LLM-call child spans; only the token-owning span carries workflow attribution. All 746 AIC-bearing spans do have a workflow name (no null bucket), so AIC attribution itself is complete — but generic gen_ai-level workflow rollups will undercount.
  • Events missing AIC attributes: 14,749 of 15,495 gen_ai spans carry no AIC (expected — AIC is only attached to the job that owns token usage). default-op (3,641) and http.server (6,069) spans carry no AIC.
  • Sentry-specific AIC caveat: gh-aw.aic is dual-registered (string + number) in EAP. Bare references hit the empty string column. Canonical query path is tags[gh-aw.aic,number] in both query and aggregate fields. gh-aw.aic:>0 (string filter) returns nothing; tags[gh-aw.aic,number]:>0 returns the real 746-span population.
  • Tool limitation: list_events rejects ORDER BY on casted aggregates and count_unique(...) ("orderby must also be in the selected columns"). Only -count()/plain-field sorts work, capping server-side top-N-by-AIC; ranking done client-side over a 100-group fetch.
  • Assumptions/fallbacks: Sentry treated as canonical to avoid double counting (spans dual-exported). total_events_analyzed = gen_ai span population (15,495). A single run can emit multiple AIC-bearing spans (e.g. agent + activation/detection jobs), so event count ≠ run count.
  • Grafana-specific AIC caveat: see Grafana AIC Findings — query tooling unavailable; AIC queryability unverified.
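
To keep the typed access path consistent across queries, the cast can be generated in one place. This is a small sketch; the dictionary keys (`query`, `fields`, `sort`) are illustrative and are not the actual parameter names of the MCP list_events tool.

```python
AIC_ATTR = "gh-aw.aic"


def typed_tag(name: str, kind: str = "number") -> str:
    """Build the explicit typed reference, e.g. tags[gh-aw.aic,number]."""
    return f"tags[{name},{kind}]"


def aic_query_parts() -> dict:
    """Filter and aggregate that both go through the numeric cast."""
    col = typed_tag(AIC_ATTR)
    return {
        "query": f"{col}:>0",
        "fields": ["gh-aw.workflow.name", "count()", f"sum({col})"],
        "sort": "-count()",  # sorting by the casted aggregate is rejected
    }


print(aic_query_parts())
```

Using the same helper for both the filter and the aggregate avoids the case where one side of a query silently falls back to the empty string column.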

Recommendations

  1. Cap or tier the PR-review fleet. PR Code Quality Reviewer, Matt Pocock Skills Reviewer, Test Quality Sentinel, Design Decision Gate, and PR Sous Chef are the top 5 (~18,866 AIC, ~39% of total). Consider gating them on PR size/paths, deduping overlapping review passes, or moving low-signal checks to a cheaper model — these run on nearly every PR.
  2. Investigate high per-event outliers. Daily Rendering Scripts Verifier (502/event), Daily AgentRx Trace Optimizer (454/event), and Semantic Function Refactoring (445/event) burn roughly 7× the fleet average of 65.5 AIC/event despite low volume. Audit their prompts and context size for runaway token usage.
  3. Fix the Sentry AIC type ambiguity (instrumentation gap). gh-aw.aic resolving to an empty string column means most users will conclude "AIC is missing." Either emit under a name that registers cleanly as numeric (e.g. a dedicated gh-aw.aic_credits double with no prior string writes) or document tags[gh-aw.aic,number] as the required access path in all saved queries/alerts/dashboards.
  4. Restore Grafana AIC verifiability + propagate workflow attribution. Enable the Tempo query MCP tools (or document a manual TraceQL runbook) so Grafana AIC can be confirmed numeric each run, and consider propagating gh-aw.workflow.name onto child gen_ai spans so the 73.8% currently-unattributed spans can be rolled up by workflow.
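
If recommendation 3 is addressed by documentation rather than renaming, saved queries could be linted for bare references. The following is a rough sketch; the saved-query names are made up, and how queries are exported from Sentry is outside its scope.

```python
import re

# Matches gh-aw.aic when it is not written in the typed tags[...] form.
# The trailing \b keeps names such as gh-aw.aic_credits from matching.
BARE_AIC = re.compile(r"(?<!tags\[)gh-aw\.aic\b")


def uses_bare_aic(query: str) -> bool:
    """True if the query references gh-aw.aic without the explicit cast."""
    return BARE_AIC.search(query) is not None


def lint_saved_queries(queries: dict[str, str]) -> list[str]:
    """Return the names of saved queries that would hit the string column."""
    return sorted(name for name, query in queries.items() if uses_bare_aic(query))


# Hypothetical saved queries, for illustration only.
saved = {
    "aic-by-workflow": "sum(tags[gh-aw.aic,number])",
    "aic-nonzero": "gh-aw.aic:>0",
}
print(lint_saved_queries(saved))
```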

References

Generated by 📊 Daily AIC Consumption Report (Sentry + Grafana OTel) · 283 AIC · ⌖ 33.7 AIC · ⊞ 7.2K

  • expires on Jun 23, 2026, 5:56 AM UTC-08:00
