Precompute usage-artifact activity aggregates and include GitHub API rate limit data for usage-only reporting#40504
Conversation
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. |
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
|
✅ Test Quality Sentinel completed test quality analysis. |
|
✅ PR Code Quality Reviewer completed the code quality review. |
There was a problem hiding this comment.
Pull request overview
This PR enhances gh aw logs activity reporting by emitting a compact, precomputed activity summary into the conclusion job’s usage artifact, then using that summary to backfill run metrics during usage-only log processing. It also expands the maintenance activity report window (more runs) now that the default logs path is lightweight.
Changes:
- Generate and upload
usage/activity/summary.json(schemausage-activity-summary/v1) as part of the conclusion job’s compactusageartifact. - Add a logs-side loader/backfill path to apply usage-activity aggregates (turns, firewall request counts, MCP server call/error counts) when detailed artifacts aren’t downloaded.
- Increase maintenance workflow activity report run processing limit from
--count 100to--count 500and align tests.
Show a summary per file
| File | Description |
|---|---|
| pkg/workflow/side_repo_maintenance.go | Increases activity report run window to --count 500 in side-repo maintenance workflow generation. |
| pkg/workflow/side_repo_maintenance_integration_test.go | Updates integration test expectations for the new --count 500 value. |
| pkg/workflow/notify_comment.go | Extends usage artifact collection with an inline Python aggregation step and uploads usage/activity/summary.json. |
| pkg/workflow/notify_comment_test.go | Adds assertions to ensure the activity summary generator and artifact path are included. |
| pkg/workflow/maintenance_workflow_yaml.go | Increases activity report run window to --count 500 in generated maintenance workflow YAML. |
| pkg/workflow/maintenance_workflow_test.go | Updates test expectations for the new --count 500 value. |
| pkg/cli/logs_usage_activity.go | Adds loader + backfill logic to apply usage activity aggregates to downloaded run results. |
| pkg/cli/logs_usage_activity_test.go | Adds unit tests for loading and applying the usage activity summary. |
| pkg/cli/logs_run_processor.go | Integrates loading/applying usage activity aggregates and runs token usage analysis without requiring agent artifact download. |
| pkg/cli/audit_report.go | Documents RequestCount backward-compat semantics in MCPServerStats. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 10/10 changed files
- Comments generated: 2
| var summary usageActivitySummary | ||
| if err := json.Unmarshal(raw, &summary); err != nil { | ||
| return nil, fmt.Errorf("parse usage activity summary %s: %w", cleanPath, err) | ||
| } | ||
| return &summary, nil |
| result.MCPToolUsage = &MCPToolUsageData{ | ||
| Servers: servers, | ||
| } | ||
| } |
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
🏗️ Design Decision Gate — ADR RequiredThis PR makes significant changes to core business logic (381 new lines in 📄 Draft ADR committed:
📋 What to do next
Once an ADR is linked in the PR body, this gate will re-run and verify the implementation matches the decision. ❓ Why ADRs MatterADRs create a searchable, permanent record of why the codebase looks the way it does — for example, why activity aggregation lives inline in the generated conclusion job rather than in a committed script. Future contributors (and your future self) will thank you. 📋 Michael Nygard ADR Format ReferenceAn ADR must contain these four sections to be considered complete:
All ADRs are stored in
|
🧪 Test Quality Sentinel Report
📊 Metrics & Test Classification (5 tests analyzed)
Go: 5 (
|
There was a problem hiding this comment.
❌ Test Quality Sentinel: Coding-guideline violation detected. pkg/cli/logs_usage_activity_test.go is missing the required //go:build !integration build tag on line 1; all new assert.Equal calls also lack the required descriptive message argument. Please review the flagged tests in the comment above.
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /tdd, /grill-with-docs, /zoom-out, and /improve-codebase-architecture — requesting changes on test coverage gaps and two design issues.
📋 Key Themes & Highlights
Key Themes
- Test coverage gaps (3 comments): The core
applyUsageActivitySummaryToResultinvariant ("don't overwrite existing data") is only tested in the empty-result happy path. TheloadUsageActivitySummaryfallback path and error paths are entirely untested. - Zero-value ambiguity:
result.Run.Turns == 0cannot distinguish "parsed from full artifact, genuinely zero" from "not yet populated", so a crashed run could have its turn count silently overwritten by the precomputed value. - Schema version not validated: The
Schemafield inusageActivitySummaryis populated but never checked — a future producer bump to v2 would be silently accepted. - Inline Python untestability: 174 lines of log-parsing logic live as Go string literals with no feasible Go test. Parsing bugs would silently corrupt every future activity summary.
- Silent error swallowing: Both
loadUsageActivitySummaryandanalyzeTokenUsage(now ungated) surface errors only behind--verbose, making systematic failures invisible in normal operation.
Positive Highlights
- ✅ Clean producer→consumer architecture: the conclusion job writes a well-defined JSON schema, and the Go consumer backfills only missing fields — the precedence contract is sound in principle.
- ✅ Good use of
filepath.Cleanandos.IsNotExistguards inloadUsageActivitySummary. - ✅ The
usageActivityGatewayServertype is a well-scoped internal type that keeps the publicMCPServerStatsdecoupled from the wire format. - ✅ Tests are included and use
t.Parallel()throughout — appreciated. - ✅ The
--count 500change is well-motivated and the tests are updated in lock-step.
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 145.6 AIC · ⌖ 8.54 AIC · ⊞ 6.9K
| assert.Equal(t, "github", result.MCPToolUsage.Servers[0].ServerName) | ||
| assert.Equal(t, 5, result.MCPToolUsage.Servers[0].ToolCallCount) | ||
| assert.Equal(t, 2, result.MCPToolUsage.Servers[0].ErrorCount) | ||
| } |
There was a problem hiding this comment.
[/tdd] The three backfill-precedence branches in applyUsageActivitySummaryToResult are only tested in the happy path (empty result). There are no tests verifying the "don't overwrite" invariant for any of the three guarded fields.
💡 Suggested test cases
func TestApplyUsageActivitySummary_DoesNotOverwriteExistingData(t *testing.T) {
t.Parallel()
// Turns: a non-zero value must not be replaced.
result := DownloadResult{}
result.Run.Turns = 9
applyUsageActivitySummaryToResult(
&usageActivitySummary{Session: &usageActivitySession{Turns: 4}},
&result,
)
assert.Equal(t, 9, result.Run.Turns, "existing turns must not be overwritten")
// FirewallAnalysis: a non-nil pointer must not be replaced.
existing := &FirewallAnalysis{TotalRequests: 100}
result2 := DownloadResult{FirewallAnalysis: existing}
applyUsageActivitySummaryToResult(
&usageActivitySummary{Firewall: &usageActivityFirewall{TotalRequests: 5}},
&result2,
)
assert.Same(t, existing, result2.FirewallAnalysis)
// MCPToolUsage: a non-nil pointer must not be replaced.
existingMCP := &MCPToolUsageData{}
result3 := DownloadResult{MCPToolUsage: existingMCP}
applyUsageActivitySummaryToResult(
&usageActivitySummary{Gateway: &usageActivityGateway{TotalCalls: 3}},
&result3,
)
assert.Same(t, existingMCP, result3.MCPToolUsage)
}This precedence contract is the core invariant of the backfill design: if a future change reorders full-artifact parsing and the backfill call, these tests will catch the regression immediately.
| assert.Equal(t, 7, summary.Session.Turns) | ||
| require.NotNil(t, summary.Gateway) | ||
| assert.Equal(t, 5, summary.Gateway.TotalCalls) | ||
| } |
There was a problem hiding this comment.
[/tdd] TestLoadUsageActivitySummary only tests the primary usage/activity/summary.json path. Three more cases need coverage: (1) the activity/summary.json fallback, (2) neither path exists → nil, nil, (3) a file exists but contains invalid JSON → non-nil error.
💡 Suggested additions
func TestLoadUsageActivitySummary_FallbackPath(t *testing.T) {
t.Parallel()
runDir := t.TempDir()
// Write to the legacy fallback location.
fallbackPath := filepath.Join(runDir, "activity", "summary.json")
require.NoError(t, os.MkdirAll(filepath.Dir(fallbackPath), 0o755))
require.NoError(t, os.WriteFile(fallbackPath, []byte(`{"session":{"turns":3}}`), 0o644))
summary, err := loadUsageActivitySummary(runDir)
require.NoError(t, err)
require.NotNil(t, summary)
assert.Equal(t, 3, summary.Session.Turns)
}
func TestLoadUsageActivitySummary_NoFile(t *testing.T) {
t.Parallel()
summary, err := loadUsageActivitySummary(t.TempDir())
require.NoError(t, err)
assert.Nil(t, summary)
}
func TestLoadUsageActivitySummary_MalformedJSON(t *testing.T) {
t.Parallel()
runDir := t.TempDir()
p := filepath.Join(runDir, "usage", "activity", "summary.json")
require.NoError(t, os.MkdirAll(filepath.Dir(p), 0o755))
require.NoError(t, os.WriteFile(p, []byte(`{not valid json`), 0o644))
_, err := loadUsageActivitySummary(runDir)
require.Error(t, err)
}|
|
||
| // Preserve previously parsed turn counts (from full session artifacts/events.jsonl) | ||
| // and only backfill when they are missing. | ||
| if summary.Session != nil && result.Run.Turns == 0 && summary.Session.Turns > 0 { |
There was a problem hiding this comment.
[/grill-with-docs] The condition result.Run.Turns == 0 cannot distinguish between "full artifacts downloaded, zero turns genuinely observed" and "turns field not yet populated". A run that crashes before the first user message would have Turns == 0 from full-artifact parsing, and the backfill would silently overwrite that with the precomputed value from the usage summary.
💡 Options to consider
The simplest fix is to use a sentinel that means "no full-artifact data available" instead of relying on the zero value. For example:
- Set
result.Run.Turns = -1(or use a separatebool turnsPopulatedflag) during full-artifact processing to signal "we parsed the artifacts and found zero turns". Then backfill only when the field is still at its zero-value default. - Alternatively, document explicitly in a comment on this line that a zero-turn run IS safe to backfill (e.g., "if full-artifact parsing produced 0, the precomputed value is equally authoritative"), so the behaviour is a conscious choice rather than an accidental side effect.
| ) | ||
|
|
||
| type usageActivitySummary struct { | ||
| Schema string `json:"schema,omitempty"` |
There was a problem hiding this comment.
[/grill-with-docs] The Schema field is unmarshalled but never inspected. If the producer emits a usage-activity-summary/v2 with different field semantics, this consumer will silently accept and misinterpret it.
💡 Suggested guard
Add a schema version check after parsing:
const usageActivitySummarySchema = "usage-activity-summary/v1"
// in loadUsageActivitySummary, after json.Unmarshal:
if summary.Schema != "" && summary.Schema != usageActivitySummarySchema {
return nil, fmt.Errorf("unsupported usage activity summary schema %q (expected %q)", summary.Schema, usageActivitySummarySchema)
}Since the schema is producer-controlled by this repo's conclusion job, a hard error is safe here — if the constant falls out of sync it will surface immediately in tests rather than silently producing wrong activity counts.
| fmt.Fprintln(os.Stderr, console.FormatWarningMessage(fmt.Sprintf("Failed to analyze token usage for run %d: %v", run.DatabaseID, tokenErr))) | ||
| } | ||
| // token-usage.jsonl is also available in the compact usage artifact. | ||
| tokenUsage, tokenErr := analyzeTokenUsage(runOutputDir, verbose) |
There was a problem hiding this comment.
[/zoom-out] Removing the hasFirewallArtifact gate means analyzeTokenUsage is now attempted for every run. For usage-only downloads that lack token-usage.jsonl, the error is silently swallowed unless --verbose is active. This means a systematic issue (e.g., a format change or missing file in every run's usage artifact) produces no observable signal in normal operation.
💡 Options
- Emit the warning unconditionally (not gated on
verbose) but at a lower-noise level (e.g., only once if the error repeats across many runs). - Alternatively, accept the current behaviour and add a comment here explaining that token-usage parsing is best-effort for usage-only downloads and a missing file is expected rather than exceptional.
| " [ -f /tmp/gh-aw/usage/agent/token_usage.jsonl ] || : > /tmp/gh-aw/usage/agent/token_usage.jsonl\n", | ||
| " [ -f /tmp/gh-aw/usage/detection/token_usage.jsonl ] || : > /tmp/gh-aw/usage/detection/token_usage.jsonl\n", | ||
| " mkdir -p /tmp/gh-aw/usage/activity\n", | ||
| " python3 - <<'PY'\n", |
There was a problem hiding this comment.
[/grill-with-docs] This adds ~174 lines of aggregation logic (firewall log parsing, session event counting, gateway JSONL aggregation) as inline Go string literals with no way to unit test the Python. A bug in the firewall column-index assumptions or event-type matching would silently produce wrong activity summaries in every future artifact without any test catching it.
💡 Suggestions
The PR description notes the script is intentionally inline to keep compiled workflows self-contained. That is a valid constraint. Given that, consider:
- Extract to an embedded Go file (
//go:embed scripts/activity_summary.py) so the same script can be exercised by ago testsubprocess test that writes known log fixtures and asserts on the JSON output — without changing the compiled workflow content. - If inline is mandatory, add a
TestBuildUsageArtifactUploadSteps_PythonScripttest that executes the extracted Python string against realistic fixture files (squid log, events.jsonl, gateway.jsonl) and asserts on the resultingsummary.json. This is feasible since Python 3 is available in CI. - At minimum, the test in
notify_comment_test.goshould assert on more than presence ofpython3 - <<'PY'— e.g., verify the SQUID_STATUS_INDEX constant and theusage-activity-summary/v1schema marker are present.
| " allowed = code in (200, 206, 304)\n", | ||
| " except ValueError:\n", | ||
| " allowed = False\n", | ||
| " if not allowed and any(marker in decision for marker in ('TCP_TUNNEL', 'TCP_HIT', 'TCP_MISS')):\n", |
There was a problem hiding this comment.
[/zoom-out] The secondary TCP_TUNNEL / TCP_HIT / TCP_MISS check fires for any request where the HTTP status code did not parse as 200 / 206 / 304, including valid error responses (403, 407). If a blocked HTTPS CONNECT produces 403 in the status column alongside TCP_TUNNEL in the decision column, this would miscount the request as allowed.
💡 Context and suggestion
In standard Squid behaviour, TCP_TUNNEL only appears when the CONNECT tunnel is successfully established, so a 403 + TCP_TUNNEL combination would be atypical. However, given this is a custom log format, it's worth adding a brief comment explaining:
- What exact values
parts[SQUID_STATUS_INDEX]andparts[SQUID_DECISION_INDEX]can take in this project's log format. - Whether
TCP_TUNNELcan co-occur with non-200 statuses in practice.
This makes the intent explicit and reduces the risk of a future log-format tweak silently changing the allowed/blocked counts.
| // distinguishes overall request volume (RequestCount) from tool-invocation | ||
| // volume (ToolCallCount). In usage-aggregate mode we only have per-server | ||
| // tool-call counts, so both fields are populated from that single source. | ||
| RequestCount: server.ToolCallCount, |
There was a problem hiding this comment.
[/improve-codebase-architecture] RequestCount and ToolCallCount are semantically distinct fields in MCPServerStats (request volume vs. tool-invocation volume), but both are set to the same server.ToolCallCount value here. Downstream code or reports that compute a requests_per_tool_call ratio — or display RequestCount separately from ToolCallCount — will always show 1.0 / identical values for any backfilled row, masking real usage patterns.
💡 Suggestion
Either:
- Leave
RequestCountat zero (its zero value) and set onlyToolCallCount, making it clear that request-level data is unavailable from usage aggregates. - Or rename the comment to explicitly state "usage aggregates only track tool invocations, not HTTP request volume; RequestCount is an approximation" so readers of downstream reports understand the limitation.
The existing comment explains the intent, but it's easy to miss. A zero value with a named sentinel (or a bool requestCountAvailable flag on MCPServerStats) would make the absence of data self-documenting.
| --repo "${{ github.repository }}" \ | ||
| --start-date -1w \ | ||
| --count 100 \ | ||
| --count 500 \ |
There was a problem hiding this comment.
[/grill-with-docs] The 5× increase from --count 100 to --count 500 is well-motivated (lighter artifact downloads), but there is no documentation of what ceiling was considered or what constraint the number 500 is tracking against. If the window needs to grow again, a future developer has no guidance on the limiting factor.
💡 Suggestion
Add a brief inline comment explaining the constraint, for example:
// 500 runs: fits within the GitHub Actions usage download budget for usage-only artifacts;
// raise further if the artifact footprint is reduced again.
" --count 500 \\\n",Alternatively, note in the PR description or a code comment what the expected download time / API call budget is at 500 vs. 100.
| // These aggregates are generated by the conclusion job and allow lightweight | ||
| // usage-only downloads to include firewall/session summaries. | ||
| usageActivitySummary, usageActivityErr := loadUsageActivitySummary(runOutputDir) | ||
| if usageActivityErr != nil && verbose { |
There was a problem hiding this comment.
[/zoom-out] Non-ENOENT errors from loadUsageActivitySummary (e.g., permission denied, storage corruption) are only surfaced when --verbose is active. A systematic read failure affecting every run would be completely invisible in normal operation — not counted in any metric, not printed, not aggregated.
💡 Suggestion
Consider emitting this warning unconditionally (or at least incrementing a skipped-run counter), since a non-ENOENT IO error is genuinely unexpected and should not require --verbose to diagnose. The warning is already low-noise (one line per run) and distinguishes the missing-file case (which loadUsageActivitySummary returns nil, nil for silently) from the real-error case.
There was a problem hiding this comment.
Review: non-blocking observations
The overall design is sound — precomputing activity aggregates into the usage artifact is the right approach for scaling the maintenance activity report without large artifact downloads. Three data-quality issues are worth addressing before this pattern proliferates.
📋 Issues found (3 medium/low, no blocking)
Medium — fallback chain broken on corrupt JSON (logs_usage_activity.go:63)
A malformed usage/activity/summary.json (partially written artifact) causes an immediate error return; the activity/summary.json fallback is never tried. os.IsNotExist falls through correctly — JSON parse failure should behave the same.
Medium — gateway double-counting risk (notify_comment.go:824)
mcp-logs/gateway.jsonl and gateway.jsonl within the same sandbox are iterated unconditionally. Both can exist during a log-path migration; when they do, every tool call is counted twice in the summary, inflating the MCPToolUsage.Servers data backfilled by applyUsageActivitySummaryToResult.
Low — TCP decision substring match over-counts allowed requests (notify_comment.go:754)
'TCP_MISS' in decision matches TCP_MISS_ABORTED, TCP_MISS/503, etc. The intent is to check the base decision token; splitting on / first and using in (...) would prevent compound codes from being misclassified as allowed.
🔎 Code quality review by PR Code Quality Reviewer · 166.1 AIC · ⌖ 7.67 AIC · ⊞ 5.1K
| } | ||
| var summary usageActivitySummary | ||
| if err := json.Unmarshal(raw, &summary); err != nil { | ||
| return nil, fmt.Errorf("parse usage activity summary %s: %w", cleanPath, err) |
There was a problem hiding this comment.
JSON parse failure on the primary candidate short-circuits the fallback path: if usage/activity/summary.json exists but contains malformed JSON (e.g. partially written during artifact upload), the function immediately returns an error and never tries the activity/summary.json fallback — so applyUsageActivitySummaryToResult no-ops silently.
💡 Suggested fix
Fall through on unmarshal failure the same way os.IsNotExist falls through:
var summary usageActivitySummary
if err := json.Unmarshal(raw, &summary); err != nil {
continue // try next candidate
}
return &summary, nilOr collect the error and return it only after exhausting all candidates:
var lastErr error
for _, candidate := range candidates {
raw, err := os.ReadFile(cleanPath)
if err != nil {
if os.IsNotExist(err) { continue }
return nil, err
}
var s usageActivitySummary
if err := json.Unmarshal(raw, &s); err != nil {
lastErr = err
continue
}
return &s, nil
}
return nil, lastErrA partially-written JSON file from an interrupted artifact upload should behave the same as a missing file for the purposes of the fallback chain.
| " gateway = {'total_calls': 0, 'failed_calls': 0, 'servers': {}}\n", | ||
| " gateway_paths = [\n", | ||
| " '/tmp/gh-aw/sandbox/agent/logs/mcp-logs/gateway.jsonl',\n", | ||
| " '/tmp/gh-aw/sandbox/agent/logs/gateway.jsonl',\n", |
There was a problem hiding this comment.
Gateway call double-counting when both log paths coexist: mcp-logs/gateway.jsonl and gateway.jsonl within the same sandbox are iterated unconditionally. If both files exist for the same run (migration window, duplicate writers), every tool call is counted twice — inflating total_calls, per-server tool_call_count, and the derived MCPToolUsage.Servers that applyUsageActivitySummaryToResult injects into reports.
💡 Suggested fix
Process only the first path that exists within each sandbox pair, stopping once a file is found:
# Try modern path first; fall back to legacy path within each sandbox
sandbox_gateway_pairs = [
(
'/tmp/gh-aw/sandbox/agent/logs/mcp-logs/gateway.jsonl',
'/tmp/gh-aw/sandbox/agent/logs/gateway.jsonl',
),
(
'/tmp/gh-aw/threat-detection/sandbox/agent/logs/mcp-logs/gateway.jsonl',
'/tmp/gh-aw/threat-detection/sandbox/agent/logs/gateway.jsonl',
),
]
gateway_paths = []
for modern, legacy in sandbox_gateway_pairs:
if os.path.exists(modern):
gateway_paths.append(modern)
elif os.path.exists(legacy):
gateway_paths.append(legacy)Alternatively, if both files can legitimately hold different events, track seen lines by content hash to deduplicate. The current approach trusts that only one file exists per sandbox, which may not hold during log-path migrations.
| " allowed = code in (200, 206, 304)\n", | ||
| " except ValueError:\n", | ||
| " allowed = False\n", | ||
| " if not allowed and any(marker in decision for marker in ('TCP_TUNNEL', 'TCP_HIT', 'TCP_MISS')):\n", |
There was a problem hiding this comment.
TCP decision marker uses substring matching: 'TCP_MISS' in decision will match compound Squid codes like TCP_MISS_ABORTED or TCP_MISS/503, classifying a failed-or-aborted request as allowed. The same applies to TCP_HIT matching TCP_HIT_REFRESH_FAIL.
💡 Suggested fix
Match on the exact token (prefix before / or end of string) rather than a substring:
def is_allowed_decision(decision: str) -> bool:
# Decision field is typically "TCP_MISS/200" or "TCP_TUNNEL/200"
base = decision.split('/')[0] # strip HTTP status suffix
return base in ('TCP_TUNNEL', 'TCP_HIT', 'TCP_MISS')
if not allowed and is_allowed_decision(decision):
allowed = TrueThis is a monitoring metric so the impact is limited to inaccurate counters, but TCP_MISS_ABORTED appearing as "allowed" could mask elevated block rates in environments with connection failures.
|
@copilot run pr-finisher skill |
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
|
|
|
📰 BREAKING: Smoke Copilot - AOAI (apikey) is now investigating this pull request. Sources say the story is developing... |
|
📰 BREAKING: Smoke Copilot is now investigating this pull request. Sources say the story is developing... |
|
📰 BREAKING: Smoke Copilot - AOAI (Entra) is now investigating this pull request. Sources say the story is developing... |
|
Smoke test FAIL
Warning Firewall blocked 1 domainThe following domain was blocked by the firewall during workflow execution:
network:
allowed:
- defaults
- "proxy.golang.org"See Network Configuration for more information. |
Smoke Test Results
Overall Status: FAIL Warning Firewall blocked 1 domainThe following domain was blocked by the firewall during workflow execution:
network:
allowed:
- defaults
- "localhost"See Network Configuration for more information.
|
Agent Container Tool Check
Result: 12/12 tools available ✅ Overall Status: PASS
|
|
Commit pushed:
|
|
Smoke Test: Codex - 27892632179 Warning Firewall blocked 6 domainsThe following domains were blocked by the firewall during workflow execution:
network:
allowed:
- defaults
- "accounts.google.com"
- "android.clients.google.com"
- "clients2.google.com"
- "contentautofill.googleapis.com"
- "safebrowsingohttpgateway.googleapis.com"
- "www.google.com"See Network Configuration for more information.
|
Comment MemoryNote This comment is managed by comment memory.It stores persistent context for this thread in the code block at the top of this comment. Warning Firewall blocked 6 domainsThe following domains were blocked by the firewall during workflow execution:
network:
allowed:
- defaults
- "accounts.google.com"
- "android.clients.google.com"
- "clients2.google.com"
- "contentautofill.googleapis.com"
- "safebrowsingohttpgateway.googleapis.com"
- "www.google.com"See Network Configuration for more information.
|
💥 Smoke Test: Claude — Run 27892632153Core #1-12: ✅ all passed Overall: PASS (one optional close-PR test skipped) Warning Firewall blocked 6 domainsThe following domains were blocked by the firewall during workflow execution:
network:
allowed:
- defaults
- "accounts.google.com"
- "android.clients.google.com"
- "clients2.google.com"
- "contentautofill.googleapis.com"
- "safebrowsingohttpgateway.googleapis.com"
- "www.google.com"See Network Configuration for more information.
|
|
PR: Precompute usage-artifact activity aggregates and include GitHub API rate limit data for usage-only reporting Warning Firewall blocked 6 domainsThe following domains were blocked by the firewall during workflow execution:
network:
allowed:
- defaults
- "accounts.google.com"
- "android.clients.google.com"
- "clients2.google.com"
- "contentautofill.googleapis.com"
- "safebrowsingohttpgateway.googleapis.com"
- "www.google.com"See Network Configuration for more information.
|
Comment MemoryNote This comment is managed by comment memory.It stores persistent context for this thread in the code block at the top of this comment. Warning Firewall blocked 6 domainsThe following domains were blocked by the firewall during workflow execution:
network:
allowed:
- defaults
- "accounts.google.com"
- "android.clients.google.com"
- "clients2.google.com"
- "contentautofill.googleapis.com"
- "safebrowsingohttpgateway.googleapis.com"
- "www.google.com"See Network Configuration for more information.
|
There was a problem hiding this comment.
💥 Automated smoke test review - all systems nominal!
Warning
Firewall blocked 6 domains
The following domains were blocked by the firewall during workflow execution:
accounts.google.comandroid.clients.google.comclients2.google.comcontentautofill.googleapis.comsafebrowsingohttpgateway.googleapis.comwww.google.com
To allow these domains, add them to the
network.allowedlist in your workflow frontmatter:
network:
allowed:
- defaults
- "accounts.google.com"
- "android.clients.google.com"
- "clients2.google.com"
- "contentautofill.googleapis.com"
- "safebrowsingohttpgateway.googleapis.com"
- "www.google.com"See Network Configuration for more information.
💥 [THE END] — Illustrated by Smoke Claude · 101.2 AIC · ⊞ 8.5K
| @@ -0,0 +1,40 @@ | |||
| # ADR-40504: Precompute Usage-Artifact Activity Aggregates for Lightweight Logs Reporting | |||
There was a problem hiding this comment.
Clear ADR title that captures the precompute-aggregates intent. 👍
| ## Decision | ||
|
|
||
| We will have the conclusion job precompute run-activity aggregates into the `usage` artifact at `usage/activity/summary.json`, tagged with the schema marker `usage-activity-summary/v1`. The aggregation runs as an inline `python3` step embedded in the generated workflow (so compiled workflows stay self-contained with no extra repository-file dependency) and rolls up firewall request counts, Copilot session event counters, and gateway tool-call counters with a per-server breakdown. The logs pipeline (`pkg/cli/logs_usage_activity.go`) loads this summary and backfills only fields that are still empty after normal artifact parsing, keeping precedence explicit: detailed artifacts win, the usage summary fills gaps only. | ||
|
|
There was a problem hiding this comment.
Consider noting how the v1 schema marker will be migrated if aggregate fields change in future.
There was a problem hiding this comment.
Smoke bot looked. Notes left on two lines.
Warning
Firewall blocked 6 domains
The following domains were blocked by the firewall during workflow execution:
accounts.google.comandroid.clients.google.comclients2.google.comcontentautofill.googleapis.comsafebrowsingohttpgateway.googleapis.comwww.google.com
To allow these domains, add them to the
network.allowedlist in your workflow frontmatter:
network:
allowed:
- defaults
- "accounts.google.com"
- "android.clients.google.com"
- "clients2.google.com"
- "contentautofill.googleapis.com"
- "safebrowsingohttpgateway.googleapis.com"
- "www.google.com"See Network Configuration for more information.
📰 BREAKING: Report filed by Smoke Copilot · 459.8 AIC · ⊞ 19.9K
Node.js 24's builtin globSync from node:fs doesn't support the nodir option. Removed it since the patterns (*.log, *.jsonl) already match only files, not directories. This fixes the TypeScript typecheck errors. Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Fixed in 45ffade. The issue was that Node.js 24's builtin |
This updates the conclusion-job token-usage pipeline to emit compact run-activity aggregates into the
usageartifact, sogh aw logscan produce richer activity reporting without downloading large agent artifacts. It also updates maintenance activity reporting to leverage this lighter path and process a larger run window.Conclusion job: add precomputed activity aggregates to
usageartifactpkg/workflow/notify_comment.goto generate/tmp/gh-aw/usage/activity/summary.json.total_requests,allowed_requests,blocked_requeststotal_calls,failed_calls, per-server call/error rollupsusage-activity-summary/v1./tmp/gh-aw/github_rate_limits.jsonlto the compactusageartifact so usage-only downloads can retain GitHub API rate limit consumption data.Logs pipeline: consume aggregate summary from usage-only downloads
pkg/cli/logs_usage_activity.goto load/applyusage/activity/summary.json.pkg/cli/logs_run_processor.go) so missing fields (turns/firewall/MCP server stats) are populated from usage aggregates.Maintenance activity report: increase processing window with lightweight artifacts
gh aw logs --countfrom100to500:pkg/workflow/maintenance_workflow_yaml.gopkg/workflow/side_repo_maintenance.go{ "schema": "usage-activity-summary/v1", "firewall": { "total_requests": 42, "allowed_requests": 39, "blocked_requests": 3 }, "session": { "turns": 11, "total_events": 87 }, "gateway": { "total_calls": 25, "failed_calls": 2, "servers": [{ "server_name": "github", "tool_call_count": 18, "failed_calls": 1 }] } }pr-sous-chef run 27889722422: please rebase/update the branch and continue addressing review feedback.
✨ PR Review Safe Output Test - Run 27892109529
Warning
Firewall blocked 6 domains
The following domains were blocked by the firewall during workflow execution:
accounts.google.comandroid.clients.google.comclients2.google.comcontentautofill.googleapis.comsafebrowsingohttpgateway.googleapis.comwww.google.comSee Network Configuration for more information.
✨ PR Review Safe Output Test - Run 27892632153
Warning
Firewall blocked 6 domains
The following domains were blocked by the firewall during workflow execution:
accounts.google.comandroid.clients.google.comclients2.google.comcontentautofill.googleapis.comsafebrowsingohttpgateway.googleapis.comwww.google.comSee Network Configuration for more information.