Skip to content

Precompute usage-artifact activity aggregates and include GitHub API rate limit data for usage-only reporting#40504

Merged
pelikhan merged 17 commits into
mainfrom
copilot/update-conclusion-job-aggregate-data
Jun 21, 2026
Merged

Precompute usage-artifact activity aggregates and include GitHub API rate limit data for usage-only reporting#40504
pelikhan merged 17 commits into
mainfrom
copilot/update-conclusion-job-aggregate-data

Conversation

Copilot AI commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

This updates the conclusion-job token-usage pipeline to emit compact run-activity aggregates into the usage artifact, so gh aw logs can produce richer activity reporting without downloading large agent artifacts. It also updates maintenance activity reporting to leverage this lighter path and process a larger run window.

  • Conclusion job: add precomputed activity aggregates to usage artifact

    • Extended usage artifact collection in pkg/workflow/notify_comment.go to generate /tmp/gh-aw/usage/activity/summary.json.
    • Aggregates now include:
      • firewall: total_requests, allowed_requests, blocked_requests
      • session: turn/event counters (starts, shutdowns, assistant/reasoning/tool execution counters)
      • gateway: total_calls, failed_calls, per-server call/error rollups
    • Added schema marker: usage-activity-summary/v1.
    • Included the new summary file in artifact upload paths.
    • Added /tmp/gh-aw/github_rate_limits.jsonl to the compact usage artifact so usage-only downloads can retain GitHub API rate limit consumption data.
  • Logs pipeline: consume aggregate summary from usage-only downloads

    • Added pkg/cli/logs_usage_activity.go to load/apply usage/activity/summary.json.
    • Integrated backfill into run processing (pkg/cli/logs_run_processor.go) so missing fields (turns/firewall/MCP server stats) are populated from usage aggregates.
    • Kept precedence explicit: detailed artifacts win; usage summary fills gaps only.
    • Validates the declared usage summary schema before applying backfill and preserves fallback behavior across candidate summary paths.
  • Maintenance activity report: increase processing window with lightweight artifacts

    • Updated generated maintenance workflows to process more runs by raising activity report gh aw logs --count from 100 to 500:
      • pkg/workflow/maintenance_workflow_yaml.go
      • pkg/workflow/side_repo_maintenance.go
    • Aligned workflow-generation tests with new count and usage-summary behavior.
{
  "schema": "usage-activity-summary/v1",
  "firewall": { "total_requests": 42, "allowed_requests": 39, "blocked_requests": 3 },
  "session": { "turns": 11, "total_events": 87 },
  "gateway": {
    "total_calls": 25,
    "failed_calls": 2,
    "servers": [{ "server_name": "github", "tool_call_count": 18, "failed_calls": 1 }]
  }
}

pr-sous-chef run 27889722422: please rebase/update the branch and continue addressing review feedback.

Generated by 👨‍🍳 PR Sous Chef · 56.7 AIC · ⌖ 1.01 AIC · ⊞ 17.3K ·



✨ PR Review Safe Output Test - Run 27892109529

Warning

Firewall blocked 6 domains

The following domains were blocked by the firewall during workflow execution:

  • accounts.google.com
  • android.clients.google.com
  • clients2.google.com
  • contentautofill.googleapis.com
  • safebrowsingohttpgateway.googleapis.com
  • www.google.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "android.clients.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"

See Network Configuration for more information.

💥 [THE END] — Illustrated by Smoke Claude · 88.7 AIC · ⊞ 8.5K ·



✨ PR Review Safe Output Test - Run 27892632153

Warning

Firewall blocked 6 domains

The following domains were blocked by the firewall during workflow execution:

  • accounts.google.com
  • android.clients.google.com
  • clients2.google.com
  • contentautofill.googleapis.com
  • safebrowsingohttpgateway.googleapis.com
  • www.google.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "android.clients.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"

See Network Configuration for more information.

💥 [THE END] — Illustrated by Smoke Claude · 101.2 AIC · ⊞ 8.5K ·

Copilot AI and others added 9 commits June 20, 2026 17:04
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@pelikhan pelikhan marked this pull request as ready for review June 20, 2026 19:52
Copilot AI review requested due to automatic review settings June 20, 2026 19:52
@github-actions

github-actions Bot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Design Decision Gate 🏗️ completed the design decision gate check.

@github-actions

github-actions Bot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

@github-actions

github-actions Bot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Test Quality Sentinel completed test quality analysis.

@github-actions

github-actions Bot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

PR Code Quality Reviewer completed the code quality review.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enhances gh aw logs activity reporting by emitting a compact, precomputed activity summary into the conclusion job’s usage artifact, then using that summary to backfill run metrics during usage-only log processing. It also expands the maintenance activity report window (more runs) now that the default logs path is lightweight.

Changes:

  • Generate and upload usage/activity/summary.json (schema usage-activity-summary/v1) as part of the conclusion job’s compact usage artifact.
  • Add a logs-side loader/backfill path to apply usage-activity aggregates (turns, firewall request counts, MCP server call/error counts) when detailed artifacts aren’t downloaded.
  • Increase maintenance workflow activity report run processing limit from --count 100 to --count 500 and align tests.
Show a summary per file
File Description
pkg/workflow/side_repo_maintenance.go Increases activity report run window to --count 500 in side-repo maintenance workflow generation.
pkg/workflow/side_repo_maintenance_integration_test.go Updates integration test expectations for the new --count 500 value.
pkg/workflow/notify_comment.go Extends usage artifact collection with an inline Python aggregation step and uploads usage/activity/summary.json.
pkg/workflow/notify_comment_test.go Adds assertions to ensure the activity summary generator and artifact path are included.
pkg/workflow/maintenance_workflow_yaml.go Increases activity report run window to --count 500 in generated maintenance workflow YAML.
pkg/workflow/maintenance_workflow_test.go Updates test expectations for the new --count 500 value.
pkg/cli/logs_usage_activity.go Adds loader + backfill logic to apply usage activity aggregates to downloaded run results.
pkg/cli/logs_usage_activity_test.go Adds unit tests for loading and applying the usage activity summary.
pkg/cli/logs_run_processor.go Integrates loading/applying usage activity aggregates and runs token usage analysis without requiring agent artifact download.
pkg/cli/audit_report.go Documents RequestCount backward-compat semantics in MCPServerStats.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 10/10 changed files
  • Comments generated: 2

Comment on lines +61 to +65
var summary usageActivitySummary
if err := json.Unmarshal(raw, &summary); err != nil {
return nil, fmt.Errorf("parse usage activity summary %s: %w", cleanPath, err)
}
return &summary, nil
Comment on lines +104 to +107
result.MCPToolUsage = &MCPToolUsageData{
Servers: servers,
}
}
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown
Contributor

🏗️ Design Decision Gate — ADR Required

This PR makes significant changes to core business logic (381 new lines in pkg/, above the 100-line threshold) but does not have a linked Architecture Decision Record (ADR).

📄 Draft ADR committed: docs/adr/40504-precompute-usage-artifact-activity-aggregates.md — review and complete it before merging.

🔒 This PR cannot merge until an ADR is linked in the PR body.

📋 What to do next
  1. Review the draft ADR committed to your branch — it was generated from the PR diff (the new usage-activity-summary/v1 aggregate path and the --count 100 → 500 window increase).
  2. Complete the missing sections — confirm the inline-Python-vs-script trade-off, the RequestCount/ToolCallCount fidelity note, and any alternatives you actually weighed.
  3. Commit the finalized ADR to docs/adr/ on your branch.
  4. Reference the ADR in this PR body by adding a line such as:

    ADR: ADR-40504: Precompute Usage-Artifact Activity Aggregates

Once an ADR is linked in the PR body, this gate will re-run and verify the implementation matches the decision.

❓ Why ADRs Matter

ADRs create a searchable, permanent record of why the codebase looks the way it does — for example, why activity aggregation lives inline in the generated conclusion job rather than in a committed script. Future contributors (and your future self) will thank you.

📋 Michael Nygard ADR Format Reference

An ADR must contain these four sections to be considered complete:

  • Context — What is the problem? What forces are at play?
  • Decision — What did you decide? Why?
  • Alternatives Considered — What else could have been done?
  • Consequences — What are the trade-offs (positive and negative)?

All ADRs are stored in docs/adr/ as Markdown files numbered by PR number (e.g., 40504-precompute-usage-artifact-activity-aggregates.md for PR #40504).

🔒 This PR cannot merge until an ADR is linked in the PR body.

🏗️ ADR gate enforced by Design Decision Gate 🏗️ · 87 AIC · ⌖ 10.3 AIC · ⊞ 13.6K ·

@github-actions

Copy link
Copy Markdown
Contributor

🧪 Test Quality Sentinel Report

⚠️ Test Quality Score: 70/100 — Acceptable

Analyzed 5 test(s): 5 design, 0 implementation, 2 guideline violation(s) — missing build tag and missing assertion messages in the new test file.

📊 Metrics & Test Classification (5 tests analyzed)
Metric Value
New/modified tests analyzed 5
✅ Design tests (behavioral contracts) 5 (100%)
⚠️ Implementation tests (low value) 0 (0%)
Tests with error/edge cases 0 (0%)
Duplicate test clusters 0
Test inflation detected No
🚨 Coding-guideline violations 2 (missing build tag; missing assertion messages)
Test File Classification Issues Detected
TestLoadUsageActivitySummary pkg/cli/logs_usage_activity_test.go:13 ✅ Design 🚨 Missing build tag; ⚠️ No error cases; ⚠️ Bare assert.Equal (no message arg)
TestApplyUsageActivitySummaryToResult pkg/cli/logs_usage_activity_test.go:38 ✅ Design 🚨 Missing build tag (file); ⚠️ No error cases; ⚠️ Bare assert.Equal (no message arg)
TestGenerateMaintenanceWorkflow_OperationJobConditions pkg/workflow/maintenance_workflow_test.go:567 ✅ Design
TestConclusionJobIncludesUsageArtifactSteps pkg/workflow/notify_comment_test.go:1213 ✅ Design
TestSideRepoMaintenanceGeneratesWorkflow pkg/workflow/side_repo_maintenance_integration_test.go:116 ✅ Design

Go: 5 (*_test.go); JavaScript: 0. No other languages detected.

⚠️ Flagged Tests — Requires Review (2 issue(s))

TestLoadUsageActivitySummary (pkg/cli/logs_usage_activity_test.go:13) — 🚨 Missing build tag (hard violation): pkg/cli/logs_usage_activity_test.go has no //go:build !integration or //go:build integration tag on line 1. Every *_test.go file must declare its build category. Suggested fix: add //go:build !integration as the very first line (before package cli).

This function also has bare assert.Equal calls without a descriptive message argument (e.g. assert.Equal(t, 10, summary.Firewall.TotalRequests)assert.Equal(t, 10, summary.Firewall.TotalRequests, "firewall total_requests should be parsed from JSON")). Additionally there are no error-path assertions: the test only exercises a well-formed JSON file; consider adding sub-cases for a missing file or malformed JSON.

TestApplyUsageActivitySummaryToResult (pkg/cli/logs_usage_activity_test.go:38) — ⚠️ Same file-level build tag violation applies. All assert.Equal / require.* calls lack a descriptive message argument. The test is happy-path only; a nil Session, nil Firewall, or nil Gateway is not exercised. Suggested fix: add the build tag and message args; consider a table-driven sub-test or a second case for a partially-nil summary.

Verdict

Check failed. 0% implementation tests (threshold: 30%) — design test ratio is excellent. However, pkg/cli/logs_usage_activity_test.go is missing the required //go:build !integration build tag, which is a mandatory coding-guideline violation. Assertion message arguments are also absent on all new assert.Equal calls. Please fix these before merging.

🧪 Test quality analysis by Test Quality Sentinel · 126.2 AIC · ⊞ 8.3K ·

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Test Quality Sentinel: Coding-guideline violation detected. pkg/cli/logs_usage_activity_test.go is missing the required //go:build !integration build tag on line 1; all new assert.Equal calls also lack the required descriptive message argument. Please review the flagged tests in the comment above.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skills-Based Review 🧠

Applied /tdd, /grill-with-docs, /zoom-out, and /improve-codebase-architecture — requesting changes on test coverage gaps and two design issues.

📋 Key Themes & Highlights

Key Themes

  • Test coverage gaps (3 comments): The core applyUsageActivitySummaryToResult invariant ("don't overwrite existing data") is only tested in the empty-result happy path. The loadUsageActivitySummary fallback path and error paths are entirely untested.
  • Zero-value ambiguity: result.Run.Turns == 0 cannot distinguish "parsed from full artifact, genuinely zero" from "not yet populated", so a crashed run could have its turn count silently overwritten by the precomputed value.
  • Schema version not validated: The Schema field in usageActivitySummary is populated but never checked — a future producer bump to v2 would be silently accepted.
  • Inline Python untestability: 174 lines of log-parsing logic live as Go string literals with no feasible Go test. Parsing bugs would silently corrupt every future activity summary.
  • Silent error swallowing: Both loadUsageActivitySummary and analyzeTokenUsage (now ungated) surface errors only behind --verbose, making systematic failures invisible in normal operation.

Positive Highlights

  • ✅ Clean producer→consumer architecture: the conclusion job writes a well-defined JSON schema, and the Go consumer backfills only missing fields — the precedence contract is sound in principle.
  • ✅ Good use of filepath.Clean and os.IsNotExist guards in loadUsageActivitySummary.
  • ✅ The usageActivityGatewayServer type is a well-scoped internal type that keeps the public MCPServerStats decoupled from the wire format.
  • ✅ Tests are included and use t.Parallel() throughout — appreciated.
  • ✅ The --count 500 change is well-motivated and the tests are updated in lock-step.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 145.6 AIC · ⌖ 8.54 AIC · ⊞ 6.9K

assert.Equal(t, "github", result.MCPToolUsage.Servers[0].ServerName)
assert.Equal(t, 5, result.MCPToolUsage.Servers[0].ToolCallCount)
assert.Equal(t, 2, result.MCPToolUsage.Servers[0].ErrorCount)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] The three backfill-precedence branches in applyUsageActivitySummaryToResult are only tested in the happy path (empty result). There are no tests verifying the "don't overwrite" invariant for any of the three guarded fields.

💡 Suggested test cases
func TestApplyUsageActivitySummary_DoesNotOverwriteExistingData(t *testing.T) {
    t.Parallel()

    // Turns: a non-zero value must not be replaced.
    result := DownloadResult{}
    result.Run.Turns = 9
    applyUsageActivitySummaryToResult(
        &usageActivitySummary{Session: &usageActivitySession{Turns: 4}},
        &result,
    )
    assert.Equal(t, 9, result.Run.Turns, "existing turns must not be overwritten")

    // FirewallAnalysis: a non-nil pointer must not be replaced.
    existing := &FirewallAnalysis{TotalRequests: 100}
    result2 := DownloadResult{FirewallAnalysis: existing}
    applyUsageActivitySummaryToResult(
        &usageActivitySummary{Firewall: &usageActivityFirewall{TotalRequests: 5}},
        &result2,
    )
    assert.Same(t, existing, result2.FirewallAnalysis)

    // MCPToolUsage: a non-nil pointer must not be replaced.
    existingMCP := &MCPToolUsageData{}
    result3 := DownloadResult{MCPToolUsage: existingMCP}
    applyUsageActivitySummaryToResult(
        &usageActivitySummary{Gateway: &usageActivityGateway{TotalCalls: 3}},
        &result3,
    )
    assert.Same(t, existingMCP, result3.MCPToolUsage)
}

This precedence contract is the core invariant of the backfill design: if a future change reorders full-artifact parsing and the backfill call, these tests will catch the regression immediately.

assert.Equal(t, 7, summary.Session.Turns)
require.NotNil(t, summary.Gateway)
assert.Equal(t, 5, summary.Gateway.TotalCalls)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] TestLoadUsageActivitySummary only tests the primary usage/activity/summary.json path. Three more cases need coverage: (1) the activity/summary.json fallback, (2) neither path exists → nil, nil, (3) a file exists but contains invalid JSON → non-nil error.

💡 Suggested additions
func TestLoadUsageActivitySummary_FallbackPath(t *testing.T) {
    t.Parallel()
    runDir := t.TempDir()
    // Write to the legacy fallback location.
    fallbackPath := filepath.Join(runDir, "activity", "summary.json")
    require.NoError(t, os.MkdirAll(filepath.Dir(fallbackPath), 0o755))
    require.NoError(t, os.WriteFile(fallbackPath, []byte(`{"session":{"turns":3}}`), 0o644))
    summary, err := loadUsageActivitySummary(runDir)
    require.NoError(t, err)
    require.NotNil(t, summary)
    assert.Equal(t, 3, summary.Session.Turns)
}

func TestLoadUsageActivitySummary_NoFile(t *testing.T) {
    t.Parallel()
    summary, err := loadUsageActivitySummary(t.TempDir())
    require.NoError(t, err)
    assert.Nil(t, summary)
}

func TestLoadUsageActivitySummary_MalformedJSON(t *testing.T) {
    t.Parallel()
    runDir := t.TempDir()
    p := filepath.Join(runDir, "usage", "activity", "summary.json")
    require.NoError(t, os.MkdirAll(filepath.Dir(p), 0o755))
    require.NoError(t, os.WriteFile(p, []byte(`{not valid json`), 0o644))
    _, err := loadUsageActivitySummary(runDir)
    require.Error(t, err)
}

Comment thread pkg/cli/logs_usage_activity.go Outdated

// Preserve previously parsed turn counts (from full session artifacts/events.jsonl)
// and only backfill when they are missing.
if summary.Session != nil && result.Run.Turns == 0 && summary.Session.Turns > 0 {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/grill-with-docs] The condition result.Run.Turns == 0 cannot distinguish between "full artifacts downloaded, zero turns genuinely observed" and "turns field not yet populated". A run that crashes before the first user message would have Turns == 0 from full-artifact parsing, and the backfill would silently overwrite that with the precomputed value from the usage summary.

💡 Options to consider

The simplest fix is to use a sentinel that means "no full-artifact data available" instead of relying on the zero value. For example:

  • Set result.Run.Turns = -1 (or use a separate bool turnsPopulated flag) during full-artifact processing to signal "we parsed the artifacts and found zero turns". Then backfill only when the field is still at its zero-value default.
  • Alternatively, document explicitly in a comment on this line that a zero-turn run IS safe to backfill (e.g., "if full-artifact parsing produced 0, the precomputed value is equally authoritative"), so the behaviour is a conscious choice rather than an accidental side effect.

)

type usageActivitySummary struct {
Schema string `json:"schema,omitempty"`

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/grill-with-docs] The Schema field is unmarshalled but never inspected. If the producer emits a usage-activity-summary/v2 with different field semantics, this consumer will silently accept and misinterpret it.

💡 Suggested guard

Add a schema version check after parsing:

const usageActivitySummarySchema = "usage-activity-summary/v1"

// in loadUsageActivitySummary, after json.Unmarshal:
if summary.Schema != "" && summary.Schema != usageActivitySummarySchema {
    return nil, fmt.Errorf("unsupported usage activity summary schema %q (expected %q)", summary.Schema, usageActivitySummarySchema)
}

Since the schema is producer-controlled by this repo's conclusion job, a hard error is safe here — if the constant falls out of sync it will surface immediately in tests rather than silently producing wrong activity counts.

fmt.Fprintln(os.Stderr, console.FormatWarningMessage(fmt.Sprintf("Failed to analyze token usage for run %d: %v", run.DatabaseID, tokenErr)))
}
// token-usage.jsonl is also available in the compact usage artifact.
tokenUsage, tokenErr := analyzeTokenUsage(runOutputDir, verbose)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/zoom-out] Removing the hasFirewallArtifact gate means analyzeTokenUsage is now attempted for every run. For usage-only downloads that lack token-usage.jsonl, the error is silently swallowed unless --verbose is active. This means a systematic issue (e.g., a format change or missing file in every run's usage artifact) produces no observable signal in normal operation.

💡 Options
  • Emit the warning unconditionally (not gated on verbose) but at a lower-noise level (e.g., only once if the error repeats across many runs).
  • Alternatively, accept the current behaviour and add a comment here explaining that token-usage parsing is best-effort for usage-only downloads and a missing file is expected rather than exceptional.

Comment thread pkg/workflow/notify_comment.go Outdated
" [ -f /tmp/gh-aw/usage/agent/token_usage.jsonl ] || : > /tmp/gh-aw/usage/agent/token_usage.jsonl\n",
" [ -f /tmp/gh-aw/usage/detection/token_usage.jsonl ] || : > /tmp/gh-aw/usage/detection/token_usage.jsonl\n",
" mkdir -p /tmp/gh-aw/usage/activity\n",
" python3 - <<'PY'\n",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/grill-with-docs] This adds ~174 lines of aggregation logic (firewall log parsing, session event counting, gateway JSONL aggregation) as inline Go string literals with no way to unit test the Python. A bug in the firewall column-index assumptions or event-type matching would silently produce wrong activity summaries in every future artifact without any test catching it.

💡 Suggestions

The PR description notes the script is intentionally inline to keep compiled workflows self-contained. That is a valid constraint. Given that, consider:

  1. Extract to an embedded Go file (//go:embed scripts/activity_summary.py) so the same script can be exercised by a go test subprocess test that writes known log fixtures and asserts on the JSON output — without changing the compiled workflow content.
  2. If inline is mandatory, add a TestBuildUsageArtifactUploadSteps_PythonScript test that executes the extracted Python string against realistic fixture files (squid log, events.jsonl, gateway.jsonl) and asserts on the resulting summary.json. This is feasible since Python 3 is available in CI.
  3. At minimum, the test in notify_comment_test.go should assert on more than presence of python3 - <<'PY' — e.g., verify the SQUID_STATUS_INDEX constant and the usage-activity-summary/v1 schema marker are present.

Comment thread pkg/workflow/notify_comment.go Outdated
" allowed = code in (200, 206, 304)\n",
" except ValueError:\n",
" allowed = False\n",
" if not allowed and any(marker in decision for marker in ('TCP_TUNNEL', 'TCP_HIT', 'TCP_MISS')):\n",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/zoom-out] The secondary TCP_TUNNEL / TCP_HIT / TCP_MISS check fires for any request where the HTTP status code did not parse as 200 / 206 / 304, including valid error responses (403, 407). If a blocked HTTPS CONNECT produces 403 in the status column alongside TCP_TUNNEL in the decision column, this would miscount the request as allowed.

💡 Context and suggestion

In standard Squid behaviour, TCP_TUNNEL only appears when the CONNECT tunnel is successfully established, so a 403 + TCP_TUNNEL combination would be atypical. However, given this is a custom log format, it's worth adding a brief comment explaining:

  • What exact values parts[SQUID_STATUS_INDEX] and parts[SQUID_DECISION_INDEX] can take in this project's log format.
  • Whether TCP_TUNNEL can co-occur with non-200 statuses in practice.

This makes the intent explicit and reduces the risk of a future log-format tweak silently changing the allowed/blocked counts.

// distinguishes overall request volume (RequestCount) from tool-invocation
// volume (ToolCallCount). In usage-aggregate mode we only have per-server
// tool-call counts, so both fields are populated from that single source.
RequestCount: server.ToolCallCount,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/improve-codebase-architecture] RequestCount and ToolCallCount are semantically distinct fields in MCPServerStats (request volume vs. tool-invocation volume), but both are set to the same server.ToolCallCount value here. Downstream code or reports that compute a requests_per_tool_call ratio — or display RequestCount separately from ToolCallCount — will always show 1.0 / identical values for any backfilled row, masking real usage patterns.

💡 Suggestion

Either:

  • Leave RequestCount at zero (its zero value) and set only ToolCallCount, making it clear that request-level data is unavailable from usage aggregates.
  • Or rename the comment to explicitly state "usage aggregates only track tool invocations, not HTTP request volume; RequestCount is an approximation" so readers of downstream reports understand the limitation.

The existing comment explains the intent, but it's easy to miss. A zero value with a named sentinel (or a bool requestCountAvailable flag on MCPServerStats) would make the absence of data self-documenting.

--repo "${{ github.repository }}" \
--start-date -1w \
--count 100 \
--count 500 \

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/grill-with-docs] The 5× increase from --count 100 to --count 500 is well-motivated (lighter artifact downloads), but there is no documentation of what ceiling was considered or what constraint the number 500 is tracking against. If the window needs to grow again, a future developer has no guidance on the limiting factor.

💡 Suggestion

Add a brief inline comment explaining the constraint, for example:

// 500 runs: fits within the GitHub Actions usage download budget for usage-only artifacts;
// raise further if the artifact footprint is reduced again.
"            --count 500 \\\n",

Alternatively, note in the PR description or a code comment what the expected download time / API call budget is at 500 vs. 100.

// These aggregates are generated by the conclusion job and allow lightweight
// usage-only downloads to include firewall/session summaries.
usageActivitySummary, usageActivityErr := loadUsageActivitySummary(runOutputDir)
if usageActivityErr != nil && verbose {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/zoom-out] Non-ENOENT errors from loadUsageActivitySummary (e.g., permission denied, storage corruption) are only surfaced when --verbose is active. A systematic read failure affecting every run would be completely invisible in normal operation — not counted in any metric, not printed, not aggregated.

💡 Suggestion

Consider emitting this warning unconditionally (or at least incrementing a skipped-run counter), since a non-ENOENT IO error is genuinely unexpected and should not require --verbose to diagnose. The warning is already low-noise (one line per run) and distinguishes the missing-file case (which loadUsageActivitySummary returns nil, nil for silently) from the real-error case.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: non-blocking observations

The overall design is sound — precomputing activity aggregates into the usage artifact is the right approach for scaling the maintenance activity report without large artifact downloads. Three data-quality issues are worth addressing before this pattern proliferates.

📋 Issues found (3 medium/low, no blocking)

Medium — fallback chain broken on corrupt JSON (logs_usage_activity.go:63)
A malformed usage/activity/summary.json (partially written artifact) causes an immediate error return; the activity/summary.json fallback is never tried. os.IsNotExist falls through correctly — JSON parse failure should behave the same.

Medium — gateway double-counting risk (notify_comment.go:824)
mcp-logs/gateway.jsonl and gateway.jsonl within the same sandbox are iterated unconditionally. Both can exist during a log-path migration; when they do, every tool call is counted twice in the summary, inflating the MCPToolUsage.Servers data backfilled by applyUsageActivitySummaryToResult.

Low — TCP decision substring match over-counts allowed requests (notify_comment.go:754)
'TCP_MISS' in decision matches TCP_MISS_ABORTED, TCP_MISS/503, etc. The intent is to check the base decision token; splitting on / first and using in (...) would prevent compound codes from being misclassified as allowed.

🔎 Code quality review by PR Code Quality Reviewer · 166.1 AIC · ⌖ 7.67 AIC · ⊞ 5.1K

Comment thread pkg/cli/logs_usage_activity.go Outdated
}
var summary usageActivitySummary
if err := json.Unmarshal(raw, &summary); err != nil {
return nil, fmt.Errorf("parse usage activity summary %s: %w", cleanPath, err)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSON parse failure on the primary candidate short-circuits the fallback path: if usage/activity/summary.json exists but contains malformed JSON (e.g. partially written during artifact upload), the function immediately returns an error and never tries the activity/summary.json fallback — so applyUsageActivitySummaryToResult no-ops silently.

💡 Suggested fix

Fall through on unmarshal failure the same way os.IsNotExist falls through:

var summary usageActivitySummary
if err := json.Unmarshal(raw, &summary); err != nil {
    continue  // try next candidate
}
return &summary, nil

Or collect the error and return it only after exhausting all candidates:

var lastErr error
for _, candidate := range candidates {
    raw, err := os.ReadFile(cleanPath)
    if err != nil {
        if os.IsNotExist(err) { continue }
        return nil, err
    }
    var s usageActivitySummary
    if err := json.Unmarshal(raw, &s); err != nil {
        lastErr = err
        continue
    }
    return &s, nil
}
return nil, lastErr

A partially-written JSON file from an interrupted artifact upload should behave the same as a missing file for the purposes of the fallback chain.

Comment thread pkg/workflow/notify_comment.go Outdated
" gateway = {'total_calls': 0, 'failed_calls': 0, 'servers': {}}\n",
" gateway_paths = [\n",
" '/tmp/gh-aw/sandbox/agent/logs/mcp-logs/gateway.jsonl',\n",
" '/tmp/gh-aw/sandbox/agent/logs/gateway.jsonl',\n",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gateway call double-counting when both log paths coexist: mcp-logs/gateway.jsonl and gateway.jsonl within the same sandbox are iterated unconditionally. If both files exist for the same run (migration window, duplicate writers), every tool call is counted twice — inflating total_calls, per-server tool_call_count, and the derived MCPToolUsage.Servers that applyUsageActivitySummaryToResult injects into reports.

💡 Suggested fix

Process only the first path that exists within each sandbox pair, stopping once a file is found:

# Try modern path first; fall back to legacy path within each sandbox
sandbox_gateway_pairs = [
    (
        '/tmp/gh-aw/sandbox/agent/logs/mcp-logs/gateway.jsonl',
        '/tmp/gh-aw/sandbox/agent/logs/gateway.jsonl',
    ),
    (
        '/tmp/gh-aw/threat-detection/sandbox/agent/logs/mcp-logs/gateway.jsonl',
        '/tmp/gh-aw/threat-detection/sandbox/agent/logs/gateway.jsonl',
    ),
]
gateway_paths = []
for modern, legacy in sandbox_gateway_pairs:
    if os.path.exists(modern):
        gateway_paths.append(modern)
    elif os.path.exists(legacy):
        gateway_paths.append(legacy)

Alternatively, if both files can legitimately hold different events, track seen lines by content hash to deduplicate. The current approach trusts that only one file exists per sandbox, which may not hold during log-path migrations.

Comment thread pkg/workflow/notify_comment.go Outdated
" allowed = code in (200, 206, 304)\n",
" except ValueError:\n",
" allowed = False\n",
" if not allowed and any(marker in decision for marker in ('TCP_TUNNEL', 'TCP_HIT', 'TCP_MISS')):\n",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TCP decision marker uses substring matching: 'TCP_MISS' in decision will match compound Squid codes like TCP_MISS_ABORTED or TCP_MISS/503, classifying a failed-or-aborted request as allowed. The same applies to TCP_HIT matching TCP_HIT_REFRESH_FAIL.

💡 Suggested fix

Match on the exact token (prefix before / or end of string) rather than a substring:

def is_allowed_decision(decision: str) -> bool:
    # Decision field is typically "TCP_MISS/200" or "TCP_TUNNEL/200"
    base = decision.split('/')[0]  # strip HTTP status suffix
    return base in ('TCP_TUNNEL', 'TCP_HIT', 'TCP_MISS')

if not allowed and is_allowed_decision(decision):
    allowed = True

This is a monitoring metric so the impact is limited to inaccurate counters, but TCP_MISS_ABORTED appearing as "allowed" could mask elevated block rates in environments with connection failures.

@pelikhan

Copy link
Copy Markdown
Collaborator

@copilot run pr-finisher skill

@pelikhan

Copy link
Copy Markdown
Collaborator

@copilot make sure GitHub api rate limit data is also included.

See issue report: #40505

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@github-actions

github-actions Bot commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

⚠️ Security scanning failed for Smoke Codex. Review the logs for details.

@github-actions

Copy link
Copy Markdown
Contributor

📰 BREAKING: Smoke Copilot - AOAI (apikey) is now investigating this pull request. Sources say the story is developing...

@github-actions

Copy link
Copy Markdown
Contributor

📰 BREAKING: Smoke Copilot is now investigating this pull request. Sources say the story is developing...

@github-actions

Copy link
Copy Markdown
Contributor

📰 BREAKING: Smoke Copilot - AOAI (Entra) is now investigating this pull request. Sources say the story is developing...

@github-actions

Copy link
Copy Markdown
Contributor

Smoke test FAIL

  • ✅ gh CLI
  • ❌ web-fetch
  • ✅ file write
  • ✅ bash verify
  • ❌ build

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • proxy.golang.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "proxy.golang.org"

See Network Configuration for more information.

🥧 Smoke Pi — Powered by Pi · 4.32 AIC · ⊞ 1.8K ·

@github-actions

Copy link
Copy Markdown
Contributor

Smoke Test Results

    1. GitHub MCP Testing: ✅
    1. Web Fetch Testing: ✅
    1. File Writing Testing: ✅
    1. Bash Tool Testing: ✅
    1. Build gh-aw: ❌

Overall Status: FAIL

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • localhost

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "localhost"

See Network Configuration for more information.

Smoke Gemini — Powered by Gemini ·

@github-actions

Copy link
Copy Markdown
Contributor

Agent Container Tool Check

Tool Status Version
bash 5.2.21
sh available
git 2.54.0
jq 1.7
yq v4.53.3
curl 8.5.0
gh 2.94.0
node v22.22.3
python3 3.11.15 (PyPy 7.3.23)
go 1.24.13
java 21.0.11
dotnet 10.0.301

Result: 12/12 tools available ✅

Overall Status: PASS

🔧 Tool validation by Agent Container Smoke Test · 16.9 AIC · ⊞ 5.4K ·

@github-actions

Copy link
Copy Markdown
Contributor

Commit pushed: 45ffade

Generated by Changeset Generator · 11 AIC · ⊞ 16K

@github-actions

Copy link
Copy Markdown
Contributor

Smoke Test: Codex - 27892632179
#40552 q: load token optimization skill when user mentions optimize/cost/improve
#40550 Smoke Copilot: require dispatch_workflow.inputs.message for haiku-printer
Status: FAIL
GitHub MCP ✅ Serena ✅ Playwright ✅ Web Fetch ❌ File Write ✅ Bash ✅ Build ✅ Comment Memory ✅ Cache Memory ✅ Issue Field ✅

Warning

Firewall blocked 6 domains

The following domains were blocked by the firewall during workflow execution:

  • accounts.google.com
  • android.clients.google.com
  • clients2.google.com
  • contentautofill.googleapis.com
  • safebrowsingohttpgateway.googleapis.com
  • www.google.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "android.clients.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"

See Network Configuration for more information.

🔮 The oracle has spoken through Smoke Codex · 6.77 AIC · ⊞ 13.9K ·

@github-actions

Copy link
Copy Markdown
Contributor

Comment Memory

Silent caches breathe
A fresh run writes down its own name
Night keeps every trace

Note

This comment is managed by comment memory.

It stores persistent context for this thread in the code block at the top of this comment.
Edit only the text inside the backtick fences; workflow metadata and the footer are regenerated automatically.

Learn more about comment memory

Warning

Firewall blocked 6 domains

The following domains were blocked by the firewall during workflow execution:

  • accounts.google.com
  • android.clients.google.com
  • clients2.google.com
  • contentautofill.googleapis.com
  • safebrowsingohttpgateway.googleapis.com
  • www.google.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "android.clients.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"

See Network Configuration for more information.

🔮 The oracle has spoken through Smoke Codex · 6.77 AIC · ⊞ 13.9K ·

@github-actions

Copy link
Copy Markdown
Contributor

💥 Smoke Test: Claude — Run 27892632153

Core #1-12: ✅ all passed
PR Review: #13#14#15#16#17#18#19 ⚠️ skipped

Overall: PASS (one optional close-PR test skipped)

Warning

Firewall blocked 6 domains

The following domains were blocked by the firewall during workflow execution:

  • accounts.google.com
  • android.clients.google.com
  • clients2.google.com
  • contentautofill.googleapis.com
  • safebrowsingohttpgateway.googleapis.com
  • www.google.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "android.clients.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"

See Network Configuration for more information.

💥 [THE END] — Illustrated by Smoke Claude · 101.2 AIC · ⊞ 8.5K ·

@github-actions

Copy link
Copy Markdown
Contributor

PR: Precompute usage-artifact activity aggregates and include GitHub API rate limit data for usage-only reporting
1✅ 2✅ 3✅ 4✅ 5✅ 6✅ 7✅ 8✅
9✅ 10✅ 11✅ 12✅ 13✅ 14✅ 15✅
Overall: PASS
Author: app/copilot-swe-agent | Assignees: pelikhan, Copilot

Warning

Firewall blocked 6 domains

The following domains were blocked by the firewall during workflow execution:

  • accounts.google.com
  • android.clients.google.com
  • clients2.google.com
  • contentautofill.googleapis.com
  • safebrowsingohttpgateway.googleapis.com
  • www.google.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "android.clients.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"

See Network Configuration for more information.

📰 BREAKING: Report filed by Smoke Copilot · 459.8 AIC · ⊞ 19.9K ·

@github-actions

Copy link
Copy Markdown
Contributor

Comment Memory

Stone tests tap all night
Robot drums on code cave wall
Dawn says green, move on

Note

This comment is managed by comment memory.

It stores persistent context for this thread in the code block at the top of this comment.
Edit only the text inside the backtick fences; workflow metadata and the footer are regenerated automatically.

Learn more about comment memory

Warning

Firewall blocked 6 domains

The following domains were blocked by the firewall during workflow execution:

  • accounts.google.com
  • android.clients.google.com
  • clients2.google.com
  • contentautofill.googleapis.com
  • safebrowsingohttpgateway.googleapis.com
  • www.google.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "android.clients.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"

See Network Configuration for more information.

📰 BREAKING: Report filed by Smoke Copilot · 459.8 AIC · ⊞ 19.9K ·

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💥 Automated smoke test review - all systems nominal!

Warning

Firewall blocked 6 domains

The following domains were blocked by the firewall during workflow execution:

  • accounts.google.com
  • android.clients.google.com
  • clients2.google.com
  • contentautofill.googleapis.com
  • safebrowsingohttpgateway.googleapis.com
  • www.google.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "android.clients.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"

See Network Configuration for more information.

💥 [THE END] — Illustrated by Smoke Claude · 101.2 AIC · ⊞ 8.5K

@@ -0,0 +1,40 @@
# ADR-40504: Precompute Usage-Artifact Activity Aggregates for Lightweight Logs Reporting

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clear ADR title that captures the precompute-aggregates intent. 👍

## Decision

We will have the conclusion job precompute run-activity aggregates into the `usage` artifact at `usage/activity/summary.json`, tagged with the schema marker `usage-activity-summary/v1`. The aggregation runs as an inline `python3` step embedded in the generated workflow (so compiled workflows stay self-contained with no extra repository-file dependency) and rolls up firewall request counts, Copilot session event counters, and gateway tool-call counters with a per-server breakdown. The logs pipeline (`pkg/cli/logs_usage_activity.go`) loads this summary and backfills only fields that are still empty after normal artifact parsing, keeping precedence explicit: detailed artifacts win, the usage summary fills gaps only.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider noting how the v1 schema marker will be migrated if aggregate fields change in future.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Smoke bot looked. Notes left on two lines.

Warning

Firewall blocked 6 domains

The following domains were blocked by the firewall during workflow execution:

  • accounts.google.com
  • android.clients.google.com
  • clients2.google.com
  • contentautofill.googleapis.com
  • safebrowsingohttpgateway.googleapis.com
  • www.google.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "android.clients.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"

See Network Configuration for more information.

📰 BREAKING: Report filed by Smoke Copilot · 459.8 AIC · ⊞ 19.9K

@pelikhan

Copy link
Copy Markdown
Collaborator

Node.js 24's builtin globSync from node:fs doesn't support the nodir option. Removed it since the patterns (*.log, *.jsonl) already match only files, not directories. This fixes the TypeScript typecheck errors.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Jun 21, 2026

Copy link
Copy Markdown
Contributor Author

@copilot fix https://github.com/github/gh-aw/actions/runs/27892847445/job/82541346469 and typecheck

Fixed in 45ffade. The issue was that Node.js 24's builtin globSync from node:fs doesn't support the nodir option (unlike the external glob package). Removed the option since the patterns (*.log, *.jsonl) already match only files, not directories. Typecheck now passes.

@pelikhan pelikhan merged commit 9f631ce into main Jun 21, 2026
37 checks passed
@pelikhan pelikhan deleted the copilot/update-conclusion-job-aggregate-data branch June 21, 2026 05:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants