diff --git a/docs/src/content/docs/patterns/monitoring.md b/docs/src/content/docs/patterns/monitoring.md index 0b5fb2b5c87..3717203e101 100644 --- a/docs/src/content/docs/patterns/monitoring.md +++ b/docs/src/content/docs/patterns/monitoring.md @@ -109,7 +109,26 @@ See the full reference: [/reference/safe-outputs/#no-op-logging-noop](/gh-aw/ref ## Operational monitoring -- Use `gh aw status` to see which workflows are enabled and their latest run state. -- Use `gh aw logs` and `gh aw audit` to inspect tool usage, errors, MCP failures, and network patterns. +Use `gh aw status` to see which workflows are enabled and their latest run state. -See: [/setup/cli/](/gh-aw/setup/cli/) +For deeper investigation, the audit commands are the primary monitoring tools for agentic workflows: + +- `gh aw audit ` — single-run report with tool usage, MCP failures, firewall activity, and cost metrics +- `gh aw audit diff ` — compare two runs to detect behavioral regressions or new network accesses +- `gh aw logs --format markdown [workflow]` — cross-run security and performance report for trend monitoring + +```bash +# Audit a single run by its run ID +gh aw audit 12345678 + +# Compare two runs for regressions +gh aw audit diff 12345678 12345679 + +# Trend report across the last 10 runs of a workflow +gh aw logs my-workflow --format markdown --count 10 +``` + +> [!TIP] +> Use `gh aw logs --format markdown` inside a scheduled workflow agent to automate trend monitoring and surface cost or security regressions without manual intervention. + +See [Audit Commands](/gh-aw/reference/audit/) for full flag documentation, and [CLI Reference](/gh-aw/setup/cli/) for all available commands.
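The three commands added in the hunk above could be wrapped in small shell helpers so they can be previewed before they run. This is only a sketch: the function names and the `DRY_RUN` variable are hypothetical and not part of `gh aw`, and the only commands and flags used are the ones that appear in the diff itself.

```bash
# Sketch only: helper names and DRY_RUN are hypothetical, not part of gh aw.

# Print the command when DRY_RUN=1, otherwise execute it.
run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Single-run report for one run ID.
audit_run() {
  run_cmd gh aw audit "$1"
}

# Compare a baseline run against a candidate run.
audit_compare() {
  run_cmd gh aw audit diff "$1" "$2"
}

# Cross-run trend report for a workflow over the last N runs.
trend_report() {
  run_cmd gh aw logs "$1" --format markdown --count "$2"
}
```

With `DRY_RUN=1` set, calling `audit_compare 12345678 12345679` prints the command line instead of invoking `gh`, which may be handy when checking what a scheduled job would execute.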
diff --git a/docs/src/content/docs/reference/cost-management.md b/docs/src/content/docs/reference/cost-management.md index cfcc5347b96..93372063cce 100644 --- a/docs/src/content/docs/reference/cost-management.md +++ b/docs/src/content/docs/reference/cost-management.md @@ -35,6 +35,8 @@ The agent job invokes an AI engine (Copilot, Claude, Codex, or a custom engine) The `gh aw logs` command downloads workflow run data and surfaces per-run metrics including elapsed duration, token usage, and estimated inference cost. Use it to see exactly what your workflows are consuming before deciding what to optimize. +For a deep dive into a single run's token usage, tool calls, and inference spend, use `gh aw audit `. The **Metrics** and **Performance Metrics** sections of the audit report show token counts, effective tokens, turn counts, and estimated cost in one place — useful for diagnosing why a specific run was expensive. For cost trends across multiple runs, use `gh aw logs --format markdown [workflow]` to generate a cross-run report with metrics trends and anomaly detection. + ### View recent run durations ```bash diff --git a/docs/src/content/docs/reference/glossary.md b/docs/src/content/docs/reference/glossary.md index 19e3a275aec..a500d725530 100644 --- a/docs/src/content/docs/reference/glossary.md +++ b/docs/src/content/docs/reference/glossary.md @@ -357,13 +357,25 @@ The `gh aw` extension for GitHub CLI providing commands for managing agentic wor An interactive web-based editor for authoring, compiling, and previewing agentic workflows without local installation. The Playground runs the gh-aw compiler in the browser using [WebAssembly](#webassembly-wasm) and auto-saves editor content to `localStorage` so work is preserved across sessions. Available at `/gh-aw/editor/`. +### Audit (`gh aw audit`) + +A CLI command that downloads workflow run artifacts and logs, analyzes MCP tool usage and network behavior, and generates a structured Markdown or JSON report. 
The report covers failure analysis, tool usage, MCP server status, firewall activity, token/cost metrics, behavior fingerprint, and safe-output summary. Accepts a numeric run ID or any GitHub Actions run or job URL. See [Audit Commands](/gh-aw/reference/audit/). + ### Audit Diff (`gh aw audit diff`) -A `gh aw audit` subcommand that compares firewall behavior across two workflow runs. Reports domain additions and removals, allowed/denied status changes, request volume drift, and anomaly flags. Outputs results in pretty, markdown, or JSON format. Useful for spotting regressions and behavioral drift between runs. See [CLI Reference](/gh-aw/setup/cli/#audit-diff). +A `gh aw audit` subcommand that compares two workflow runs along firewall, MCP tool usage, and run metrics dimensions. Reports domain additions and removals, allowed/denied status changes, request volume drift, and anomaly flags. Useful for detecting regressions and behavioral drift between runs. See [Audit Commands](/gh-aw/reference/audit/#gh-aw-audit-diff-run-id-1-run-id-2). + +### Behavior Fingerprint + +A multi-dimensional characterization of a single workflow run produced by `gh aw audit`. Captures the task domain, network access patterns, tool usage profile, token consumption, and agentic assessments in a compact summary. Two runs with the same fingerprint exhibit identical observable behavior; diverging fingerprints signal regressions or unexpected changes. See [Audit Commands](/gh-aw/reference/audit/). ### Cross-Run Audit Report (`gh aw logs --format`) -A feature of `gh aw logs` that aggregates firewall data across multiple workflow runs to produce a cross-run security report. The report includes an executive summary, domain inventory, and per-run breakdown. Designed for security reviews, compliance checks, and feeding debugging or optimization agents. Outputs markdown by default (suitable for `$GITHUB_STEP_SUMMARY`), or pretty/JSON format.
See [CLI Reference](/gh-aw/setup/cli/#logs). +A feature of `gh aw logs` that aggregates firewall, MCP, and metrics data across multiple workflow runs to produce a security and performance report. Includes an executive summary, domain inventory, and per-run breakdown with anomaly detection. Designed for security reviews, compliance checks, and feeding optimization agents. See [Audit Commands](/gh-aw/reference/audit/#gh-aw-logs-format-fmt). + +### Firewall Analysis + +A section of the `gh aw audit` report that breaks down all network requests made during a workflow run — showing allowed domains, denied domains, request volumes, and policy attribution. Derived from AWF firewall logs. Use `gh aw audit diff` to compare firewall behavior across runs and identify new or removed domain accesses. See [Audit Commands](/gh-aw/reference/audit/) and [Network Permissions](/gh-aw/reference/network/). ### Frontmatter Hash diff --git a/docs/src/content/docs/reference/mcp-gateway.md b/docs/src/content/docs/reference/mcp-gateway.md index cdb9c9160c2..420adc6a35f 100644 --- a/docs/src/content/docs/reference/mcp-gateway.md +++ b/docs/src/content/docs/reference/mcp-gateway.md @@ -1203,6 +1203,9 @@ The gateway SHOULD: 4. Include health status in `/health` response 5. Update readiness based on critical server status +> [!TIP] +> To inspect MCP server health for a specific workflow run at runtime, use `gh aw audit `. The **MCP Server Health** section of the audit report shows connection failures, timeout errors, tool call counts, and error rates per server — providing a post-run view of gateway behavior. For recurring MCP failures, `gh aw audit diff` compares MCP tool usage between two runs to identify regressions. See [Audit Commands](/gh-aw/reference/audit/). + --- ## 9. 
Error Handling @@ -1663,6 +1666,7 @@ Content-Type: application/json - **[MCP-Config]** MCP Configuration Format - **[HTTP/1.1]** Hypertext Transfer Protocol -- HTTP/1.1 +- **[gh-aw-audit]** [Audit Commands Reference](/gh-aw/reference/audit/) — Runtime MCP server health, guard policy analysis, and cross-run debugging --- diff --git a/docs/src/content/docs/reference/network.md b/docs/src/content/docs/reference/network.md index 1b835730579..09d2330e527 100644 --- a/docs/src/content/docs/reference/network.md +++ b/docs/src/content/docs/reference/network.md @@ -290,10 +290,23 @@ If you encounter network access blocked errors, verify that required domains or Use `gh aw logs --run-id ` to view firewall activity and identify blocked domains. See the [Network Configuration Guide](/gh-aw/guides/network-configuration/#troubleshooting-firewall-blocking) for detailed troubleshooting steps and common solutions. +To understand domain allow/block behavior in detail, use `gh aw audit ` — the **Firewall Analysis** section of the report lists every domain request, its allowed or denied status, request volume, and policy attribution. To compare firewall behavior between two runs and spot new or removed domain accesses, use `gh aw audit diff`: + +```bash +# Inspect firewall activity for a single run +gh aw audit 12345678 + +# Compare firewall behavior between two runs +gh aw audit diff 12345678 12345679 +``` + +See [Audit Commands](/gh-aw/reference/audit/) for full documentation. 
+ ## Related Documentation - [Network Configuration Guide](/gh-aw/guides/network-configuration/) - Practical examples and common patterns - [Frontmatter](/gh-aw/reference/frontmatter/) - Complete frontmatter configuration guide - [Tools](/gh-aw/reference/tools/) - Tool-specific network access configuration - [Playwright](/gh-aw/reference/playwright/) - Browser automation and network requirements +- [Audit Commands](/gh-aw/reference/audit/) - Firewall analysis and cross-run diff for understanding domain allow/block behavior - [Security Guide](/gh-aw/introduction/architecture/) - Comprehensive security guidance diff --git a/docs/src/content/docs/troubleshooting/debugging.md b/docs/src/content/docs/troubleshooting/debugging.md index 5401e569f05..c0fb102bfcc 100644 --- a/docs/src/content/docs/troubleshooting/debugging.md +++ b/docs/src/content/docs/troubleshooting/debugging.md @@ -76,7 +76,7 @@ The agent will install `gh aw`, analyze logs, identify the root cause, and sugge ### Auditing a Specific Run -`gh aw audit` gives a comprehensive breakdown of a single run — overview, metrics, tool usage, MCP failures, firewall analysis, and artifacts: +`gh aw audit` gives a comprehensive breakdown of a single run — overview, metrics, tool usage, MCP failures, firewall analysis, behavior fingerprint, and artifacts: ```bash # By run ID @@ -98,22 +98,25 @@ gh aw audit 12345678 --parse Audit output includes: - **Failure analysis** with error summary and root cause +- **Behavior fingerprint** — multi-dimensional characterization of the run's network, tool, and cost profile - **Tool usage** — which tools were called, which failed, and why -- **MCP server status** — connection failures, timeout errors -- **Firewall analysis** — blocked domains and allowed traffic +- **MCP server status** — connection failures, timeout errors, and per-server health +- **Firewall analysis** — blocked domains, allowed traffic, and policy attribution +- **Token/cost metrics** — per-run inference spend and 
token usage - **Safe-outputs** — structured outputs the agent produced -To compare behavior between two runs and detect regressions, use `audit diff`: +To compare behavior between two runs and detect regressions across firewall, MCP, and metrics dimensions, use `audit diff`: ```bash gh aw audit diff 12345678 12345679 gh aw audit diff 12345678 12345679 --format markdown ``` -For trends across multiple runs, use `audit report`: +For security and performance trends across multiple runs, use `gh aw logs --format`: ```bash -gh aw audit report --workflow "my-workflow" --last 10 +gh aw logs my-workflow --format markdown --count 10 +gh aw logs my-workflow --format markdown --last 5 --json ``` See [Audit Commands](/gh-aw/reference/audit/) for complete flag documentation.
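For the cross-run trend command in the last hunk, one possible way to surface the report in CI is to append it to the job summary. The sketch below assumes that `gh aw logs --format markdown` writes the report to stdout, which this diff does not confirm; `publish_trend_report` and the fallback file name `trend-report.md` are hypothetical.

```bash
# Sketch only: publish_trend_report is a hypothetical helper, not part of gh aw.
# Assumes `gh aw logs --format markdown` writes the report to stdout.
publish_trend_report() {
  workflow="$1"
  count="$2"
  # Outside GitHub Actions, fall back to a local file (the name is arbitrary).
  summary_file="${GITHUB_STEP_SUMMARY:-trend-report.md}"
  gh aw logs "$workflow" --format markdown --count "$count" >> "$summary_file"
}
```

Calling `publish_trend_report my-workflow 10` inside a workflow step would then add the report for the last 10 runs to that step's summary, if the stdout assumption holds.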