Merged
19 changes: 9 additions & 10 deletions .github/aw/syntax-tools-imports.md
@@ -44,10 +44,9 @@ cache:
- `fail-on-cache-miss:` - Fail if cache not found (boolean)
- `lookup-only:` - Only check cache existence (boolean)

Cache steps are automatically added to the workflow job and the cache configuration is removed from the final `.lock.yml` file.
Cache steps are auto-added to the workflow job; cache config is removed from the final `.lock.yml`.


> **Memory configuration**: For detailed documentation on `cache-memory:`, `repo-memory:`, and `comment-memory:` configuration including advanced options and use cases, see [memory.md](memory.md).
> **Memory configuration**: For `cache-memory:`, `repo-memory:`, and `comment-memory:`, see [memory.md](memory.md).


## Tool Configuration
@@ -107,7 +106,7 @@ mcp-servers:

### Engine Network Permissions

Control network access for AI engines using the top-level `network:` field. If no `network:` permission is specified, it defaults to `network: defaults` which provides access to basic infrastructure only.
Control network access via the top-level `network:` field. Defaults to `network: defaults` (basic infrastructure only) if unspecified.

```yaml
engine:
@@ -165,7 +164,7 @@ network: {}

**Available Ecosystem Identifiers:**

Each ecosystem identifier enables network access to the domains required by that language's package manager and toolchain. When writing workflows that involve package management, builds, or tests, **always include the ecosystem identifier matching the repository's primary language** in addition to `defaults`.
Each identifier enables network access to that language's package manager domains. For workflows with package management, builds, or tests, **always include the ecosystem matching the repository's primary language** plus `defaults`.

| Identifier | Runtimes / Languages | Package Manager / Domains |
|---|---|---|
@@ -325,7 +324,7 @@ In the compiled workflow, the order is: copilot-setup-steps → imported steps f

## Permission Patterns

**IMPORTANT**: Agentic workflows should NOT include write permissions (`issues: write`, `pull-requests: write`, `contents: write`). The safe-outputs system provides these capabilities through separate, secured jobs with appropriate permissions. NO write permissions should be granted to the main AI processing job, it will only cause a later compilation error.
**IMPORTANT**: Agentic workflows MUST NOT include write permissions (`issues: write`, `pull-requests: write`, `contents: write`). Safe-outputs provide these via separate secured jobs. Granting writes to the main AI job causes a compilation error.
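A minimal sketch of a pre-flight check that flags write scopes before compiling, assuming the frontmatter's `permissions:` block has already been parsed into a Python value. `find_write_permissions` is a hypothetical helper, not part of gh-aw; the compiler performs its own validation.

```python
def find_write_permissions(permissions):
    """Return the sorted scopes that a `permissions:` value grants write access to.

    `permissions` is either a parsed mapping such as
    {"contents": "read", "issues": "write"} or one of the GitHub Actions
    shorthand strings "read-all" / "write-all".
    """
    if isinstance(permissions, str):
        # "write-all" grants write to every scope; "read-all" grants none.
        return ["*"] if permissions == "write-all" else []
    return sorted(scope for scope, level in permissions.items() if level == "write")


if __name__ == "__main__":
    frontmatter_permissions = {"contents": "read", "issues": "write"}
    offending = find_write_permissions(frontmatter_permissions)
    if offending:
        # Any hit here belongs in a safe-outputs job instead.
        print("Move these writes to safe-outputs:", ", ".join(offending))
```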

### Read-Only Pattern

@@ -350,7 +349,7 @@ safe-outputs:

**Key Benefits of Safe-Outputs:**

- **Security**: Main job runs with minimal permissions
- **Separation of Concerns**: Write operations are handled by dedicated jobs
- **Permission Management**: Safe-outputs jobs automatically receive required permissions
- **Audit Trail**: Clear separation between AI processing and GitHub API interactions
- Main job runs with minimal permissions
- Write operations handled by dedicated jobs
- Safe-outputs jobs auto-receive required permissions
- Clear audit trail between AI processing and GitHub API
2 changes: 1 addition & 1 deletion .github/aw/test-coverage.md
@@ -8,7 +8,7 @@ Consult this file when creating or updating a workflow that analyzes test covera

## Core Principle: Read Artifacts First

**Always prefer fetching pre-computed coverage artifacts from CI over re-running the test suite.** Re-running tests is slow and duplicates work CI has already done.
Always prefer fetching pre-computed coverage artifacts from CI over re-running tests. Re-running duplicates CI work.

## Coverage Data Strategy

60 changes: 27 additions & 33 deletions .github/aw/token-optimization.md
@@ -30,32 +30,28 @@ Apply these in order — each check can halve costs:

## Frontier-Model Cost Pattern

Using a more capable frontier model can reduce **total** cost when the workflow architecture prevents unnecessary invocations and keeps expensive context narrow.
A frontier model can reduce **total** cost when architecture prevents unnecessary invocations and keeps expensive context narrow.

Guidance:
- use frontier model for planning, hypothesis selection, synthesis, ambiguous decisions, final judgment
- do not spend frontier turns on repetitive extraction, duplicate detection, or broad first-pass scanning
- add a cheap triage stage for known/duplicate/stale/low-value events; stop with `noop` when escalation is unnecessary
- escalate to frontier model only when triage is uncertain or the case is genuinely new/high-value
- cap sub-agent fan-out so escalations cannot recurse without bound

- use the frontier model for planning, hypothesis selection, synthesis, ambiguous decisions, and final judgment
- do not spend frontier-model turns on repetitive extraction, duplicate detection, or broad first-pass scanning
- add a cheap triage stage for known/duplicate/stale/low-value events and stop with `noop` or another safe output when escalation is unnecessary
- escalate to the frontier model only when triage is uncertain or the case is genuinely new/high-value
- cap sub-agent fan-out so escalations cannot recurse or expand without bound
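The triage-then-escalate guidance can be sketched as a routing function. This is a hypothetical illustration: the `triage` callable stands in for a cheap model or rule set, the skip labels are taken from the list above, and `MAX_ESCALATIONS` is an assumed per-run cap, one simple way to keep fan-out bounded.

```python
from dataclasses import dataclass, field

# Labels from the guidance above that should end in `noop` when triage is sure.
SKIP_LABELS = {"known", "duplicate", "stale", "low-value"}
MAX_ESCALATIONS = 3  # assumed per-run cap on frontier-model escalations


@dataclass
class Plan:
    noop: list = field(default_factory=list)
    escalate: list = field(default_factory=list)
    deferred: list = field(default_factory=list)


def route(events, triage):
    """Route events using a cheap `triage(event) -> (label, confident)` callable.

    Confidently skippable events become no-ops; uncertain or genuinely new
    events go to the frontier model, but never more than MAX_ESCALATIONS.
    """
    plan = Plan()
    for event in events:
        label, confident = triage(event)
        if confident and label in SKIP_LABELS:
            plan.noop.append(event)
        elif len(plan.escalate) < MAX_ESCALATIONS:
            plan.escalate.append(event)
        else:
            plan.deferred.append(event)  # cap reached; leave for a later run
    return plan
```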

Do **not** claim that frontier models are always cheaper. Cost wins come from architecture and selective execution, not model tier alone.
Cost wins come from architecture and selective execution, not model tier alone.

---

## Pull Context, Do Not Push Context

Avoid front-loading large raw context into the initial prompt when data can be fetched on demand.

Prefer:
Avoid front-loading large raw context when data can be fetched on demand. Prefer:

- deterministic pre-steps that materialize compact files under `/tmp/gh-aw/`
- `gh` + filtering commands (`jq`, `grep`, focused selectors) before context is exposed to the model
- query interfaces and pre-aggregated summaries instead of full API payloads
- pre-aggregated summaries instead of full API payloads
- directed tool calls issued after the agent forms a hypothesis

Warning on anchoring: if authors preselect raw logs or large payloads too early, the model may over-focus on that material and miss the actual cause elsewhere.
Anchoring warning: preselecting raw logs too early can make the model over-focus and miss the actual cause.

---

@@ -102,7 +98,7 @@ Each line is one API call with `model`, `input_tokens`, `output_tokens`, `cache_

## Technique 1 — DataOps: Move Compute to Steps

**The single biggest optimization.** Replace agentic data fetching with deterministic shell commands in `steps:`. Shell steps run outside the AI sandbox (no tokens) and produce structured output the agent reads directly.
The single biggest optimization. Replace agentic data fetching with deterministic shell commands in `steps:`. Shell steps run outside the AI sandbox (no tokens) and produce structured output the agent reads directly.

### Before (agent does all the work)

@@ -157,12 +153,12 @@ Read the pre-computed stats at `/tmp/gh-aw/data/stats.json` and `/tmp/gh-aw/data
Create a concise weekly PR summary discussion.
```

Shell steps run outside the AI sandbox (zero tokens); the agent only reads compact aggregated JSON.
Shell steps run outside the AI sandbox (zero tokens); the agent reads compact aggregated JSON.

**Best practices:**

- Write one JSON file per data source; use `jq` to pre-aggregate
- Store files under `/tmp/gh-aw/` — this directory is available to the agent
- One JSON file per data source; `jq` to pre-aggregate
- Store files under `/tmp/gh-aw/`
- Document file locations and schema in the prompt body so the agent doesn't need to explore
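As a sketch of the pre-aggregation step, here is the same idea in Python rather than `jq`, assuming an earlier step has saved the output of a command such as `gh pr list --state all --json number,state,author` and parsed it into a list of dicts. The function names are illustrative; only the `/tmp/gh-aw/data/stats.json` path comes from this file.

```python
import json
from collections import Counter
from pathlib import Path


def aggregate(prs):
    """Collapse raw PR records into a compact summary for the agent.

    Each record is assumed to look like
    {"number": 7, "state": "MERGED", "author": {"login": "octocat"}}.
    """
    by_state = Counter(pr["state"] for pr in prs)
    by_author = Counter(pr["author"]["login"] for pr in prs)
    return {
        "total": len(prs),
        "by_state": dict(sorted(by_state.items())),
        # Keep only the top 5 authors so the agent's context stays small.
        "top_authors": [login for login, _ in by_author.most_common(5)],
    }


def write_stats(stats, path="/tmp/gh-aw/data/stats.json"):
    """Write the summary where the prompt tells the agent to look."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(stats, indent=2))
    return out
```

Run from a `steps:` entry, a script like this finishes before the agent starts, so the prompt only needs to name `stats.json` and describe its three keys.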

See also: [DataOps pattern docs](https://github.com/github/gh-aw/blob/main/docs/src/content/docs/patterns/data-ops.md)
@@ -180,7 +176,7 @@ tools:
toolsets: [default]
```

The agent reads GitHub data with `gh issue list`, `gh pr view`, etc., and can pipe through `jq` before the data enters context. The alternative `mode: local` starts a Docker-based MCP server with startup latency and verbose tool results.
Agent reads GitHub via `gh issue list`, `gh pr view`, etc. and pipes through `jq` before data enters context. `mode: local` starts a Docker-based MCP server with startup latency and verbose tool results.

### `cli-proxy: true` (other MCP servers as CLIs)

@@ -195,7 +191,7 @@ tools:
...
```

With `cli-proxy`, the agent calls `my-custom-mcp <tool> <args>` from bash and can pipe output through `jq` or `grep` to extract only the fields it needs — instead of receiving the full MCP tool response in the conversation context.
With `cli-proxy`, the agent calls `my-custom-mcp <tool> <args>` from bash and pipes output through `jq`/`grep` to extract only needed fields — instead of receiving the full MCP tool response in context.

**Summary:**

@@ -263,17 +259,15 @@ Read the JSON file provided. Return only:
Nothing else.
```

**Why this saves tokens:** sub-agents run on the cheap `small` model; the main agent only reads compact `{"number":…, "category":…}` JSON; sub-agent dispatches can run in parallel.
**Why this saves tokens:** sub-agents run the cheap `small` model; main agent reads only compact `{"number":…, "category":…}` JSON; dispatches run in parallel.

### Pair sub-agents with sub-skills (progressive disclosure)

Use sub-skills as progressive disclosure for instruction-heavy tasks:

- Keep the main prompt short and plan-like (what to do, in what order).
- Put verbose instructions (report layout, rubric details, long formatting constraints) into `## skill:` blocks.
- Invoke those skills only at the moment they are needed (for example, when producing final output), so early planning/execution turns stay lean.
- Put verbose instructions (report layout, rubric details, formatting constraints) into `## skill:` blocks.
- Invoke skills only when needed (e.g., producing final output), so early turns stay lean.

This pattern lowers ambient context and usually improves both latency and AIC by delaying expensive instruction payloads until the final phase.
This delays expensive instruction payloads until the final phase, lowering ambient context.

**Sub-agent model aliases:**

@@ -291,7 +285,7 @@ See also: [Inline Sub-Agents](subagents.md)

## Technique 4 — Apply the Caveman Technique

Use an A/B experiment to compare a verbose prompt against a stripped-down minimal one. If the minimal variant produces equally useful output, adopt it permanently.
A/B compare a verbose prompt against a minimal one. Adopt minimal if quality holds.

```yaml
experiments:
@@ -306,7 +300,7 @@ List open issues by priority. Top 5 critical items. Be brief.
{{/if}}
```

Measure AI Credits (AIC) in each variant's run summary or via `gh aw audit`. If the `minimal` variant uses fewer AI Credits at acceptable quality, promote it as the baseline.
Measure AIC via run summary or `gh aw audit`. If `minimal` wins on cost at acceptable quality, promote as baseline.
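The promotion rule can be made explicit. A minimal sketch, assuming you record each run's AIC together with a quality score from your own review rubric; the quality score and the `tolerance` value are hypothetical and are not something `gh aw audit` reports.

```python
from statistics import mean


def should_promote_minimal(current_runs, minimal_runs, tolerance=0.05):
    """Decide whether the `minimal` prompt variant should become the baseline.

    Each argument is a list of (aic, quality) pairs, one per run. The
    minimal variant wins only if it costs less on average and its mean
    quality stays within `tolerance` of the current prompt's.
    """
    current_aic = mean(aic for aic, _ in current_runs)
    current_quality = mean(q for _, q in current_runs)
    minimal_aic = mean(aic for aic, _ in minimal_runs)
    minimal_quality = mean(q for _, q in minimal_runs)
    return minimal_aic < current_aic and minimal_quality >= current_quality - tolerance
```

Averaging over several runs per variant keeps one unusually cheap or expensive run from deciding the outcome.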

---

@@ -356,17 +350,17 @@ See also: [A/B Testing Experiments](experiments.md)

## Technique 6 — Reduce Trigger Frequency and Batch Work

The cheapest run is the one you don't execute. If a workflow doesn't need near-real-time feedback, run it less often and batch items in one pass.
The cheapest run is the one you don't execute. If a workflow doesn't need near-real-time feedback, run it less often and batch.

### Prefer slower schedules when latency is acceptable

- `hourly` → `daily on weekdays` for team-facing summaries or audits
- `daily` → `weekly` for trend reports, optimization reviews, and backlog hygiene
- `every N hours` → a daily or weekly batch when the workflow only produces guidance or reports
- `daily` → `weekly` for trend reports, optimization reviews, backlog hygiene
- `every N hours` → daily/weekly batch when the workflow only produces guidance

### Prefer scheduled batches over reactive triggers

Reactive triggers (`issues:`, `pull_request:`, comment commands) suit immediate feedback. Otherwise prefer `schedule: daily on weekdays` and batch work. Typical batch-friendly tasks: triage summaries, stale backlog review, token audits, security digests. Combine with `cache-memory` or `repo-memory` to track processed items so each run only handles new ones.
Reactive triggers (`issues:`, `pull_request:`, comment commands) suit immediate feedback. Otherwise prefer `schedule: daily on weekdays` and batch work. Typical batch-friendly tasks: triage summaries, stale backlog review, token audits, security digests. Combine with `cache-memory` or `repo-memory` to track processed items.
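A minimal sketch of the processed-items bookkeeping, assuming a JSON file of issue numbers kept inside the directory that `cache-memory` or `repo-memory` persists between runs; the file location and helper names are hypothetical. Saving only after the batch succeeds keeps a failed run from silently skipping items next time.

```python
import json
from pathlib import Path


def load_seen(state_file):
    """Return the set of item numbers handled by earlier runs."""
    path = Path(state_file)
    return set(json.loads(path.read_text())) if path.exists() else set()


def select_new(items, seen):
    """Keep only items whose "number" has not been processed yet."""
    return [item for item in items if item["number"] not in seen]


def save_seen(state_file, seen):
    """Persist the processed set; call this after the batch succeeds."""
    path = Path(state_file)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(sorted(seen)))
```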

---

@@ -385,7 +379,7 @@ observability:
headers: ${{ secrets.GH_AW_OTEL_HEADERS }}
```

Setup, agent, and conclusion spans carry token usage attributes — compare runs over time and validate optimizations post-rollout. See [Frontmatter syntax](syntax.md#observability).
Setup, agent, and conclusion spans carry token usage attributes. See [Frontmatter syntax](syntax.md#observability).
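To compare runs before and after an optimization without an OTLP backend, per-model totals can also be computed from the per-call JSONL log this file describes, where each line is one API call carrying `model`, `input_tokens`, and `output_tokens` (the cache fields are ignored here). A minimal Python sketch; how you obtain the log file is left open.

```python
import json
from collections import defaultdict


def totals_by_model(lines):
    """Sum input/output tokens per model from an iterable of JSONL lines.

    Each non-empty line is assumed to be one API call with at least
    `model`, `input_tokens`, and `output_tokens`; other fields are ignored.
    """
    totals = defaultdict(lambda: {"input_tokens": 0, "output_tokens": 0})
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        call = json.loads(line)
        bucket = totals[call["model"]]
        bucket["input_tokens"] += call.get("input_tokens", 0)
        bucket["output_tokens"] += call.get("output_tokens", 0)
    return dict(totals)
```

An open file object can be passed directly, since iterating it yields one line at a time.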

### Add AgenticOps token workflows

18 changes: 7 additions & 11 deletions .github/aw/triggers.md
@@ -38,7 +38,7 @@ Then gate analysis to failure outcomes:
if: contains(fromJson('["failure","timed_out","cancelled","action_required"]'), github.event.workflow_run.conclusion)
```

These conclusion states are grouped as "non-success outcomes requiring triage"; keep the list explicit so readers can adjust it for stricter (for example only `failure`) or broader incident policies.
These are "non-success outcomes requiring triage"; keep the list explicit so readers can tighten (e.g., only `failure`) or broaden it.

No-op expectations for this pattern:

@@ -47,7 +47,7 @@ No-op expectations for this pattern:

#### Fuzzy Scheduling

Instead of specifying exact cron expressions, use **fuzzy scheduling** to automatically distribute workflow execution times. This reduces load spikes and avoids the "Monday wall of work" problem where weekend tasks pile up.
Use fuzzy scheduling instead of exact cron to distribute execution times. Avoids load spikes and the "Monday wall of work" from weekend accumulation.

**Basic Fuzzy Schedules:**

@@ -69,15 +69,11 @@ on:

**Why Prefer Weekday Schedules:**

- **Avoids Monday backlog**: Daily workflows that run on weekends accumulate work that hits on Monday morning
- **Better resource usage**: Team-facing workflows align with business hours
- **Reduced noise**: Notifications and issues are created when team members are active
- Avoids Monday backlog from weekend accumulation
- Aligns with team business hours
- Notifications fire when team members are active

The compiler automatically:

- Converts fuzzy schedules to deterministic cron expressions
- Scatters execution times to avoid load spikes (e.g., `daily on weekdays` → `43 5 * * 1-5`)
- Adds `workflow_dispatch:` trigger for manual runs
The compiler converts fuzzy schedules to deterministic cron (e.g., `daily on weekdays` → `43 5 * * 1-5`), scatters execution to avoid load spikes, and adds `workflow_dispatch:` for manual runs.
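The scattering step can be pictured with a hash-based sketch. This is an assumption for illustration, not the gh-aw compiler's actual algorithm: hashing a stable workflow identifier yields the same time on every compile while spreading different workflows across the day. Its output will not necessarily match the `43 5 * * 1-5` example.

```python
import hashlib


def scatter_weekday_cron(workflow_id):
    """Map a workflow identifier to a stable weekday cron expression.

    Hypothetical sketch: the SHA-256 digest of the identifier picks the
    minute and hour, so recompiling never moves the schedule, but two
    different workflows are unlikely to land on the same slot.
    """
    digest = int(hashlib.sha256(workflow_id.encode("utf-8")).hexdigest(), 16)
    minute = digest % 60
    hour = (digest // 60) % 24
    return f"{minute} {hour} * * 1-5"
```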

**Recommended Pattern:**

@@ -142,7 +138,7 @@ on:
- `pull_request_review_comment` - Pull request review comments
- `*` - All comment-related events (default)

**Note**: Both `issue_comment` and `pull_request_comment` map to GitHub Actions' `issue_comment` event with automatic filtering to distinguish between issue and PR comments.
**Note**: `issue_comment` and `pull_request_comment` both map to GitHub Actions' `issue_comment` event with filtering to distinguish them.
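On the GitHub side, one common way to tell the two apart is the `pull_request` key, which appears in an `issue_comment` payload's `issue` object only when the comment is on a pull request. A minimal Python sketch over an already-parsed webhook payload; the filtering gh-aw actually generates may be expressed differently.

```python
def comment_kind(payload):
    """Classify a parsed `issue_comment` event payload.

    Returns "pull_request_comment" when the comment was made on a PR
    conversation and "issue_comment" otherwise.
    """
    issue = payload.get("issue", {})
    return "pull_request_comment" if "pull_request" in issue else "issue_comment"
```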

### Label Command Triggers
