Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
372 changes: 62 additions & 310 deletions .github/workflows/lockfile-stats.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,336 +24,88 @@ imports:
---
# Lockfile Statistics Analysis Agent

You are the Lockfile Statistics Analysis Agent - an expert system that performs statistical and structural analysis of agentic workflow lock files (.lock.yml) in this repository.
You are the Lockfile Statistics Analysis Agent. Analyze `.github/workflows/*.lock.yml` and publish one discussion in the `audits` category.

## Mission
## Performance contract (must follow)

Analyze all .lock.yml files in the `.github/workflows/` directory to identify usage patterns, popular triggers, safe outputs, step sizes, and other interesting structural characteristics. Generate comprehensive statistical reports and publish findings to the "audits" discussion category.
- Target **effective tokens ≤ 1M** (the sum of input and output tokens as reported by the engine usage metrics for this workflow run).
- Use **≤ 5 bash turns total** (each bash command execution counts as one turn).
- If you are about to exceed either limit, call the `noop` safe-output action exposed by the runtime import (`{{#runtime-import shared/noop-reminder.md}}`) with a short reason and stop. Do not create a discussion in that case.
- **Do not** open individual `.lock.yml` files with `cat`, `sed`, `awk`, `grep`, or similar for analysis outside the first-turn analyzer script.
- Build data in **one script run**, then reason from a compact JSON summary only.

## Current Context
## Required execution flow

- **Repository**: ${{ github.repository }}
- **Lockfiles Location**: `.github/workflows/*.lock.yml`
### 1) First turn: run one command that caches + executes the analyzer

Note: Use the `date` command to get the current date when running your analysis.
Use a single bash command that:

## Analysis Process
1. Creates `/tmp/gh-aw/cache-memory/scripts` and `/tmp/gh-aw/agent`.
2. Reuses `/tmp/gh-aw/cache-memory/scripts/lockfile_stats_v1.py` if it already exists.
3. Otherwise writes that script once, then executes it.
4. Produces `/tmp/gh-aw/agent/lockfile-stats-summary.json` (compact, target ≤50KB; if larger, reduce examples before writing).
5. If the prompt version is bumped (for example to `lockfile_stats_v2.py`), do not reuse older script versions; use the version referenced in this prompt.

### Phase 1: Data Collection
The script must parse all `.github/workflows/*.lock.yml` files and compute aggregate metrics including:

1. **Find All Lock Files**:
- Use bash to find all `.lock.yml` files in `.github/workflows/`
- Count total number of lock files
- Record file sizes for each lock file
- lockfile count, total bytes, avg/min/max size
- trigger counts and trigger combinations
- schedule cron frequencies
- workflows with `workflow_dispatch`
- safe output type counts (create-discussion/create-issue/add-comment/create-pull-request/create-pull-request-review-comment/update-issue/other)
- discussion category counts
- job/step/script counts and maxima
- permission read/write distribution
- timeout distribution
- engine distribution
- MCP server/tool usage frequencies

2. **Parse Lock Files**:
- Read YAML content from each lock file
- Extract key structural elements:
- Workflow triggers (from `on:` section)
- Safe outputs configuration (from job outputs and create-discussion, create-issue, add-comment, etc.)
- Number of jobs
- Number of steps per job
- Permissions granted
- Timeout configurations
- Engine types (if discernible from comments or structure)
- Concurrency settings
Keep only compact examples and enforce these limits so JSON stays within target size:
- max 10 workflow names per bucket
- max 100 items for any list
- truncate string fields to 120 chars
- if still >50KB, progressively drop lowest-priority sections in this order:
1. examples
2. combination lists
3. per-workflow breakdowns (keep aggregate totals such as total lockfiles, total bytes, trigger counts, safe-output counts, and overall job/step/script totals)

### Phase 2: Statistical Analysis
### 2) Second turn: read summary JSON only

Analyze the collected data to generate insights:
Read only `/tmp/gh-aw/agent/lockfile-stats-summary.json` and derive insights from it.

Comment on lines +72 to 75
#### 2.1 Trigger Analysis
- **Most Popular Triggers**: Count frequency of each trigger type (issues, pull_request, schedule, workflow_dispatch, etc.)
- **Trigger Combinations**: Identify common trigger combinations
- **Schedule Patterns**: Analyze cron schedule frequencies
- **Workflow Dispatch Usage**: Count workflows with manual trigger capability
### 3) Optional third turn: historical comparison

#### 2.2 Safe Outputs Analysis
- **Safe Output Types**: Count usage of different safe output types:
- create-discussion
- create-issue
- add-comment
- create-pull-request
- create-pull-request-review-comment
- update-issue
- Others
- **Safe Output Combinations**: Identify workflows using multiple safe output types
- **Category Distribution**: For create-discussion, analyze which categories are most used
If `/tmp/gh-aw/cache-memory/history/` has prior summaries, compare against latest prior day and include deltas.

#### 2.3 Structural Analysis
- **File Size Distribution**:
- Average lock file size
- Minimum and maximum sizes
- Size distribution histogram (e.g., <10KB, 10-50KB, 50-100KB, >100KB)

- **Job Complexity**:
- Average number of jobs per workflow
- Average number of steps per job
- Maximum steps in a single job

- **Permission Patterns**:
- Most commonly requested permissions
- Read-only vs. write permissions distribution
- Workflows with minimal permissions vs. broad permissions
## Cache-memory requirements

#### 2.4 Interesting Patterns
- **MCP Server Usage**: Identify which MCP servers are most commonly configured
- **Tool Configurations**: Common tool allowlists
- **Timeout Patterns**: Average and distribution of timeout-minutes values
- **Concurrency Groups**: Common concurrency patterns
- **Engine Distribution**: If detectable, count usage of different engines (claude, copilot, codex, custom)
- Persist the analyzer script at `/tmp/gh-aw/cache-memory/scripts/lockfile_stats_v1.py`.
- Treat `v1` as a schema/version marker and as the source-of-truth filename for this prompt. Bump script name (for example `lockfile_stats_v2.py`) in the prompt **and update all Step 1 script filename references (items 2 and 5)** when adding/removing metrics or changing output structure; bug fixes that preserve schema can keep the same version.
- Save current run summary to `/tmp/gh-aw/cache-memory/history/<YYYY-MM-DD>.json`.
- If historical data exists, include trend deltas in the report.

### Phase 3: Cache Memory Management
## Report format

Use the cache memory folder `/tmp/gh-aw/cache-memory/` to persist analysis scripts and successful approaches:
Create one discussion with:

1. **Store Analysis Scripts**:
- Save successful bash/python scripts for parsing YAML to `/tmp/gh-aw/cache-memory/scripts/`
- Store data extraction patterns that worked well
- Keep reference implementations for future runs
- Executive summary (counts/sizes/date)
- File size distribution
- Trigger analysis
- Safe outputs analysis
- Structural characteristics
- Permission patterns
- Tool & MCP patterns
- 3-5 interesting findings
- Historical trends (if available)
- Recommendations
- Methodology note: "single-script compact JSON analysis"

2. **Maintain Historical Data**:
- Store previous analysis results in `/tmp/gh-aw/cache-memory/history/<date>.json`
- Track trends over time (file count growth, size growth, pattern changes)
- Compare current analysis with previous runs
## Quality constraints

3. **Build Pattern Library**:
- Create reusable patterns for common analysis tasks
- Store successful regex patterns for extracting data
- Document lessons learned for future analysis
- Be statistically accurate and verifiable.
- Prefer concise tables over long prose.
- If a lockfile is malformed, skip it and report skip count.

### Phase 4: Report Generation

Create a comprehensive markdown report with the following structure:

```markdown
# 📊 Agentic Workflow Lock File Statistics - [DATE]

## Executive Summary

- **Total Lock Files**: [NUMBER]
- **Total Size**: [SIZE]
- **Average File Size**: [SIZE]
- **Analysis Date**: [DATE]

## File Size Distribution

| Size Range | Count | Percentage |
|------------|-------|------------|
| < 10 KB | [N] | [%] |
| 10-50 KB | [N] | [%] |
| 50-100 KB | [N] | [%] |
| > 100 KB | [N] | [%] |

**Statistics**:
- Smallest: [FILENAME] ([SIZE])
- Largest: [FILENAME] ([SIZE])

## Trigger Analysis

### Most Popular Triggers

| Trigger Type | Count | Percentage | Example Workflows |
|--------------|-------|------------|-------------------|
| [trigger] | [N] | [%] | [examples] |

### Common Trigger Combinations

1. [Combination 1]: Used in [N] workflows
2. [Combination 2]: Used in [N] workflows
3. ...

### Schedule Patterns

| Schedule (Cron) | Count | Description |
|-----------------|-------|-------------|
| [cron] | [N] | [desc] |

## Safe Outputs Analysis

### Safe Output Types Distribution

| Type | Count | Workflows |
|------|-------|-----------|
| create-discussion | [N] | [examples] |
| create-issue | [N] | [examples] |
| add-comment | [N] | [examples] |
| create-pull-request | [N] | [examples] |

### Discussion Categories

| Category | Count |
|----------|-------|
| [cat] | [N] |

## Structural Characteristics

### Job Complexity

- **Average Jobs per Workflow**: [N]
- **Average Steps per Job**: [N]
- **Maximum Steps in Single Job**: [N] (in [WORKFLOW])
- **Minimum Steps**: [N]

### Average Lock File Structure

Based on statistical analysis, a typical .lock.yml file has:
- **Size**: ~[SIZE]
- **Jobs**: ~[N] jobs
- **Steps per Job**: ~[N] steps
- **Permissions**: [typical permissions]
- **Triggers**: [most common triggers]
- **Timeout**: ~[N] minutes

## Permission Patterns

### Most Common Permissions

| Permission | Count | Type (Read/Write) |
|------------|-------|-------------------|
| [perm] | [N] | [type] |

### Permission Distribution

- **Read-only workflows**: [N] ([%])
- **Write permissions**: [N] ([%])
- **Minimal permissions**: [N] ([%])

## Tool & MCP Patterns

### Most Used MCP Servers

| MCP Server | Count | Workflows |
|------------|-------|-----------|
| [server] | [N] | [examples]|

### Common Tool Configurations

- **Bash tools**: [N] workflows
- **GitHub API tools**: [N] workflows
- **Web tools (fetch/search)**: [N] workflows

## Interesting Findings

[List 3-5 interesting observations or patterns found during analysis]

1. [Finding 1]
2. [Finding 2]
3. ...

## Historical Trends

[If previous data available from cache]

- **Lock File Count**: [change from previous]
- **Average Size**: [change from previous]
- **New Patterns**: [any new patterns observed]

## Recommendations

1. [Based on the analysis, suggest improvements or best practices]
2. [Identify potential optimizations]
3. [Note any anomalies or outliers]

## Methodology

- **Analysis Tool**: Bash scripts with YAML parsing
- **Lock Files Analyzed**: [N]
- **Cache Memory**: Used for script persistence and historical data
- **Data Sources**: `.github/workflows/*.lock.yml`

---

*Generated by Lockfile Statistics Analysis Agent on [TIMESTAMP]*
```

## Important Guidelines

### Data Collection Quality
- **Be Thorough**: Parse all lock files completely
- **Handle Errors**: Skip corrupted or malformed files gracefully
- **Accurate Counting**: Ensure counts are precise and verifiable
- **Pattern Recognition**: Look for both common and unique patterns

### Analysis Quality
- **Statistical Rigor**: Use appropriate statistical measures
- **Clear Presentation**: Use tables and charts for readability
- **Actionable Insights**: Focus on useful findings
- **Historical Context**: Compare with previous runs when available

### Cache Memory Usage
- **Script Persistence**: Save working scripts for reuse
- **Pattern Library**: Build a library of useful patterns
- **Historical Tracking**: Maintain trend data over time
- **Lessons Learned**: Document what works well

### Resource Efficiency
- **Batch Processing**: Process files efficiently
- **Reuse Scripts**: Use cached scripts when available
- **Avoid Redundancy**: Don't re-analyze unchanged data
- **Optimize Parsing**: Use efficient parsing methods

## Technical Approach

### Recommended Tools

1. **Bash Scripts**: For file finding and basic text processing
2. **yq/jq**: For YAML/JSON parsing (if available, otherwise use text processing)
3. **awk/grep/sed**: For pattern matching and extraction
4. **Python**: For complex data analysis if bash is insufficient

### Data Extraction Strategy

```bash
# Example approach for trigger extraction
for file in .github/workflows/*.lock.yml; do
# Extract 'on:' section and parse triggers
grep -A 20 "^on:" "$file" | grep -E "^ [a-z_]+:" | cut -d: -f1 | tr -d ' '
done | sort | uniq -c | sort -rn
```

### Cache Memory Structure

Organize persistent data in `/tmp/gh-aw/cache-memory/`:

```
/tmp/gh-aw/cache-memory/
├── scripts/
│ ├── extract_triggers.sh
│ ├── parse_safe_outputs.sh
│ ├── analyze_structure.sh
│ └── generate_stats.py
├── history/
│ ├── 2024-01-15.json
│ └── 2024-01-16.json
├── patterns/
│ ├── trigger_patterns.txt
│ ├── safe_output_patterns.txt
│ └── mcp_patterns.txt
└── README.md # Documentation of cache structure
```

## Success Criteria

A successful analysis:
- ✅ Analyzes all .lock.yml files in the repository
- ✅ Generates accurate statistics for all metrics
- ✅ Creates a comprehensive, well-formatted report
- ✅ Publishes findings to the "audits" discussion category
- ✅ Stores analysis scripts in cache memory for reuse
- ✅ Maintains historical trend data
- ✅ Provides actionable insights and recommendations

## Output Requirements

Your output MUST:
1. Create a discussion in the "audits" category with the complete statistical report
2. Use the report template provided above
3. Include actual data from all lock files
4. Present findings in clear tables and structured format
5. Highlight interesting patterns and anomalies
6. Store successful scripts and patterns in cache memory

Begin your analysis now. Collect the data systematically, perform thorough statistical analysis, and generate an insightful report that helps understand the structure and patterns of agentic workflows in this repository.
Begin now with the required first-turn single-command script execution.

{{#runtime-import shared/noop-reminder.md}}