github · pelikhan · May 11, 2026
diff --git a/.github/workflows/lockfile-stats.md b/.github/workflows/lockfile-stats.md
@@ -24,336 +24,88 @@ imports:
 ---
 # Lockfile Statistics Analysis Agent
 
-You are the Lockfile Statistics Analysis Agent - an expert system that performs statistical and structural analysis of agentic workflow lock files (.lock.yml) in this repository.
+You are the Lockfile Statistics Analysis Agent. Analyze all `.github/workflows/*.lock.yml` files and publish exactly one discussion in the `audits` category.
 
-## Mission
+## Performance contract (must follow)
 
-Analyze all .lock.yml files in the `.github/workflows/` directory to identify usage patterns, popular triggers, safe outputs, step sizes, and other interesting structural characteristics. Generate comprehensive statistical reports and publish findings to the "audits" discussion category.
+- Keep **effective tokens ≤ 1M** (the sum of input and output tokens reported by the engine's usage metrics for this workflow run).
+- Use **≤ 5 bash turns total** (each bash command execution counts as one turn).
+- If you are about to exceed either limit, call the `noop` safe output (see the noop reminder imported at the end of this prompt) with a short reason and stop. Do not create a discussion in that case.
+- **Do not** read individual `.lock.yml` files with `cat`, `sed`, `awk`, `grep`, or similar tools; only the first-turn analyzer script may read them.
+- Collect all data in **one script run**, then reason only from the compact JSON summary.
 
-## Current Context
+## Required execution flow
 
-- **Repository**: ${{ github.repository }}
-- **Lockfiles Location**: `.github/workflows/*.lock.yml`
+### 1) First turn: run one command that caches and executes the analyzer
 
-Note: Use the `date` command to get the current date when running your analysis.
+Use a single bash command that:
 
-## Analysis Process
+1. Creates `/tmp/gh-aw/cache-memory/scripts` and `/tmp/gh-aw/agent`.
+2. Reuses `/tmp/gh-aw/cache-memory/scripts/lockfile_stats_v1.py` if it already exists.
+3. Otherwise writes that script once, then executes it.
+4. Produces `/tmp/gh-aw/agent/lockfile-stats-summary.json` (compact, target ≤50KB; if larger, reduce examples before writing).
+5. If the prompt version is bumped (for example to `lockfile_stats_v2.py`), do not reuse older script versions; use the version referenced in this prompt.
 
-### Phase 1: Data Collection
+The script must parse all `.github/workflows/*.lock.yml` files and compute aggregate metrics including:
 
-1. **Find All Lock Files**:
-   - Use bash to find all `.lock.yml` files in `.github/workflows/`
-   - Count total number of lock files
-   - Record file sizes for each lock file
+- lockfile count, total bytes, avg/min/max size
+- trigger counts and trigger combinations
+- schedule cron frequencies
+- workflows with `workflow_dispatch`
+- safe output type counts (create-discussion/create-issue/add-comment/create-pull-request/create-pull-request-review-comment/update-issue/other)
+- discussion category counts
+- job/step/script counts and maxima
+- permission read/write distribution
+- timeout distribution
+- engine distribution
+- MCP server/tool usage frequencies
 
-2. **Parse Lock Files**:
-   - Read YAML content from each lock file
-   - Extract key structural elements:
-     - Workflow triggers (from `on:` section)
-     - Safe outputs configuration (from job outputs and create-discussion, create-issue, add-comment, etc.)
-     - Number of jobs
-     - Number of steps per job
-     - Permissions granted
-     - Timeout configurations
-     - Engine types (if discernible from comments or structure)
-     - Concurrency settings
+Keep only compact examples and enforce these limits so the JSON stays within the target size:
+- max 10 workflow names per bucket
+- max 100 items for any list
+- truncate string fields to 120 chars
+- if still >50KB, progressively drop lowest-priority sections in this order:
+  1. examples
+  2. combination lists
+  3. per-workflow breakdowns (keep aggregate totals such as total lockfiles, total bytes, trigger counts, safe-output counts, and overall job/step/script totals)
 
-### Phase 2: Statistical Analysis
+### 2) Second turn: read summary JSON only
 
-Analyze the collected data to generate insights:
+Read only `/tmp/gh-aw/agent/lockfile-stats-summary.json` and derive insights from it.
 
-#### 2.1 Trigger Analysis
-- **Most Popular Triggers**: Count frequency of each trigger type (issues, pull_request, schedule, workflow_dispatch, etc.)
-- **Trigger Combinations**: Identify common trigger combinations
-- **Schedule Patterns**: Analyze cron schedule frequencies
-- **Workflow Dispatch Usage**: Count workflows with manual trigger capability
+### 3) Optional third turn: historical comparison
 
-#### 2.2 Safe Outputs Analysis
-- **Safe Output Types**: Count usage of different safe output types:
-  - create-discussion
-  - create-issue
-  - add-comment
-  - create-pull-request
-  - create-pull-request-review-comment
-  - update-issue
-  - Others
-- **Safe Output Combinations**: Identify workflows using multiple safe output types
-- **Category Distribution**: For create-discussion, analyze which categories are most used
+If `/tmp/gh-aw/cache-memory/history/` contains prior summaries, compare against the most recent prior day's summary and include deltas.
 
-#### 2.3 Structural Analysis
-- **File Size Distribution**:
-  - Average lock file size
-  - Minimum and maximum sizes
-  - Size distribution histogram (e.g., <10KB, 10-50KB, 50-100KB, >100KB)
-
-- **Job Complexity**:
-  - Average number of jobs per workflow
-  - Average number of steps per job
-  - Maximum steps in a single job
-
-- **Permission Patterns**:
-  - Most commonly requested permissions
-  - Read-only vs. write permissions distribution
-  - Workflows with minimal permissions vs. broad permissions
+## Cache-memory requirements
 
-#### 2.4 Interesting Patterns
-- **MCP Server Usage**: Identify which MCP servers are most commonly configured
-- **Tool Configurations**: Common tool allowlists
-- **Timeout Patterns**: Average and distribution of timeout-minutes values
-- **Concurrency Groups**: Common concurrency patterns
-- **Engine Distribution**: If detectable, count usage of different engines (claude, copilot, codex, custom)
+- Persist the analyzer script at `/tmp/gh-aw/cache-memory/scripts/lockfile_stats_v1.py`.
+- Treat `v1` as a schema/version marker and as the source-of-truth filename for this prompt. When adding or removing metrics or changing the output structure, bump the script name (for example to `lockfile_stats_v2.py`) **everywhere this prompt references it (Step 1 items 2 and 5, and the bullet above)**; bug fixes that preserve the schema can keep the same version.
+- Save the current run's summary to `/tmp/gh-aw/cache-memory/history/<YYYY-MM-DD>.json`.
+- If historical data exists, include trend deltas in the report.
 
-### Phase 3: Cache Memory Management
+## Report format
 
-Use the cache memory folder `/tmp/gh-aw/cache-memory/` to persist analysis scripts and successful approaches:
+Create one discussion with:
 
-1. **Store Analysis Scripts**:
-   - Save successful bash/python scripts for parsing YAML to `/tmp/gh-aw/cache-memory/scripts/`
-   - Store data extraction patterns that worked well
-   - Keep reference implementations for future runs
+- Executive summary (counts/sizes/date)
+- File size distribution
+- Trigger analysis
+- Safe outputs analysis
+- Structural characteristics
+- Permission patterns
+- Tool & MCP patterns
+- 3-5 interesting findings
+- Historical trends (if available)
+- Recommendations
+- Methodology note: "single-script compact JSON analysis"
 
-2. **Maintain Historical Data**:
-   - Store previous analysis results in `/tmp/gh-aw/cache-memory/history/<date>.json`
-   - Track trends over time (file count growth, size growth, pattern changes)
-   - Compare current analysis with previous runs
+## Quality constraints
 
-3. **Build Pattern Library**:
-   - Create reusable patterns for common analysis tasks
-   - Store successful regex patterns for extracting data
-   - Document lessons learned for future analysis
+- Be statistically accurate and verifiable.
+- Prefer concise tables over long prose.
+- If a lockfile is malformed, skip it and report the number of skipped files.
 
-### Phase 4: Report Generation
-
-Create a comprehensive markdown report with the following structure:
-
-```markdown
-# 📊 Agentic Workflow Lock File Statistics - [DATE]
-
-## Executive Summary
-
-- **Total Lock Files**: [NUMBER]
-- **Total Size**: [SIZE]
-- **Average File Size**: [SIZE]
-- **Analysis Date**: [DATE]
-
-## File Size Distribution
-
-| Size Range | Count | Percentage |
-|------------|-------|------------|
-| < 10 KB    | [N]   | [%]        |
-| 10-50 KB   | [N]   | [%]        |
-| 50-100 KB  | [N]   | [%]        |
-| > 100 KB   | [N]   | [%]        |
-
-**Statistics**:
-- Smallest: [FILENAME] ([SIZE])
-- Largest: [FILENAME] ([SIZE])
-
-## Trigger Analysis
-
-### Most Popular Triggers
-
-| Trigger Type | Count | Percentage | Example Workflows |
-|--------------|-------|------------|-------------------|
-| [trigger]    | [N]   | [%]        | [examples]        |
-
-### Common Trigger Combinations
-
-1. [Combination 1]: Used in [N] workflows
-2. [Combination 2]: Used in [N] workflows
-3. ...
-
-### Schedule Patterns
-
-| Schedule (Cron) | Count | Description |
-|-----------------|-------|-------------|
-| [cron]          | [N]   | [desc]      |
-
-## Safe Outputs Analysis
-
-### Safe Output Types Distribution
-
-| Type | Count | Workflows |
-|------|-------|-----------|
-| create-discussion | [N] | [examples] |
-| create-issue | [N] | [examples] |
-| add-comment | [N] | [examples] |
-| create-pull-request | [N] | [examples] |
-
-### Discussion Categories
-
-| Category | Count |
-|----------|-------|
-| [cat]    | [N]   |
-
-## Structural Characteristics
-
-### Job Complexity
-
-- **Average Jobs per Workflow**: [N]
-- **Average Steps per Job**: [N]
-- **Maximum Steps in Single Job**: [N] (in [WORKFLOW])
-- **Minimum Steps**: [N]
-
-### Average Lock File Structure
-
-Based on statistical analysis, a typical .lock.yml file has:
-- **Size**: ~[SIZE]
-- **Jobs**: ~[N] jobs
-- **Steps per Job**: ~[N] steps
-- **Permissions**: [typical permissions]
-- **Triggers**: [most common triggers]
-- **Timeout**: ~[N] minutes
-
-## Permission Patterns
-
-### Most Common Permissions
-
-| Permission | Count | Type (Read/Write) |
-|------------|-------|-------------------|
-| [perm]     | [N]   | [type]            |
-
-### Permission Distribution
-
-- **Read-only workflows**: [N] ([%])
-- **Write permissions**: [N] ([%])
-- **Minimal permissions**: [N] ([%])
-
-## Tool & MCP Patterns
-
-### Most Used MCP Servers
-
-| MCP Server | Count | Workflows |
-|------------|-------|-----------|
-| [server]   | [N]   | [examples]|
-
-### Common Tool Configurations
-
-- **Bash tools**: [N] workflows
-- **GitHub API tools**: [N] workflows
-- **Web tools (fetch/search)**: [N] workflows
-
-## Interesting Findings
-
-[List 3-5 interesting observations or patterns found during analysis]
-
-1. [Finding 1]
-2. [Finding 2]
-3. ...
-
-## Historical Trends
-
-[If previous data available from cache]
-
-- **Lock File Count**: [change from previous]
-- **Average Size**: [change from previous]
-- **New Patterns**: [any new patterns observed]
-
-## Recommendations
-
-1. [Based on the analysis, suggest improvements or best practices]
-2. [Identify potential optimizations]
-3. [Note any anomalies or outliers]
-
-## Methodology
-
-- **Analysis Tool**: Bash scripts with YAML parsing
-- **Lock Files Analyzed**: [N]
-- **Cache Memory**: Used for script persistence and historical data
-- **Data Sources**: `.github/workflows/*.lock.yml`
-
----
-
-*Generated by Lockfile Statistics Analysis Agent on [TIMESTAMP]*
-```
-
-## Important Guidelines
-
-### Data Collection Quality
-- **Be Thorough**: Parse all lock files completely
-- **Handle Errors**: Skip corrupted or malformed files gracefully
-- **Accurate Counting**: Ensure counts are precise and verifiable
-- **Pattern Recognition**: Look for both common and unique patterns
-
-### Analysis Quality
-- **Statistical Rigor**: Use appropriate statistical measures
-- **Clear Presentation**: Use tables and charts for readability
-- **Actionable Insights**: Focus on useful findings
-- **Historical Context**: Compare with previous runs when available
-
-### Cache Memory Usage
-- **Script Persistence**: Save working scripts for reuse
-- **Pattern Library**: Build a library of useful patterns
-- **Historical Tracking**: Maintain trend data over time
-- **Lessons Learned**: Document what works well
-
-### Resource Efficiency
-- **Batch Processing**: Process files efficiently
-- **Reuse Scripts**: Use cached scripts when available
-- **Avoid Redundancy**: Don't re-analyze unchanged data
-- **Optimize Parsing**: Use efficient parsing methods
-
-## Technical Approach
-
-### Recommended Tools
-
-1. **Bash Scripts**: For file finding and basic text processing
-2. **yq/jq**: For YAML/JSON parsing (if available, otherwise use text processing)
-3. **awk/grep/sed**: For pattern matching and extraction
-4. **Python**: For complex data analysis if bash is insufficient
-
-### Data Extraction Strategy
-
-```bash
-# Example approach for trigger extraction
-for file in .github/workflows/*.lock.yml; do
-  # Extract 'on:' section and parse triggers
-  grep -A 20 "^on:" "$file" | grep -E "^  [a-z_]+:" | cut -d: -f1 | tr -d ' '
-done | sort | uniq -c | sort -rn
-```
-
-### Cache Memory Structure
-
-Organize persistent data in `/tmp/gh-aw/cache-memory/`:
-
-```
-/tmp/gh-aw/cache-memory/
-├── scripts/
-│   ├── extract_triggers.sh
-│   ├── parse_safe_outputs.sh
-│   ├── analyze_structure.sh
-│   └── generate_stats.py
-├── history/
-│   ├── 2024-01-15.json
-│   └── 2024-01-16.json
-├── patterns/
-│   ├── trigger_patterns.txt
-│   ├── safe_output_patterns.txt
-│   └── mcp_patterns.txt
-└── README.md  # Documentation of cache structure
-```
-
-## Success Criteria
-
-A successful analysis:
-- ✅ Analyzes all .lock.yml files in the repository
-- ✅ Generates accurate statistics for all metrics
-- ✅ Creates a comprehensive, well-formatted report
-- ✅ Publishes findings to the "audits" discussion category
-- ✅ Stores analysis scripts in cache memory for reuse
-- ✅ Maintains historical trend data
-- ✅ Provides actionable insights and recommendations
-
-## Output Requirements
-
-Your output MUST:
-1. Create a discussion in the "audits" category with the complete statistical report
-2. Use the report template provided above
-3. Include actual data from all lock files
-4. Present findings in clear tables and structured format
-5. Highlight interesting patterns and anomalies
-6. Store successful scripts and patterns in cache memory
-
-Begin your analysis now. Collect the data systematically, perform thorough statistical analysis, and generate an insightful report that helps understand the structure and patterns of agentic workflows in this repository.
+Begin now by running the required single-command analyzer script in the first turn.
 
 {{#runtime-import shared/noop-reminder.md}}
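The size limits in Step 1 amount to a small compaction pass that runs on the summary before it is written. Below is a minimal Python sketch. Python is used because the prompt names a `.py` analyzer. The sketch assumes a hypothetical summary layout: workflow-name buckets are lists stored under a `workflows` key, and the droppable sections are top-level keys named `examples`, `combinations`, and `per_workflow`. The function names (`compact`, `encoded_size`) are illustrative and are not part of gh-aw or the real analyzer.

```python
import json

# Limits taken from the prompt's Step 1 size rules.
MAX_NAMES_PER_BUCKET = 10   # workflow names kept per bucket
MAX_LIST_ITEMS = 100        # items kept in any other list
MAX_STR_LEN = 120           # characters kept in any string value
MAX_BYTES = 50 * 1024       # size target for the summary JSON

# Hypothetical top-level keys, in the order the prompt says to drop them.
DROP_ORDER = ("examples", "combinations", "per_workflow")


def _clip(value, key=None):
    """Recursively apply the list and string limits to a JSON-like value."""
    if isinstance(value, str):
        return value[:MAX_STR_LEN]
    if isinstance(value, list):
        # Assumption: workflow-name buckets are lists stored under "workflows".
        limit = MAX_NAMES_PER_BUCKET if key == "workflows" else MAX_LIST_ITEMS
        return [_clip(item) for item in value[:limit]]
    if isinstance(value, dict):
        return {k: _clip(v, k) for k, v in value.items()}
    return value  # numbers, booleans and None pass through unchanged


def encoded_size(summary):
    """Size in bytes of the compact UTF-8 JSON encoding."""
    return len(json.dumps(summary, separators=(",", ":")).encode("utf-8"))


def compact(summary):
    """Clip the summary, then drop sections in priority order until it fits."""
    result = _clip(summary)
    for section in DROP_ORDER:
        if encoded_size(result) <= MAX_BYTES:
            break
        result.pop(section, None)
    return result


if __name__ == "__main__":
    demo = {
        "lockfile_count": 3,
        "triggers": {"schedule": {"count": 25, "workflows": [f"wf-{i}" for i in range(25)]}},
        "examples": ["x" * 500] * 200,
    }
    small = compact(demo)
    print(len(small["triggers"]["schedule"]["workflows"]), len(small["examples"]))
```

The drop loop only removes the three named sections and stops as soon as the summary fits. Any other top-level key, such as aggregate totals, always survives. The size check uses compact separators, so it matches the file size only if the analyzer writes the file with the same encoding.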
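Step 3 and the cache-memory rules compare the current summary with the most recent earlier file in `/tmp/gh-aw/cache-memory/history/`. The files are named `<YYYY-MM-DD>.json`, and ISO dates sort chronologically as plain strings, so the newest prior day can be found by sorting file names. The sketch below is a minimal illustration. It assumes flat top-level numeric fields, such as a hypothetical `lockfile_count` or `total_bytes`, and the helper names are illustrative only.

```python
import json
import re
import tempfile
from pathlib import Path

# History files are assumed to be named <YYYY-MM-DD>.json, per the prompt.
DATE_STEM = re.compile(r"\d{4}-\d{2}-\d{2}")


def latest_prior_summary(history_dir, today):
    """Return (date, summary) for the newest history file dated before `today`, or None."""
    history = Path(history_dir)
    if not history.is_dir():
        return None
    # ISO dates sort chronologically as strings; non-date files are skipped.
    dated = sorted(
        (p for p in history.glob("*.json") if DATE_STEM.fullmatch(p.stem) and p.stem < today),
        key=lambda p: p.stem,
    )
    if not dated:
        return None
    newest = dated[-1]
    return newest.stem, json.loads(newest.read_text(encoding="utf-8"))


def numeric_deltas(current, previous):
    """Return current minus previous for top-level numeric fields present in both."""
    def is_number(x):
        # bool is a subclass of int, so exclude it explicitly.
        return isinstance(x, (int, float)) and not isinstance(x, bool)

    return {
        key: value - previous[key]
        for key, value in current.items()
        if key in previous and is_number(value) and is_number(previous[key])
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for day, count in [("2024-01-15", 10), ("2024-01-16", 12)]:
            Path(tmp, f"{day}.json").write_text(json.dumps({"lockfile_count": count}))
        prior = latest_prior_summary(tmp, "2024-01-17")
        if prior is not None:
            day, previous = prior
            print(day, numeric_deltas({"lockfile_count": 14}, previous))
```

The comparison against today's date is strict (`<`). As a result, the lookup still returns the previous day even if the current run has already saved its own summary to the history folder.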