diff --git a/.claude/commands/analyze-ci/create-bugs.md b/.claude/commands/analyze-ci/create-bugs.md deleted file mode 100644 index 835e6ac6d3..0000000000 --- a/.claude/commands/analyze-ci/create-bugs.md +++ /dev/null @@ -1,501 +0,0 @@ ---- -name: Create JIRA Bugs from CI Analysis -argument-hint: [--create] -description: Create JIRA bugs from analyze-ci failure reports (dry-run by default). Supports both release and PR job files. -allowed-tools: Bash, Read, Write, Glob, Grep, Agent, mcp__jira__jira_search, mcp__jira__jira_create_issue, mcp__jira__jira_get_issue, mcp__jira__jira_get_transitions, mcp__jira__jira_transition_issue, mcp__jira__jira_add_comment ---- - -# analyze-ci:create-bugs - -## Synopsis -```bash -/analyze-ci:create-bugs [--create] -``` - -## Description -Reads individual job analysis reports produced by `analyze-ci:doctor` and creates JIRA bugs in USHIFT for CI test failures. Operates in **dry-run mode by default** - it shows what bugs would be created without actually creating them. Use `--create` to perform actual issue creation. - -This command does NOT re-analyze CI jobs. It consumes existing job analysis files from `${WORKDIR}/`. - -## Arguments -- `$ARGUMENTS` (required): Source identifier, optionally followed by `--create` - - `` (required): One of the following: - - **Release version** (e.g., `4.22`, `main`): Looks for files matching `analyze-ci-release--job-*.txt` - - **PR number** (e.g., `pr-6396` or `pr6396`): Looks for files matching `analyze-ci-prs-job-*-pr-*.txt` - - **Rebase PR shorthand** (e.g., `rebase-release-4.22`): Resolves to the corresponding rebase PR by scanning existing `analyze-ci-prs-job-*` files for the matching release version in their content - - `--create` (optional): Actually create JIRA issues. Without this flag, only a dry-run report is produced. 
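The three source-identifier forms listed under Arguments can be told apart with a few regular expressions. The following is a minimal sketch, not the command's actual parser: the `ParsedArgs` type, the `parse_arguments` name, and the assumption that release versions look like `X.Y` (or the literal `main`) are all hypothetical and introduced here only for illustration.

```python
import re
from typing import NamedTuple


class ParsedArgs(NamedTuple):
    """Hypothetical result type; not part of the original command."""
    kind: str     # "release", "pr", or "rebase"
    value: str    # normalized identifier, e.g. "4.22" or "6396"
    create: bool  # True when --create was passed


def parse_arguments(arguments: str) -> ParsedArgs:
    """Classify the source identifier and detect the --create flag."""
    tokens = arguments.split()
    create = "--create" in tokens
    sources = [t for t in tokens if t != "--create"]
    if len(sources) != 1:
        raise ValueError("expected exactly one source identifier")
    source = sources[0]

    # Rebase PR shorthand, e.g. "rebase-release-4.22"
    match = re.fullmatch(r"rebase-release-(\d+\.\d+)", source)
    if match:
        return ParsedArgs("rebase", match.group(1), create)

    # PR number, e.g. "pr-6396" or "pr6396"
    match = re.fullmatch(r"pr-?(\d+)", source)
    if match:
        return ParsedArgs("pr", match.group(1), create)

    # Release version, e.g. "4.22" or "main" (X.Y shape is an assumption)
    if source == "main" or re.fullmatch(r"\d+\.\d+", source):
        return ParsedArgs("release", source, create)

    raise ValueError(f"unrecognized source identifier: {source!r}")
```

The rebase pattern is checked before the release pattern so that `rebase-release-4.22` is never mistaken for a plain version. Without `--create`, `create` stays `False`, which matches the dry-run default.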
- -## Prerequisites - -- Job analysis files must already exist in `${WORKDIR}/`: - - For releases: `analyze-ci-release--job-*.txt` (produced by `/analyze-ci:doctor`) - - For PRs: `analyze-ci-prs-job-*-pr-*.txt` (produced by `/analyze-ci:doctor`) -- Each job file must contain a `--- STRUCTURED SUMMARY ---` block (see below) -- MCP Jira server must be configured and accessible -- User must have permissions to create issues in USHIFT - -### STRUCTURED SUMMARY Block - -Each job analysis file produced by `/analyze-ci:prow-job` must end with a machine-readable block: - -```text ---- STRUCTURED SUMMARY --- -SEVERITY: <1-5> -STACK_LAYER: -STEP_NAME: -ERROR_SIGNATURE: -RAW_ERROR: -INFRASTRUCTURE_FAILURE: -JOB_URL: -JOB_NAME: -RELEASE: -FINISHED: ---- END STRUCTURED SUMMARY --- -``` - -If a job file lacks this block, it is skipped with a warning. - -## Work Directory - -Set once at the start and reference throughout: -```bash -WORKDIR=/tmp/analyze-ci-claude-workdir.$(date +%y%m%d) -``` - -## Implementation Steps - -### Step 1: Prepare Bug Candidates (Deterministic Script) - -**Actions**: -1. Parse `$ARGUMENTS` to extract `` and detect `--create` flag -2. Determine mode: if `--create` is present, set `MODE=create`; otherwise `MODE=dry-run` -3. Run `WORKDIR=/tmp/analyze-ci-claude-workdir.$(date +%y%m%d) && mkdir -p ${WORKDIR}` using the `Bash` tool -4. Run the preparation script to parse job files, group by signature, and extract search keywords: - ```bash - python3 .claude/scripts/analyze-ci-search-bugs.py --workdir ${WORKDIR} - ``` -5. The script writes `${WORKDIR}/analyze-ci-bug-candidates-.json` containing: - - Parsed and deduplicated bug candidates (grouped by ERROR_SIGNATURE similarity) - - Pre-computed `keywords` (2-4 distinctive search terms per candidate) - - Pre-computed `test_ids` (numeric IDs like `55394` for test case searches) - - Full `analysis_text` for bug descriptions - - Job lists with URLs and dates -6. 
Read the candidates JSON file for use in Step 2 - -**Error Handling**: -- No arguments: show usage and stop -- Script exits with error if no job files found — relay its error message to the user - -### Step 2: Search Jira for Existing Bugs - -For each bug candidate in the candidates JSON, run ALL of the following searches. The `keywords` and `test_ids` fields are pre-computed by the script — use them directly. - -**Search A — Keyword search (multiple focused queries)**: -1. Use the pre-computed `keywords` array from the candidate (already filtered for stop words and ranked by specificity) -2. Run **2-3 separate searches in parallel**, each using 1-2 keywords from the array. Do NOT put all keywords into a single `text ~` query — Jira requires all terms to match, so queries with 3+ keywords are fragile and miss issues that use slightly different wording. - ```python - # Example: candidate.keywords = ["invalidclienttokenid", "cloudformation", "createstack", "aws-2"] - # Search A1: most distinctive keyword - mcp__jira__jira_search(jql='... AND issuetype = Bug AND text ~ "invalidclienttokenid" ...', limit=5) - # Search A2: second keyword - mcp__jira__jira_search(jql='... AND issuetype = Bug AND text ~ "cloudformation" ...', limit=5) - ``` -3. Merge and deduplicate results from all A-series queries before proceeding - -**Search B — Test case ID search (MANDATORY when `test_ids` is non-empty)**: -Use the pre-computed `test_ids` array from the candidate. For EACH ID, run TWO separate searches: -```text -# Search B1: bare number -jql: ... AND issuetype = Bug AND text ~ "68256" AND status not in (Closed, Verified) ... - -# Search B2: OCP-prefixed form (OpenShift Polarion convention) -jql: ... AND issuetype = Bug AND text ~ "OCP-68256" AND status not in (Closed, Verified) ... -``` -**Why both forms are required**: Jira's text indexer treats `OCP-68256` as a single token, so `text ~ "68256"` will NOT match issues containing `OCP-68256`, and vice versa. 
Skipping either form WILL cause missed duplicates. - -**After all searches**: -1. Merge and deduplicate results from all search queries (A, B1, B2) -2. If potential duplicates are found, fetch their details with `mcp__jira__jira_get_issue` to show summary and status - -**Search C — Regression check (closed/verified issues)**: -After completing searches A and B, run an additional keyword search against closed/verified issues to detect potential regressions: -```python -mcp__jira__jira_search( - jql='((project = OCPBUGS AND component = MicroShift) OR project = USHIFT) AND issuetype = Bug AND text ~ "" AND status in (Closed, Verified) ORDER BY updated DESC', - limit=5 -) -``` -If results are found, fetch their details with `mcp__jira__jira_get_issue` and flag them as **"Potential regression of closed bug"** — distinct from open duplicates. These should be shown to the user but do NOT block creation; they serve as a warning that a previously fixed issue may have resurfaced. - -**Note**: Run searches in parallel where possible. - -### Step 3: Present Bug Candidates to User - -**Actions**: -1. Display a numbered list of all bug candidates with: - - Summary (derived from error signature) - - Severity and affected job count - - Step name(s) where failure occurred - - List of affected job URLs - - Potential duplicate JIRAs found (if any), with key, summary, and status - - Mode indicator: `[DRY-RUN]` or `[WILL CREATE]` - -2. **In dry-run mode** (`--create` NOT specified): - - Display all candidates with `[DRY-RUN]` prefix - - After listing all candidates, show a summary: - ```text - DRY-RUN SUMMARY - Source: - Total job files parsed: N - Unique bug candidates: N - Candidates with potential duplicates: N - Candidates ready to file: N - - To create these bugs, run: - /analyze-ci:create-bugs --create - ``` - - Do NOT prompt for any actions. Do NOT create any issues. Do NOT proceed to Steps 4/4a (create/reopen). Continue to Step 5 and Step 6. - -3. 
**In create mode** (`--create` specified): - - For each candidate, prompt the user: - ```text - Bug Candidate N/M: - Summary: "" - Severity: X (affects Y jobs) - Step: - Jobs: - - - - - Potential Duplicates (open): - - USHIFT-XXXXX: "" [Status] (or OCPBUGS-YYYYY) - (or "None found") - Potential Regressions (closed): - - USHIFT-YYYYY (or OCPBUGS-YYYYY): "" [Status] potential regression - (or "None found") - - # ACTION_PROMPT_WITH_REOPEN (use when closed regressions exist): - Action? [c]reate / [s]kip / [l]ink-to-existing / [r]eopen : - - # ACTION_PROMPT_NO_REOPEN (use when no closed regressions exist): - Action? [c]reate / [s]kip / [l]ink-to-existing : - ``` - - Select the prompt template based on whether closed regressions were found for the candidate: use `ACTION_PROMPT_WITH_REOPEN` when the candidate has closed regressions from Search C, and `ACTION_PROMPT_NO_REOPEN` otherwise. - - **create**: Proceed to Step 4 - - **skip**: Skip this candidate, move to next - - **link-to-existing**: Validate the key by calling `mcp__jira__jira_get_issue(issue_key=)`. If the issue exists, record the key and move to next. If the call fails or returns not-found, show an error (e.g., `"JIRA key not found — check for typos"`) and re-prompt with the same `Action?` choices. - - **reopen**: Validate the provided JIRA key before proceeding. Call `mcp__jira__jira_get_issue(issue_key=)` to confirm the issue exists, then verify that the key matches one of the candidate's closed regressions found in Search C, that the issue status is Closed or Verified, and that the issue type is Bug. If validation fails (key not found, not in the candidate's closed regression list, not in Closed/Verified state, or not a Bug), show an error (e.g., `"JIRA key not eligible for reopen — must be a Bug closed regression"`) and re-prompt with the same `Action?` choices. If validation passes, proceed to Step 4a. 
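The reopen validation described above amounts to four checks applied in order. The sketch below illustrates that ordering only. The flat `issue` dictionary with `key`, `status`, and `issue_type` fields is a hypothetical shape; the real `mcp__jira__jira_get_issue` response may be structured differently, and the error strings are illustrative rather than the command's exact wording.

```python
from typing import Optional


def reopen_error(issue: Optional[dict], closed_regression_keys: set) -> Optional[str]:
    """Return an error message if the issue is not eligible for reopen, else None.

    `issue` is None when the lookup failed or returned not-found.
    The dict field names used here are assumptions for illustration.
    """
    if issue is None:
        return "JIRA key not found"
    if issue["key"] not in closed_regression_keys:
        return "key is not one of this candidate's closed regressions"
    if issue["status"] not in ("Closed", "Verified"):
        return "issue is not in Closed or Verified state"
    if issue["issue_type"] != "Bug":
        return "issue type is not Bug"
    return None
```

A non-`None` result corresponds to the "show an error and re-prompt" branch; `None` corresponds to proceeding to Step 4a.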
- -### Step 4: Create Bug via MCP (create mode only) - -**Actions**: -For each candidate where user chose "create": - -1. **Construct the bug summary**: - - Format: `"MicroShift CI: "` (truncate to 100 chars if needed) - -2. **Construct the bug description** using **Markdown** format (the MCP Jira tool accepts Markdown and automatically converts it to Jira wiki markup — do NOT write Jira wiki markup directly): - ```text - ## Description of problem - - CI job failures detected for MicroShift . - - - - ## Version-Release number of selected component (if applicable) - - - - ## How reproducible - - Always (fails consistently in CI) - - ## Steps to Reproduce - - 1. Run the CI job(s) listed below - 2. Observe failure in step: - - ## Actual results - - ```` - - ```` - - ## Expected results - - CI job should pass successfully. - - ## Additional info - - **Stack Layer:** - **CI Step:** - **Error Severity:** /5 - **Number of affected jobs:** - **Last observed:** - - **Affected Jobs:** - - - []() - - - **Source:** Auto-generated by /analyze-ci:create-bugs from CI analysis output. - ``` - -3. **Create the issue**: - ```python - mcp__jira__jira_create_issue( - project_key="USHIFT", - summary="MicroShift CI: ", - issue_type="Bug", - description="", - components="MicroShift", - additional_fields={ - "labels": ["microshift-ci-ai-generated"], - "security": {"name": "Red Hat Employee"}, - "customfield_10028": 0 # Story Points - } - ) - ``` - -4. **Record the result**: Store the created issue key for the final report. - -**Error Handling**: -- If MCP call fails, report error, ask user if they want to retry or skip -- Do NOT retry automatically - -### Step 4a: Reopen Closed Bug as Regression (create mode only) - -**Precondition**: The JIRA issue must be a Bug in Closed or Verified state (validated in Step 3). If the issue type is not Bug, do not proceed — show an error and re-prompt. - -**Actions**: -For each candidate where user chose "reopen": - -1. 
**Get available transitions** for the closed issue: - ```python - mcp__jira__jira_get_transitions(issue_key="") - ``` - -2. **Find the reopen transition**: Look for a transition whose name is exactly "To Do", "New", or "Backlog" (case-insensitive). If no suitable transition is found, report the error and ask the user whether to create a new bug instead or skip. - -3. **Construct a regression comment** describing the new occurrences: - ```text - ## Regression: issue has resurfaced - - This issue was previously closed but the same failure has been detected again in CI. - - **Error Signature:** - **Error Severity:** /5 - **Number of affected jobs:** - **Last observed:** - - **Affected Jobs:** - - []() - ... - - Reopened automatically by /analyze-ci:create-bugs. - ``` - -4. **Transition the issue** to reopen it: - ```python - mcp__jira__jira_transition_issue( - issue_key="", - transition_id="", - comment="" - ) - ``` - -5. If the transition call does not support inline comments, add the comment separately: - ```python - mcp__jira__jira_add_comment( - issue_key="", - body="" - ) - ``` - -6. **Record the result**: Store the reopened issue key for the final report. - -**Error Handling**: -- If no reopen-like transition is available, report available transitions to user and ask whether to create a new bug or skip -- If the transition fails, report error and ask user if they want to retry, create a new bug instead, or skip -- Do NOT retry automatically - -### Step 5: Write Machine-Readable Bug Mapping File - -**Actions**: -After processing all bug candidates (Steps 2-4a) and regardless of mode (dry-run or create), write a machine-readable bug mapping file that `analyze-ci-create-report.py` can consume to display JIRA bug links in the HTML report. The file content is based on the Jira search results from Step 2 — it is not affected by whether bugs were created or reopened in Steps 4/4a. - -1. Save to `${WORKDIR}/analyze-ci-bugs-.json` (overwrite if exists) -2. 
Use this JSON format: - -```json -{ - "source": "", - "date": "YYYY-MM-DD", - "candidates": [ - { - "error_signature": "", - "severity": , - "step_name": "", - "affected_jobs": , - "duplicates": [ - {"key": "", "summary": "", "status": ""} - ], - "regressions": [ - {"key": "", "summary": "", "status": ""} - ] - } - ] -} -``` - -3. **IMPORTANT**: This file must be written in BOTH dry-run and create modes. The file enables `analyze-ci-create-report.py` to show linked bugs per issue in the HTML report. -4. Use empty arrays `[]` for `duplicates` and `regressions` when none are found. -5. Save using a Bash heredoc with `jq` or `python3 -c` to ensure valid JSON, or use the Write tool. - -### Step 6: Generate Results Report - -**Actions**: -1. Save report to `${WORKDIR}/analyze-ci-create-bugs-..txt` -2. Display summary to user: - -**Dry-run report format**: -```text -═══════════════════════════════════════════════════════════════ -ANALYZE-CI CREATE BUGS - DRY-RUN REPORT -Source: -Date: YYYY-MM-DD -═══════════════════════════════════════════════════════════════ - -PARSING - Job files found: N - Successfully parsed: N - Skipped (no structured summary): N - -FILTERING - None (all failures included) - -DEDUPLICATION - Unique bug candidates: N - -CANDIDATES - - 1. MicroShift CI: - Severity: X | Jobs: Y | Step: - Potential Duplicates: USHIFT-XXXXX, OCPBUGS-YYYYY (or "None") - Potential Regressions: USHIFT-YYYYY (or OCPBUGS-YYYYY) [Closed] (or "None") - - 2. MicroShift CI: - ... - -To create these bugs, run: - /analyze-ci:create-bugs --create - -Report saved: ${WORKDIR}/analyze-ci-create-bugs-..txt -═══════════════════════════════════════════════════════════════ -``` - -**Create mode report format**: -```text -═══════════════════════════════════════════════════════════════ -ANALYZE-CI CREATE BUGS - CREATION REPORT -Source: -Date: YYYY-MM-DD -═══════════════════════════════════════════════════════════════ - -RESULTS - - 1. 
USHIFT-12345 (CREATED) - MicroShift CI: - URL: https://redhat.atlassian.net/browse/USHIFT-12345 - - 2. SKIPPED - MicroShift CI: - Reason: User skipped - - 3. USHIFT-99999 (LINKED TO EXISTING) - MicroShift CI: - Reason: Duplicate of existing issue - - 4. USHIFT-88888 (REOPENED) - MicroShift CI: - URL: https://redhat.atlassian.net/browse/USHIFT-88888 - Reason: Regression of previously closed bug - -SUMMARY - Created: N - Skipped: N - Linked to existing: N - Reopened: N - Failed: N - -Report saved: ${WORKDIR}/analyze-ci-create-bugs-..txt -═══════════════════════════════════════════════════════════════ -``` - -## Examples - -### Example 1: Dry-Run for a Release (Default) -```bash -/analyze-ci:create-bugs 4.22 -``` -Shows what bugs would be created from release 4.22 analysis without creating anything. - -### Example 2: Create Bugs for a Release -```bash -/analyze-ci:create-bugs 4.22 --create -``` -Interactively creates bugs from release 4.22 analysis. - -### Example 3: Dry-Run for a PR -```bash -/analyze-ci:create-bugs pr-6396 -``` -Shows what bugs would be created from PR #6396 analysis. - -### Example 4: Create Bugs for a Rebase PR -```bash -/analyze-ci:create-bugs rebase-release-4.22 --create -``` -Resolves the rebase PR for release 4.22, then interactively creates bugs. 
- -### Example 5: No Job Files Found -```bash -/analyze-ci:create-bugs 4.19 -``` -```text -Error: No job analysis files found at ${WORKDIR}/analyze-ci-release-4.19-job-*.txt - -Run the analysis first: - /analyze-ci:doctor 4.19 -``` - -### Example 6: No PR Job Files Found -```bash -/analyze-ci:create-bugs pr-9999 -``` -```text -Error: No job analysis files found at ${WORKDIR}/analyze-ci-prs-job-*-pr9999-*.txt - -Run the analysis first: - /analyze-ci:doctor -``` - -## Notes - -- This command does NOT run CI analysis — it only consumes existing analysis files from `${WORKDIR}` -- Supports two file naming patterns: - - Release jobs: `analyze-ci-release--job-*.txt` (from `/analyze-ci:doctor`) - - PR jobs: `analyze-ci-prs-job-*-pr-*.txt` (from `/analyze-ci:doctor`) -- Dry-run is the default to prevent accidental bug creation -- The `--create` flag triggers interactive mode where each candidate requires user confirmation -- All failures are included without filtering — no entries are skipped based on severity, infrastructure status, or stack layer -- Bugs are created in USHIFT with component "MicroShift"; duplicate search covers both USHIFT and OCPBUGS -- All created bugs are labeled with `microshift-ci-ai-generated` for tracking -- Security level is set to "Red Hat Employee" on all created issues -- The STRUCTURED SUMMARY block in job files is required — this is a contract with `/analyze-ci:prow-job` -- In addition to the text report, a machine-readable bug mapping file (`analyze-ci-bugs-.json`) is written in both dry-run and create modes — this file is consumed by `analyze-ci-create-report.py` to show JIRA bug links in the HTML report - -## Related Skills - -- **analyze-ci:doctor**: Produces job analysis files consumed by this command -- **analyze-ci:prow-job**: Command that produces individual job reports with STRUCTURED SUMMARY -- **jira:create-bug**: Interactive bug creation (not used here — we call MCP directly) diff --git a/.claude/commands/analyze-ci/doctor.md 
b/.claude/commands/analyze-ci/doctor.md deleted file mode 100644 index 9fe351f5dc..0000000000 --- a/.claude/commands/analyze-ci/doctor.md +++ /dev/null @@ -1,179 +0,0 @@ ---- -name: Analyze CI Doctor -argument-hint: -description: Analyze CI for multiple MicroShift releases and produce an HTML summary -allowed-tools: Skill, Bash, Read, Write, Glob, Grep, Agent ---- - -# analyze-ci:doctor - -## Synopsis -```bash -/analyze-ci:doctor -``` - -## Description -Accepts a comma-separated list of MicroShift release versions, runs analysis for each release and for open rebase PRs, and produces a single HTML summary file consolidating all results. Uses deterministic scripts for data collection, artifact download, aggregation, and HTML generation. LLM agents are used only for per-job root cause analysis and Jira bug correlation. - -## Arguments -- `$ARGUMENTS` (required): Comma-separated list of release versions (e.g., `4.19,4.20,4.21,4.22`) - -## Work Directory - -Set once at the start and reference throughout: -```bash -WORKDIR=/tmp/analyze-ci-claude-workdir.$(date +%y%m%d) -``` - -## Implementation Steps - -### Step 1: Prepare — Collect and Download All Artifacts - -**Goal**: Deterministically collect all failed jobs and download their artifacts before any LLM analysis. - -**Actions**: -1. Run `WORKDIR=/tmp/analyze-ci-claude-workdir.$(date +%y%m%d)` using the `Bash` tool -2. Run the prepare script: - ```bash - bash .claude/scripts/analyze-ci-doctor.sh prepare --workdir ${WORKDIR} $ARGUMENTS --rebase - ``` -3. The script deterministically: - - For each release: fetches failed periodic jobs, downloads artifacts, writes `${WORKDIR}/analyze-ci-release--jobs.json` - - For rebase PRs: fetches PRs with failures, downloads artifacts, writes `${WORKDIR}/analyze-ci-prs-jobs.json` and `${WORKDIR}/analyze-ci-prs-status.json` - - Outputs a JSON summary listing all releases, job counts, and file paths -4. 
Read the JSON output to know which releases have jobs to analyze and how many - -**Error Handling**: -- If `$ARGUMENTS` is empty, show usage and stop -- If a release has no failed jobs, its jobs JSON will be an empty array — skip analysis for that release - -### Step 2: Analyze Each Job Using /analyze-ci:prow-job - -**Goal**: Get detailed root cause analysis for each failed job using pre-downloaded artifacts. - -**Actions**: -1. For each release that has jobs (from the Step 1 JSON output), read `${WORKDIR}/analyze-ci-release--jobs.json` -2. For rebase PRs (if any), read `${WORKDIR}/analyze-ci-prs-jobs.json` -3. For **every** job across all releases and PRs, launch a separate **Agent** (using the `Agent` tool, NOT the `Skill` tool): - - **For release jobs:** - ```text - Agent: subagent_type=general_purpose, prompt="Analyze this Prow job and save the report: - 1. Run /analyze-ci:prow-job - 2. After the analysis completes, save the FULL report output (including the --- STRUCTURED SUMMARY --- block) to: - ${WORKDIR}/analyze-ci-release--job--.txt - Use the Write tool to save the file. The file must contain the complete analysis report." - ``` - - **For PR jobs:** - ```text - Agent: subagent_type=general_purpose, prompt="Analyze this Prow job and save the report: - 1. Run /analyze-ci:prow-job - 2. After the analysis completes, save the FULL report output (including the --- STRUCTURED SUMMARY --- block) to: - ${WORKDIR}/analyze-ci-prs-job--pr-.txt - Use the Write tool to save the file. The file must contain the complete analysis report." - ``` - -4. Launch **ALL** agents (all releases + PRs) in parallel using `run_in_background: true` -5. Wait until ALL agents are confirmed complete before proceeding to Step 3 - -**Progress Reporting**: -```text -Analyzing N jobs in parallel across M releases... -``` - -### Step 3: Run Bug Correlation (Dry-Run) - -**Goal**: Search Jira for existing bugs matching each failure. - -**Actions**: -1. 
**IMPORTANT**: Wait until ALL analysis agents from Step 2 are confirmed complete -2. For each release version, launch `analyze-ci:create-bugs` in dry-run mode as an **Agent**: - ```text - Agent: subagent_type=general_purpose, prompt="Run /analyze-ci:create-bugs " - ``` -3. If rebase PR analysis produced job files, also launch `analyze-ci:create-bugs` for rebase PRs (check the PR jobs JSON to identify rebase PR source identifiers like `rebase-release-4.22`): - ```text - Agent: subagent_type=general_purpose, prompt="Run /analyze-ci:create-bugs rebase-release-" - ``` -4. Launch all create-bugs agents **in parallel** -5. Wait until all create-bugs agents complete -6. Each agent produces `${WORKDIR}/analyze-ci-bugs-.json` - -**Error Handling**: -- If create-bugs fails for a release, note the failure but do not block other releases or HTML generation - -### Step 4: Finalize — Aggregate and Generate HTML Report - -**Goal**: Deterministically aggregate results and generate the HTML report. - -**Actions**: -1. Run the finalize script: - ```bash - bash .claude/scripts/analyze-ci-doctor.sh finalize --workdir ${WORKDIR} $ARGUMENTS - ``` -2. The script deterministically: - - Runs `analyze-ci-aggregate.py` for each release and for PRs → `summary.json` files - - Runs `analyze-ci-create-report.py` → `microshift-ci-doctor-report.html` -3. Report the script's output to the user - -### Step 5: Report Completion - -**Actions**: -1. Display the path to the generated HTML file -2. 
Summarize: failed job counts per release, rebase PR status, bug correlation results - -**Example Output**: -```text -Summary: - Periodics: - Release 4.19: 3 failed periodic jobs - Release 4.20: 7 failed periodic jobs - Release 4.21: 0 failed periodic jobs - Release 4.22: 12 failed periodic jobs - Pull Requests: - 2 rebase PRs with 5 total failed jobs - -HTML report generated: ${WORKDIR}/microshift-ci-doctor-report.html -``` - -## Examples - -### Example 1: Analyze Multiple Releases -```bash -/analyze-ci:doctor 4.19,4.20,4.21,4.22 -``` - -### Example 2: Analyze Two Releases -```bash -/analyze-ci:doctor 4.21,4.22 -``` - -### Example 3: Single Release (still produces HTML) -```bash -/analyze-ci:doctor 4.22 -``` - -## Prerequisites - -- `gcloud` CLI must be installed and authenticated for GCS access -- `gh` CLI must be authenticated with access to openshift/microshift -- MCP Jira server must be configured (for bug correlation) -- Internet access to fetch job data from Prow/GCS -- Bash shell, Python 3 - -## Related Skills - -- **analyze-ci:prow-job**: Single job analysis (used by Step 2 agents) -- **analyze-ci:create-bugs**: Bug correlation and creation (used in Step 3; can also be run with `--create` after this command) - -## Notes -- **Deterministic scripts** handle: data collection, artifact download, aggregation, HTML generation -- **LLM agents** handle: per-job root cause analysis (Step 2), Jira bug search (Step 3) -- All agents (all releases + PRs) are launched in a single parallel wave — no per-release agents -- The `prepare` script downloads all artifacts upfront so prow-job agents use local paths (no redundant downloads) -- The `finalize` script runs aggregation and HTML generation in one call -- All intermediate files use prescribed filenames in `${WORKDIR}` — no improvised names -- The HTML report is self-contained (no external CSS/JS dependencies) -- If a release analysis fails, it is noted in the report but does not block other releases -- If no rebase PRs 
are open, the Pull Requests tab shows "No open rebase pull requests found" diff --git a/.claude/commands/analyze-ci/prow-job.md b/.claude/commands/analyze-ci/prow-job.md deleted file mode 100644 index 2cee1812bf..0000000000 --- a/.claude/commands/analyze-ci/prow-job.md +++ /dev/null @@ -1,195 +0,0 @@ ---- -name: Analyze CI for a Prow Job -argument-hint: -description: Download Prow job artifacts, identify root cause of failure, and produce a structured error report -allowed-tools: Skill, Bash, Read, Write, Glob, Grep, Agent ---- - -# analyze-ci:prow-job - -## Synopsis -```bash -/analyze-ci:prow-job -/analyze-ci:prow-job -``` - -## Description -Analyzes a single Prow CI test job by scanning artifacts for errors and producing a structured failure report. Accepts either a Prow job URL (downloads artifacts) or a local directory path (uses pre-downloaded artifacts). - -## Arguments -- `$ARGUMENTS` (required): Either a job URL or a local artifacts directory path: - - **Prow URL**: `https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-microshift-release-4.21-periodics-e2e-aws-ovn-ocp-conformance-serial/1984108354347208704` - - **GCS web URL**: `https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-microshift-release-4.21-periodics-e2e-aws-ovn-ocp-conformance-serial/1984108354347208704` - - **Local artifacts directory**: `/tmp/analyze-ci-claude-workdir.260404/artifacts/1984108354347208704` (must contain `build-log.txt` and `finished.json`) - -## Goal -Reduce noise for developers by processing large logs from a CI test pipeline and correctly classifying fatal errors with a false-positive rate of 0.01% and false-negative rate of 0.5%. - -## Audience -Software Engineer - -## Glossary - -- **ci-config**: Top level configuration file specifying build inputs, versions, and test workflows to execute. Periodic tests are suffixed with `__periodic.yaml`. 
-- **test**: The set of configurations and commands that specify how to execute the test. Can be defined in-line in ci-config, or as individual "steps" (see below). -- **step-registry**: Root directory where all openshift-ci test step configs and commands are stored. -- **step**: Smallest component of the test infrastructure. A step yaml specifies the command or script to execute, environmental variables and default values, and step metadata. Also called "ref" or "step ref". -- **chain**: A yaml configuration specifying 1 or more steps or chains in an array. Steps and chains are exploded and executed serially by index. May override step environment variable values. -- **workflow**: A yaml configuration specifying 1 or more steps, chains, or workflows in an array. Steps, chains, and workflows are exploded and executed serially. May override chain or step environmental variable values. Typically referenced by a test in a ci-config. -- **scenario**: MicroShift integration tests are built on the robotframework test framework. A "scenario" represents the RF suite, the test's environment, the microshift deployment, and the virtual machine on which the entire testing process takes place. Scenarios also include the manner of deployment: rpm-ostree, rpm installation, or bootc container. - -## Job Name and Job ID - -The Job Name and Job ID are encoded in the URL. 
There are two URL formats depending on the job type: - -**Periodic/postsubmit jobs:** -``` -https://prow.ci.openshift.org/view/gs/test-platform-results/logs/{JOB_NAME}/{JOB_ID} -https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/{JOB_NAME}/{JOB_ID} -``` -GCS path: `gs://test-platform-results/logs/{JOB_NAME}/{JOB_ID}/` - -**Presubmit (PR) jobs:** -``` -https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_microshift/{PR_NUMBER}/{JOB_NAME}/{JOB_ID} -https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_microshift/{PR_NUMBER}/{JOB_NAME}/{JOB_ID} -``` -GCS path: `gs://test-platform-results/pr-logs/pull/openshift_microshift/{PR_NUMBER}/{JOB_NAME}/{JOB_ID}/` - -To determine the GCS path from any job URL, strip the web prefix and replace with `gs://`: -- Prow URL: strip `https://prow.ci.openshift.org/view/gs/` -- GCS web URL: strip `https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/` - -## Important Files -> These files are available after artifacts are downloaded (via the download script or workflow step 0). -- `${TMP}/build-log.txt`: Log containing prow job output and most likely place to identify AWS infra related or hypervisor related errors. -- `${STEP}/build-log.txt`: Each step in the CI job is individually logged in a build-log.txt file. -- `${TMP}/artifacts/${TEST_NAME}/openshift-microshift-infra-sos-aws/artifacts/sosreport-*.tar.xz`: Compressed archive containing select portions of the test host's filesystem, relevant logs, and system configurations. `${TEST_NAME}` varies by job (e.g., `e2e-aws-tests`, `e2e-aws-ovn-ocp-conformance-arm64`). -- `${TMP}/artifacts/${TEST_NAME}/openshift-microshift-e2e-origin-conformance/build-log.txt`: Step-specific build log for origin conformance tests. 
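The prefix-stripping rule from the Job Name and Job ID section can also be sketched in Python. This is an illustrative equivalent of the rule, not code from the repository; the function names `gcs_path` and `job_name_and_id` are hypothetical. It relies only on the two web prefixes and the path layouts stated above, where the job name and job ID are the last two path components in both the periodic and presubmit formats.

```python
PROW_PREFIX = "https://prow.ci.openshift.org/view/gs/"
GCSWEB_PREFIX = "https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/"


def gcs_path(job_url: str) -> str:
    """Strip the web prefix from a job URL and return the gs:// path."""
    for prefix in (PROW_PREFIX, GCSWEB_PREFIX):
        if job_url.startswith(prefix):
            # Normalize to exactly one trailing slash, as in the documented paths.
            return "gs://" + job_url[len(prefix):].rstrip("/") + "/"
    raise ValueError(f"not a recognized Prow or GCS web URL: {job_url!r}")


def job_name_and_id(job_url: str) -> tuple:
    """Return (JOB_NAME, JOB_ID), the last two components of the GCS path."""
    parts = gcs_path(job_url).rstrip("/").split("/")
    return parts[-2], parts[-1]
```

Because the job name and ID are read from the end of the path, the same helper works whether or not the `pr-logs/pull/openshift_microshift/{PR_NUMBER}/` segment is present.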
- -## Important Links - -**Step Diagram URL** (found at the end of the main build-log): -``` -https://steps.ci.openshift.org/job?org=openshift&repo=microshift&branch=release-4.19&test=e2e-aws-tests-bootc-nightly&variant=periodics -``` -This link provides a diagram of the steps that make up the test. Think about reading this diagram when identifying step failures because not all fatal errors cause the current step to fail but may cause the next step to fail. - -**SOS Report** (contains a cross-section of the test host's filesystem, including the microshift journal and container logs) - -After downloading artifacts locally, find the SOS report at: -``` -${TMP}/artifacts/${TEST_NAME}/openshift-microshift-infra-sos-aws/artifacts/sosreport-*.tar.xz -``` -Where `${TEST_NAME}` is the test name directory (e.g., `e2e-aws-tests`, `e2e-aws-ovn-ocp-conformance-serial`). Use `find ${TMP}/artifacts -name 'sosreport-*.tar.xz'` to locate it. - -## Work Directory - -Set once at the start and reference throughout: -```bash -WORKDIR=/tmp/analyze-ci-claude-workdir.$(date +%y%m%d) -mkdir -p ${WORKDIR} -``` - -## Common Commands - -Scan the build log for arbitrary text: -```bash -grep '${SOME_TEXT}' ${GREP_OPTS} ${TMP}/build-log.txt -``` - -Download all prow job artifacts (only needed when given a URL, not a local path): -```bash -GCS_PATH=$(echo "${PROW_URL}" | sed -e 's|https://prow.ci.openshift.org/view/gs/|gs://|' -e 's|https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/|gs://|') -gcloud storage cp -r "${GCS_PATH}/" ${TMP}/ -``` - -## Workflow - -The user argument is: $ARGUMENTS - -0. **Determine input type and set up artifacts directory**: - - If `$ARGUMENTS` is a **local directory path** (starts with `/` and contains `build-log.txt`): set `TMP` to that directory. Skip step 1. - - If `$ARGUMENTS` is a **URL** (starts with `http`): create a temporary working directory with `mktemp -d ${WORKDIR}/openshift-ci-analysis-XXXX`, set `TMP` to that directory, and proceed to step 1. 
- -1. **Download all artifacts** (skip if using pre-downloaded artifacts from step 0): - Download all prow job artifacts using `gcloud storage cp -r` into the temporary working directory. Derive the GCS path by stripping the web prefix from the job URL (handles both Prow and GCS web URL formats): - ```bash - GCS_PATH=$(echo "${PROW_URL}" | sed -e 's|https://prow.ci.openshift.org/view/gs/|gs://|' -e 's|https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/|gs://|') - gcloud storage cp -r "${GCS_PATH}/" ${TMP}/ - ``` - This works for both periodic (`logs/...`) and presubmit PR (`pr-logs/pull/...`) job URLs, and for both Prow and GCS web URL formats. - This makes all build logs, step logs, and SOS reports available locally for analysis. - -2. **Scan for errors**: Start by scanning the top level `build-log.txt` file for errors and determine the step where the error occurred. Record each error with the filepath and line number for later reference. - -3. **Read context**: Iterate over each recorded error, locate the log file and line number, then read 50 lines before and 50 lines after the error. Use this information to characterize the error. Think about whether this error is transient and think about where in the stack the error occurs. Does it occur in the cloud infra, the openshift or prow ci-config, the hypervisor, or is it a legitimate test failure? If it is a legitimate test failure, determine what stage of the test failed: setup, testing, teardown. - -4. **Analyze the error**: Based on the context of the error, think hard about whether this error caused the test to fail, is a transient error, or is a red herring. - - 4.1 If it is a legitimate test error, analyze the test logs to determine the source of the error. - 4.2 If the source of the error appears to be due to microshift or a workload running on microshift, analyze the sos report's microshift journal and pod logs. - -5. **Produce a report**: Create a concise report of the error. 
The report MUST specify: - - Where in the pipeline the error occurred - - The specific step the error occurred in - - Whether the test failure was legitimate (i.e., a test failed) or due to an infrastructure failure (i.e., build image was not found, AWS infra failed due to quota, hypervisor failed to create test host VM, etc.) - -## Prerequisites - -- `gcloud` CLI must be installed and authenticated for GCS access -- Internet access to fetch job data from Prow/GCS -- Bash shell - -## Tips - -1. There are many setup and teardown stages so fatal errors may be buried by log output from the teardown phase. It is not common to find the fatal error at the end of the log. -2. You can quickly determine the failed step from the build-log.txt by reading the last `Running step e2e-aws-tests-bootc-nightly-openshift-microshift-e2e-metal-tests` line before the container logs appear. - -## Output Template - -Use this template for your error analysis reports: - -``` -Error Severity: {1-5} -Stack Layer: {AWS Infra, External Infrastructure, build phase, deploy phase, test setup phase, Test Configuration, test, teardown} -Step Name: {The specific step where the error occurred} -Error: {The exact error, including additional log context if it relates to the failure} -Suggested Remediation: {Based on where the error occurs, think hard about how to correct the error ONLY if it requires fixing. Infrastructure failures may not require code changes.} -``` - -After the human-readable report above, append a machine-readable block for downstream automation. 
This block MUST appear at the very end of the report, after all prose and analysis: - -```text ---- STRUCTURED SUMMARY --- -SEVERITY: {1-5, same as Error Severity above} -STACK_LAYER: {AWS Infra, External Infrastructure, build phase, deploy phase, test setup phase, Test Configuration, test, teardown - same as Stack Layer above} -STEP_NAME: {same as Step Name above} -ERROR_SIGNATURE: {a concise, unique one-line description of the root cause - not the full error, just enough to identify and deduplicate this failure} -RAW_ERROR: {the primary error message copied VERBATIM from the log file - see rules below} -INFRASTRUCTURE_FAILURE: {true if Stack Layer is AWS Infra or the failure is due to CI infrastructure rather than product code, false otherwise} -JOB_URL: {the full prow job URL — when given a URL as input, use it directly; when given a local artifacts dir, reconstruct from the build-log.txt "Link to job on registry info site" line or from the directory path structure} -JOB_NAME: {the full job name — extract from the JOB_URL path, or from the build-log.txt "Running step" lines, or from the artifacts directory structure} -RELEASE: {the release branch — extract from JOB_NAME (e.g. 4.22 from release-4.22), or from finished.json metadata repos field, or default to "main"} -FINISHED: {the job finish date in YYYY-MM-DD format, extracted from finished.json timestamp field or build log timestamps} ---- END STRUCTURED SUMMARY --- -``` - -### RAW_ERROR rules - -The `RAW_ERROR` field is used by downstream scripts for deterministic grouping. Two runs analyzing the same job MUST produce the same RAW_ERROR. Keep it simple — fewer rules mean less room for variation. - -1. **Copy-paste the exact error text** from the log — do NOT paraphrase, summarize, or reword -2. **Pick only ONE error** — the primary error that caused the step to fail. If multiple errors exist, pick the first fatal one. -3. **Only strip timestamps** — remove leading timestamps like `2026-04-01T06:21:48Z`. 
Keep everything else verbatim, including prefixes like `An error occurred...` or `error:`. -4. **Never concatenate multiple errors** — pick ONE error, not a semicolon-separated list -5. **Truncate to ~150 characters** if the raw message is very long — keep the distinctive part - -Examples of good RAW_ERROR values (copied verbatim from logs): -- `An error occurred (InvalidClientTokenId) when calling the CreateStack operation: The security token included in the request is invalid.` -- `panic: runtime error: index out of range [6] with length 6` -- `Process did not finish before 4h0m0s timeout` -- `error: the server doesn't have a resource type "clusterversion"` -- `package github.com/opencontainers/runc/libcontainer/cgroups: module github.com/opencontainers/runc@latest found, but does not contain package` - -The ERROR_SIGNATURE field remains as a human-readable description for reports and Jira bug titles. diff --git a/.claude/commands/analyze-ci/test-job.md b/.claude/commands/analyze-ci/test-job.md deleted file mode 100644 index fe51a1f78b..0000000000 --- a/.claude/commands/analyze-ci/test-job.md +++ /dev/null @@ -1,416 +0,0 @@ ---- -name: Analyze CI Test Job -argument-hint: -description: Analyze a MicroShift Prow CI Test Job execution -allowed-tools: Skill, WebFetch, Bash, Read, Write, Glob, Grep ---- - -## Name -analyze-ci:test-job - -## Synopsis -```bash -/analyze-ci:test-job -``` - -## Description -The `analyze-ci:test-job` command fetches comprehensive information from a Prow CI job execution and displays it in both JSON and Markdown formats. - -This command provides: -- Job metadata (status, timing, architecture, image type) -- MicroShift version being tested -- Test scenarios executed and their results -- Build information -- Links to logs and artifacts - -This command is useful for understanding what was tested in a specific job run, identifying failures, and accessing detailed logs and artifacts. - -## Implementation - -This command works by: - -1. 
**Parsing the job URL** to extract job name, ID, and configuration (architecture, image type, version) -2. **Fetching job metadata** from `finished.json` and `started.json` to get status, timing, and result information -3. **Extracting MicroShift version** using the `extract_microshift_version.py` helper script from build logs -4. **Listing test scenarios** by fetching the scenario-info directory structure from GCS artifacts -5. **Analyzing test results** for each scenario using the `analyze-ci:test-scenario` command to get comprehensive JSON data -6. **Compiling artifacts and logs** by constructing URLs to build logs, test execution logs, and failure diagnostics -7. **Generating a detailed Markdown report** with job overview, version info, scenario results, and artifact links - -The command integrates with the `analyze-ci:test-scenario` command to provide detailed per-scenario analysis and aggregates all information into a human-readable report with proper formatting (status icons, duration calculations, failure summaries). - -## Arguments -- `$1` (job-url): URL to the Prow CI job - **Required** - - Formats accepted: - - Full Prow dashboard URL: `https://prow.ci.openshift.org/view/gs/test-platform-results/logs//` - - GCS web URL: `https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs//` - - Job ID only (e.g., "1979744605507162112") - will attempt to infer job type from context - -## Return Value -- **Format**: Markdown -- **Location**: Output directly to the conversation -- **Content**: - - Job overview (status, timing, configuration) - - MicroShift version details - - Test scenario results - - Build information - - Links to logs and artifacts - -## Implementation Steps - -### Step 1: Parse Arguments and Validate Job URL - -**Goal**: Extract the job name, job ID, and job configuration. - -**Actions**: -1. 
Parse the job URL to extract: - - Job name (e.g., "periodic-ci-openshift-microshift-release-4.20-periodics-e2e-aws-tests-bootc-release-periodic") - - Job ID (e.g., "1979744605507162112") -2. Determine job configuration from job name: - - Architecture: x86_64 or aarch64 (look for "arm" in job name) - - Image type: bootc or rpm-ostree (look for "bootc" in job name) - - Version: extract from job name (e.g., "4.20") -3. Validate URL format -4. If only job ID provided, ask user for job type or attempt to determine from recent jobs - -**Example Parsing**: -``` -URL: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-microshift-release-4.20-periodics-e2e-aws-tests-bootc-release-periodic/1979744605507162112 - -Extracted: -- job_name: "periodic-ci-openshift-microshift-release-4.20-periodics-e2e-aws-tests-bootc-release-periodic" -- job_id: "1979744605507162112" -- version: "4.20" -- arch: "x86_64" -- image_type: "bootc" -``` - -### Step 2: Fetch Job Metadata - -**Goal**: Get job information (status, timing, result). - -**Actions**: -1. Construct the GCS URL for the `finished.json` file: - ``` - https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs///finished.json - ``` -2. Fetch the `finished.json` file using curl or WebFetch -3. Parse the JSON to extract: - - Job result (SUCCESS/FAILURE/ABORTED) - - Timestamp (start/end times) - - Duration - - Passed status - - Metadata (repo, revision, etc.) -4. Fetch `started.json` for additional metadata: - ```bash - curl -s "https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs///started.json" - ``` - -### Step 3: Extract MicroShift Version - -**Goal**: Determine the exact MicroShift version being tested. - -This command includes a Python script that automates version extraction from test logs. 
- -**Script Location**: `.claude/scripts/extract_microshift_version.py` - -**Usage**: -```bash -python3 .claude/scripts/extract_microshift_version.py -``` - -**Arguments**: -- `prow_url`: The full Prow CI job URL (e.g., "https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-microshift-release-4.20-periodics-e2e-aws-tests-bootc-release-periodic/1979744605507162112") -- `scenario`: The test scenario name (e.g., "el96-lrel@ipv6") - -**Example**: -```bash -# Extract version for a specific job and scenario -python3 .claude/scripts/extract_microshift_version.py "https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-microshift-release-4.20-periodics-e2e-aws-tests-bootc-release-periodic/1979744605507162112" "el96-lrel@ipv6" -``` - -**Output** (JSON): -```json -{ - "success": true, - "version": "4.20.0-202510161342.p0.g17d1d9a.assembly.4.20.0.el9.x86_64", - "build_type": "zstream", - "url": "https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/...", - "error": null -} -``` - -**Build Types Detected**: -- `"nightly"`: Nightly development builds -- `"ec"`: Engineering Candidate -- `"rc"`: Release Candidate -- `"zstream"`: Stable/zstream release - -### Step 4: List Test Scenarios - -**Goal**: Find all test scenarios executed in this job and list them. - -**Output** (JSON): -```json -{ - "job_id": "1979744605507162112", - "scenarios": [ - "el94-y2@el96-lrel@standard1", - "el96-lrel@standard1", - "el96-lrel@lvm", - "el96-lrel@dual-stack", - "el96-lrel@ipv6" - ], - "total_scenarios": 5 -} -``` - -### Step 5: Analyze Test Results for Each Scenario - -**Goal**: Get detailed test execution results for each scenario. - -**Method**: Use the `analyze-ci:test-scenario` command for each scenario to get comprehensive JSON data. - -**Actions**: -For each scenario found in Step 4: - -1. 
**Get scenario details** using the analyze-ci:test-scenario command: - ```bash - /analyze-ci:test-scenario - ``` - -2. **Parse the JSON response** which includes: - - Test results summary (total, passed, failed, errors, skipped) - - Individual test case details - - Failure messages and details (if any) - - Scenario configuration (RHEL version, test category) - - Execution timing - - Links to all artifacts - -**Example JSON Response**: -```json -{ - "scenario": { - "name": "el96-lrel@standard1", - "description": "RHEL 9.6 Latest Release - Standard Tests", - "configuration": { - "rhel_version": "9.6", - "release_type": "latest", - "test_category": "Standard Tests" - } - }, - "test_results": { - "status": "passed", - "summary": { - "total": 65, - "passed": 65, - "failed": 0, - "errors": 0, - "skipped": 0 - }, - "execution_time_seconds": 1234.56, - "test_cases": [ - { - "name": "MicroShift boots successfully", - "status": "passed" - } - ], - "failures": [] - }, - "artifacts": { - "junit_xml": "https://...", - "boot_log": "https://...", - "debug_log": "https://..." - } -} -``` - -3. **Extract key information** from each scenario: - - Overall status (passed/failed) - - Test counts - - Failure details (for failed scenarios) - - Execution time - - Test category and configuration - -**Alternative Manual Method** (if analyze-ci:test-scenario command unavailable): -1. Fetch junit.xml directly from artifact URL -2. Parse XML to extract test counts -3. Check boot_and_run.log for execution details -4. Extract scenario metadata from directory structure - -### Step 6: Compile Artifacts and Logs - -**Goal**: Provide links to useful artifacts and logs. - -**Actions**: -1. Compile key artifact URLs: - - Build log: `artifacts//openshift-microshift-infra-iso-build/build-log.txt` - - Test logs for each scenario - - JUnit XML reports - - Any failure logs or sosreports - -2. 
Categorize artifacts by type: - - Build artifacts - - Test execution logs - - Failure diagnostics - - System information - -### Step 7: Generate Detailed Report - -**Goal**: Create a comprehensive, well-structured report. - -**Report Structure**: -```markdown -# MicroShift CI Job Details - -## Job Overview -- **Job ID**: -- **Job Name**: -- **Status**: ✓ SUCCESS / ✗ FAILURE / ⚠️ ABORTED -- **Architecture**: x86_64 / aarch64 -- **Image Type**: bootc / rpm-ostree -- **Duration**: Xh Ym Zs -- **Started**: YYYY-MM-DD HH:MM:SS UTC -- **Finished**: YYYY-MM-DD HH:MM:SS UTC - -## MicroShift Version -- **Full Version**: -- **Build Type**: nightly / RC / EC / stable -- **Base Version**: X.Y.Z -- **Commit**: -- **Build Timestamp**: YYYY-MM-DD-HHMMSS - -## Test Scenarios - -### Scenario: -- **Description**: -- **Status**: ✓ PASS / ✗ FAIL -- **Tests**: X passed, Y failed, Z skipped -- **Duration**: Xm Ys - -**Failures** (if any): -- Test: - - Error: - - Log: [View]() - -[Repeat for each scenario] - -## Build Information -- **Build Status**: SUCCESS / FAILURE -- **Build Log**: [View]() -- **Build Duration**: Xm Ys - -## Artifacts & Logs -- [Build Log]() -- [Test Execution Logs]() -- [Scenario Details]() -- [Full Artifacts]() - -## Links -- [View on Prow CI]() -- [Browse All Artifacts]() -``` - -### Step 8: Error Handling - -**Goal**: Handle errors gracefully. - -**Common Issues**: -1. **Job not found (404)**: - - Verify job ID is correct - - Check if job is still running (no finished.json yet) - - Provide helpful error message to user - - Handle network errors gracefully - -2. **Artifacts not available**: - - Some jobs may not have all artifacts - - Gracefully handle missing files - - Indicate which artifacts are unavailable in the report - -3. **Invalid job URL**: - - Validate URL format before making requests - - Handle malformed URLs - - Provide examples of valid formats - - Suggest using job ID from Prow job URL - -4. 
**Version extraction failures**: - - Handle cases where version cannot be determined - - Provide partial information if available - - Include error message in report - -## Examples - -### Example 1: Successful Job Analysis -``` -/analyze-ci:test-job https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-microshift-release-4.20-periodics-e2e-aws-tests-bootc-release-periodic/1979744605507162112 -``` - -Output: -```markdown -# MicroShift CI Job Details - -## Job Overview -- **Job ID**: 1979744605507162112 -- **Job Name**: periodic-ci-openshift-microshift-release-4.20-periodics-e2e-aws-tests-bootc-release-periodic -- **Status**: ✓ SUCCESS -- **Architecture**: x86_64 -- **Image Type**: bootc -- **Duration**: 1h 43m 40s -- **Started**: 2025-10-19 03:01:17 UTC -- **Finished**: 2025-10-19 04:44:57 UTC - -## MicroShift Version -- **Full Version**: 4.20.0-0.nightly-2025-10-15-110252-20251017171355-4ad30ab2d -- **Build Type**: nightly -- **Base Version**: 4.20.0 -- **Commit**: 4ad30ab2d -- **Build Date**: 2025-10-15 - -## Test Scenarios - -### Scenario: el96-lrel@standard1 -- **Description**: RHEL 9.6 Latest Release - Standard Tests -- **Status**: ✓ PASS -- **Tests**: 45 passed, 0 failed, 2 skipped - -[Additional sections...] -``` - -### Example 2: Using GCS Web URL -``` -/analyze-ci:test-job https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-microshift-release-4.20-periodics-e2e-aws-tests-release-arm-periodic/1979744608019550208 -``` - -### Example 3: Failed Job Analysis -```bash -/analyze-ci:test-job https://prow.ci.openshift.org/view/gs/test-platform-results/logs/some-failing-job/9876543210 -``` - -Output would include failure details: -```markdown -## Job Overview -- **Status**: ✗ FAILURE -... - -## Test Scenarios -... 
-**Failures**: -- Test: - - Error: - - Log: [View]() -``` - -### Example 4: Job ID Only -``` -/analyze-ci:test-job 1979744605507162112 -``` -(May prompt for additional context or attempt to determine job type from recent jobs) - -## Notes -- This command provides comprehensive analysis including job status, MicroShift version, test scenarios, and detailed results -- Works with MicroShift-specific Prow CI jobs -- Requires internet access to fetch job data from Prow CI -- All times are displayed in UTC -- Duration is calculated from the finished.json timestamp and start time from started.json -- The command is read-only and does not modify any CI job data -- Useful for debugging specific test failures or understanding what was tested diff --git a/.claude/commands/analyze-ci/test-scenario.md b/.claude/commands/analyze-ci/test-scenario.md deleted file mode 100644 index 9dcfabf913..0000000000 --- a/.claude/commands/analyze-ci/test-scenario.md +++ /dev/null @@ -1,408 +0,0 @@ ---- -name: Analyze CI Test Scenario -argument-hint: -description: Analyze MicroShift Test Scenario results -allowed-tools: WebFetch, Bash, Read, Write, Glob, Grep ---- - - -## Name -analyze-ci:test-scenario - -## Synopsis -```bash -/analyze-ci:test-scenario -``` - -## Description -The `analyze-ci:test-scenario` command retrieves comprehensive information about a specific test scenario executed within a MicroShift CI job. It returns detailed information containing: -- Scenario configuration (OS version, test type, architecture) -- Test execution results (pass/fail counts, test names) -- MicroShift version tested -- Execution timing -- Links to logs and artifacts -- Test failure details (if any) - -This command is useful for detailed investigation of specific test scenarios and understanding test execution results. - -## Implementation - -This command works by: - -1. **Parsing the job URL** to extract job metadata (ID, name, version, architecture, image type) -2. 
**Constructing artifact URLs** for the specified scenario in the GCS bucket -3. **Fetching JUnit XML** test results using curl/WebFetch from the scenario's artifact directory -4. **Parsing test results** to extract pass/fail counts, test case names, and failure details -5. **Extracting scenario metadata** from the scenario name (RHEL version, release type, test category) -6. **Compiling artifact links** for all logs and diagnostic files -7. **Generating formatted Markdown output** containing all collected information - -If no scenario name is provided, the command lists the available scenarios so the user can choose one. - -The command uses the `.claude/scripts/extract_microshift_version.py` helper script to determine the exact MicroShift version tested in the scenario. - -## Arguments -- `$1` (job-url): URL to the Prow CI job - **Required** - - Formats accepted: - - Full Prow dashboard URL: `https://prow.ci.openshift.org/view/gs/test-platform-results/logs//` - - GCS web URL: `https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs//` -- `$2` (scenario-name): Name of the scenario to analyze - **Optional** - - Examples: `el96-lrel@standard1`, `el96-lrel@lvm`, `el96-lrel@dual-stack` - - If not provided, the command will list all available scenarios - -## Return Value -- **Format**: Markdown -- **Location**: Output directly to the conversation -- **Content**: Comprehensive scenario information including test results, configuration, and artifacts - -## Implementation Steps - -### Step 1: Parse and Validate Input - -**Goal**: Extract job information and scenario name from the arguments. - -**Actions**: -1. Parse the job URL to extract: - - Job name - - Job ID - - Version (e.g., "4.20") - - Job type (bootc/rpm-ostree, x86_64/aarch64) -2. Validate scenario name format (should match pattern: `el[0-9]+-[a-z0-9]+@.+`) -3.
If no scenario name provided, set `list_scenarios = true` flag - -**Example**: -```javascript -// Input -job_url = "https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-microshift-release-4.20-periodics-e2e-aws-tests-bootc-release-periodic/1979744605507162112" -scenario_name = "el96-lrel@standard1" - -// Parsed -job_id = "1979744605507162112" -job_name = "periodic-ci-openshift-microshift-release-4.20-periodics-e2e-aws-tests-bootc-release-periodic" -version = "4.20" -job_type = "e2e-aws-tests-bootc-release-periodic" -arch = "x86_64" -image_type = "bootc" -``` - -### Step 2: Construct Artifact URLs - -**Goal**: Build URLs to the scenario's artifacts. - -**Actions**: -1. Construct base artifact URL: - ``` - https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs///artifacts//openshift-microshift-e2e-metal-tests/artifacts/scenario-info// - ``` -2. Construct specific artifact URLs: - - JUnit XML: `/junit.xml` - - Boot log: `/boot_and_run.log` - - Debug log: `/rf-debug.log` - - Phase logs: `/phase_*/*.log` - -### Step 3: List Available Scenarios (if no scenario specified) - -**Goal**: If no scenario name was provided, list all available scenarios in the job. - -**Actions**: -1. Fetch the scenario-info directory listing: - ``` - https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs///artifacts//openshift-microshift-e2e-metal-tests/artifacts/scenario-info/ - ``` -2. Use WebFetch to parse the HTML directory listing -3. Extract all scenario directory names -4. 
Return formatted list of scenarios - -**Output Format** (if listing scenarios): -``` -# Available Test Scenarios - -Job: periodic-ci-openshift-microshift-release-4.20-periodics-e2e-aws-tests-bootc-release-periodic -Job ID: 1979744605507162112 - -## Scenarios (11 total) -- el94-y2@el96-lrel@standard1 -- el94-y2@el96-lrel@standard2 -- el96-lrel@ai-model-serving-online -- el96-lrel@dual-stack -- el96-lrel@ginkgo-tests -- el96-lrel@ipv6 -- el96-lrel@low-latency -- el96-lrel@lvm -- el96-lrel@multi-nic -- el96-lrel@standard1 -- el96-lrel@standard2 -``` - -### Step 4: Fetch and Parse JUnit XML - -**Goal**: Get test execution results from JUnit XML. - -**Actions**: -1. Fetch the junit.xml file using curl or WebFetch -2. Parse XML to extract: - - Total test count - - Passed tests count - - Failed tests count - - Skipped tests count - - Error count - - Test execution time - - Individual test case names and statuses - - Failure messages and stack traces (if any) - -**Example Parsing**: -```python -import xml.etree.ElementTree as ET - -# xml_content holds the junit.xml text fetched in action 1 -root = ET.fromstring(xml_content) -# The root element may itself be the testsuite element; find() only searches descendants -testsuite = root if root.tag == 'testsuite' else root.find('.//testsuite') - -test_results = { - 'total': int(testsuite.get('tests', '0')), - 'passed': 0, - 'failures': int(testsuite.get('failures', '0')), - 'errors': int(testsuite.get('errors', '0')), - 'skipped': int(testsuite.get('skipped', '0')), - 'time': float(testsuite.get('time', '0')), - 'test_cases': [] -} - -for testcase in testsuite.findall('.//testcase'): - name = testcase.get('name') - status = 'passed' - message = None - - if testcase.find('failure') is not None: - status = 'failed' - message = testcase.find('failure').get('message') - elif testcase.find('error') is not None: - status = 'error' - message = testcase.find('error').get('message') - elif testcase.find('skipped') is not None: - status = 'skipped' - - test_results['test_cases'].append({ - 'name': name, - 'status': status, - 'message': message - }) - -test_results['passed'] = test_results['total'] -
test_results['failures'] - test_results['errors'] - test_results['skipped'] -``` - -### Step 5: Extract Scenario Metadata - -**Goal**: Parse scenario name to extract configuration details. - -**Actions**: -1. Parse scenario name to extract components: - - RHEL version (e.g., "el96" → "RHEL 9.6") - - Release type (e.g., "lrel" → "Latest Release") - - Test type (e.g., "standard1", "lvm", "dual-stack") - - Upgrade path (if format is `el94-y2@el96-lrel@...` → upgrade from 9.4 to 9.6) - -2. Determine test category from test type: - - `standard1`, `standard2` → "Standard Tests" - - `lvm` → "LVM Storage Tests" - - `dual-stack` → "Dual-Stack Networking Tests" - - `ipv6` → "IPv6 Networking Tests" - - `multi-nic` → "Multi-NIC Configuration Tests" - - `low-latency` → "Low-Latency Tests" - - `ginkgo-tests` → "Ginkgo Integration Tests" - - `ai-model-serving-online` → "AI Model Serving Tests" - -**Example**: -```javascript -// Scenario: el96-lrel@standard1 -{ - "rhel_version": "9.6", - "release_type": "latest", - "test_category": "Standard Tests", - "test_variant": "1", - "is_upgrade": false -} - -// Scenario: el94-y2@el96-lrel@standard1 -{ - "source_rhel_version": "9.4", - "target_rhel_version": "9.6", - "release_type": "latest", - "test_category": "Standard Tests", - "test_variant": "1", - "is_upgrade": true -} -``` - -### Step 6: Get Execution Timing - -**Goal**: Extract when the scenario was executed and how long it took. - -**Actions**: -1. Check boot_and_run.log for timestamps -2. Look for start and end markers in the log -3. Calculate duration if both timestamps available -4. Extract from junit.xml `time` attribute as fallback - -### Step 7: Compile Artifact Links - -**Goal**: Provide direct links to all relevant artifacts for the scenario. - -**Actions**: -1. Build URLs for common artifacts: - - JUnit XML report - - Boot and run log - - Debug log - - Phase logs (if they exist) - - Sosreport (if test failed) - -2. 
Categorize artifacts: - - Test results: junit.xml - - Execution logs: boot_and_run.log, rf-debug.log - - Phase logs: All logs under phase_* directories - - Diagnostics: sosreports, system logs - - -### Error Handling - -**Common Issues and Responses**: - -1. **Scenario not found**: -``` -# Error: Scenario Not Found - -Scenario 'el96-lrel@invalid' does not exist in job 1979744605507162112 - -## Available Scenarios -- el96-lrel@standard1 -- el96-lrel@lvm -- ... -``` - -2. **Job not found**: -``` -# Error: Job Not Found - -Could not fetch artifacts for job ID 1234567890 - -Please verify the job URL and ensure the job has completed. -``` - -3. **Missing artifacts**: -``` -# Warning: Partial Data Available - -Some artifacts were not available for this scenario. - -## Missing Artifacts -- junit.xml - -Displaying available information below... -``` - -## Examples - -### Example 1: Get scenario information -``` -/analyze-ci:test-scenario https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-microshift-release-4.20-periodics-e2e-aws-tests-bootc-release-periodic/1979744605507162112 el96-lrel@standard1 -``` - -Output: -``` -# Test Scenario Analysis: el96-lrel@standard1 - -## Job Information -- **Job Name**: periodic-ci-openshift-microshift-release-4.20-periodics-e2e-aws-tests-bootc-release-periodic -- **Job ID**: 1979744605507162112 -- **Version**: 4.20 -- **Architecture**: x86_64 -- **Image Type**: bootc - -## Scenario Configuration -- **Name**: el96-lrel@standard1 -- **Description**: RHEL 9.6 Latest Release - Standard Tests -- **RHEL Version**: 9.6 -- **Release Type**: Latest -- **Test Category**: Standard Tests -- **Upgrade Test**: No - -## Test Results -**Status**: PASSED - -### Summary -- **Total Tests**: 65 -- **Passed**: 65 -- **Failed**: 0 -- **Errors**: 0 -- **Skipped**: 0 - -## Artifacts -- [JUnit XML](https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/.../junit.xml) -- [Boot and Run 
Log](https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/.../boot_and_run.log) -``` - -### Example 2: List all scenarios (no scenario name provided) -``` -/analyze-ci:test-scenario https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-microshift-release-4.20-periodics-e2e-aws-tests-bootc-release-periodic/1979744605507162112 -``` - -Output: -``` -# Available Test Scenarios - -Job: periodic-ci-openshift-microshift-release-4.20-periodics-e2e-aws-tests-bootc-release-periodic -Job ID: 1979744605507162112 - -## Scenarios (5 total) -- el96-lrel@standard1 -- el96-lrel@standard2 -- el96-lrel@lvm -- el96-lrel@dual-stack -- el96-lrel@ipv6 -``` - -### Example 3: Get information about a failed scenario -``` -/analyze-ci:test-scenario https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-microshift-release-4.20-periodics-e2e-aws-tests-release-periodic/1234567890 el96-lrel@lvm -``` - -Output would include failure details: -``` -# Test Scenario Analysis: el96-lrel@lvm - -## Job Information -- **Job Name**: periodic-ci-openshift-microshift-release-4.20-periodics-e2e-aws-tests-release-periodic -- **Job ID**: 1234567890 -- **Version**: 4.20 - -## Scenario Configuration -- **Name**: el96-lrel@lvm -- **Description**: RHEL 9.6 Latest Release - LVM Storage Tests -- **Test Category**: LVM Storage Tests - -## Test Results -**Status**: FAILED - -### Summary -- **Total Tests**: 45 -- **Passed**: 43 -- **Failed**: 2 -- **Errors**: 0 -- **Skipped**: 0 - -### Failed Tests -1. 
**LVM volume creation** - - **Error**: Failed to create LVM volume: insufficient space - - **Log**: [create-lvm.log](https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/.../phase_create-and-run/create-lvm.log) - -## Artifacts -- [JUnit XML](https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/.../junit.xml) -- [Boot and Run Log](https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/.../boot_and_run.log) -- [Debug Log](https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/.../rf-debug.log) -``` - -## Notes -- This command outputs detailed information in Markdown format for easy reading -- The command is read-only and does not modify any CI job data -- If a scenario doesn't have junit.xml, the command will attempt to infer results from logs -- The command caches scenario lists internally to avoid repeated fetches when listing scenarios -- Artifact links in the output are direct URLs to GCS storage for immediate access diff --git a/.claude/scripts/analyze-ci-aggregate.py b/.claude/scripts/analyze-ci-aggregate.py deleted file mode 100755 index 867fee0402..0000000000 --- a/.claude/scripts/analyze-ci-aggregate.py +++ /dev/null @@ -1,475 +0,0 @@ -#!/usr/bin/env python3 -""" -Aggregate per-job analysis reports into a release or PR summary JSON file. - -Reads per-job report files (containing STRUCTURED SUMMARY blocks and prose), -groups jobs by ERROR_SIGNATURE similarity, and produces JSON consumed by -analyze-ci-create-report.py. 
- -Usage: - analyze-ci-aggregate.py --release 4.22 [--workdir DIR] - analyze-ci-aggregate.py --prs [--workdir DIR] - -Output files: - analyze-ci-release--summary.json - analyze-ci-prs-summary.json -""" - -import json -import sys -import os -import re -import glob as glob_mod -from datetime import datetime, timezone - - -# --------------------------------------------------------------------------- -# Constants -# --------------------------------------------------------------------------- - -STOP_WORDS = frozenset({ - "the", "a", "an", "in", "on", "at", "to", "for", "of", "with", "by", - "is", "was", "are", "were", "be", "been", "and", "or", "not", "no", - "but", "from", "that", "this", "all", "has", "have", "had", "do", - "does", "did", "will", "would", "could", "should", "may", "might", -}) - -INFRA_LAYERS = {"aws infra", "external infrastructure"} -BUILD_LAYERS = {"build phase"} -SIMILARITY_THRESHOLD = 0.50 - - -# --------------------------------------------------------------------------- -# Parsing per-job report files -# --------------------------------------------------------------------------- - -def parse_structured_summary(filepath): - """Extract the STRUCTURED SUMMARY block from a per-job report file.""" - with open(filepath, "r") as f: - content = f.read() - - m = re.search( - r"--- STRUCTURED SUMMARY ---\n(.+?)\n--- END STRUCTURED SUMMARY ---", - content, re.DOTALL, - ) - if not m: - return None - - data = {} - for line in m.group(1).strip().split("\n"): - if ":" in line: - key, val = line.split(":", 1) - data[key.strip()] = val.strip() - - try: - severity = int(data.get("SEVERITY", "3")) - except ValueError: - severity = 3 - - return { - "severity": severity, - "stack_layer": data.get("STACK_LAYER", ""), - "step_name": data.get("STEP_NAME", ""), - "error_signature": data.get("ERROR_SIGNATURE", ""), - "raw_error": data.get("RAW_ERROR", ""), - "infrastructure_failure": data.get("INFRASTRUCTURE_FAILURE", "false").lower() == "true", - "job_url": 
data.get("JOB_URL", ""), - "job_name": data.get("JOB_NAME", ""), - "release": data.get("RELEASE", ""), - "finished": data.get("FINISHED", ""), - } - - -def parse_prose_fields(filepath): - """Extract Error: and Suggested Remediation: from report prose.""" - with open(filepath, "r") as f: - content = f.read() - - prose = content.split("--- STRUCTURED SUMMARY ---")[0] - - error = "" - m = re.search( - r"^Error:\s*(.+?)(?=\nSuggested Remediation:|\nError Severity:|\nStack Layer:|\nStep Name:|\n\n|\n---|\Z)", - prose, re.MULTILINE | re.DOTALL, - ) - if m: - error = " ".join(m.group(1).split()) - - remediation = "" - m = re.search( - r"^Suggested Remediation:\s*(.+?)(?=\n\n|\n---|\nError Severity:|\nStack Layer:|\nStep Name:|\Z)", - prose, re.MULTILINE | re.DOTALL, - ) - if m: - remediation = " ".join(m.group(1).split()) - - return error, remediation - - -# --------------------------------------------------------------------------- -# Grouping -# --------------------------------------------------------------------------- - -def _normalize_step_name(step_name): - """Extract the step ref from a fully-qualified Prow step name. - - Prow step names follow the pattern ``-`` - where the step ref typically starts with ``openshift-microshift-``. - The LLM sometimes includes the test-variant prefix, sometimes not, - which would cause identical steps to land in different buckets - during two-pass grouping. 
- - Examples: - "openshift-microshift-infra-aws-ec2" - → "openshift-microshift-infra-aws-ec2" - "e2e-aws-tests-bootc-arm-nightly-el10-openshift-microshift-infra-aws-ec2" - → "openshift-microshift-infra-aws-ec2" - "clusterbot-nightly-openshift-microshift-infra-aws-ec2" - → "openshift-microshift-infra-aws-ec2" - """ - m = re.search(r"(openshift-microshift-\S+)", step_name) - return m.group(1) if m else step_name - - -def _tokenize(text): - words = re.findall(r"[a-z0-9][a-z0-9_.-]*[a-z0-9]|[a-z0-9]", text.lower()) - return {w for w in words if w not in STOP_WORDS and len(w) >= 2} - - -def signature_similarity(sig_a, sig_b): - tokens_a = _tokenize(sig_a) - tokens_b = _tokenize(sig_b) - if not tokens_a or not tokens_b: - return 0.0 - return len(tokens_a & tokens_b) / min(len(tokens_a), len(tokens_b)) - - -def _grouping_text(job): - """Return the text used for similarity grouping. - - Prefers RAW_ERROR (verbatim log text, deterministic) over - ERROR_SIGNATURE (LLM-paraphrased, variable across runs). - Falls back to ERROR_SIGNATURE when RAW_ERROR is absent - (backward compatibility with older report files). - """ - return job.get("raw_error") or job.get("error_signature", "") - - -def _group_by_similarity(jobs): - """Group jobs by similarity of their grouping text. - - Uses RAW_ERROR when available (deterministic log text), - falling back to ERROR_SIGNATURE for older reports. - - A new job is compared against ALL existing members of each group, - not just the first. If any member exceeds the similarity threshold - the job joins that group. This makes grouping less sensitive to - insertion order and to phrasing variation — each member added to - a group acts as an additional reference point for future matches. 
- """ - groups = [] - for job in jobs: - sig = _grouping_text(job) - placed = False - for group in groups: - if any( - signature_similarity(sig, _grouping_text(member)) >= SIMILARITY_THRESHOLD - for member in group - ): - group.append(job) - placed = True - break - if not placed: - groups.append([job]) - return groups - - -def group_by_signature(jobs): - """Two-pass grouping: first by step_name, then by signature similarity. - - Grouping by step_name first prevents jobs from different CI steps - (e.g. conformance vs metal-tests) from being merged together even - when their error signatures share enough tokens to exceed the - similarity threshold. This makes the issue count deterministic - across runs where only the signature wording varies. - """ - # Pass 1: bucket by normalized step_name - by_step = {} - for job in jobs: - step = _normalize_step_name(job.get("step_name", "")) - by_step.setdefault(step, []).append(job) - - # Pass 2: within each step bucket, group by signature similarity - all_groups = [] - for step_jobs in by_step.values(): - all_groups.extend(_group_by_similarity(step_jobs)) - return all_groups - - -def classify_severity(group): - count = len(group) - if count >= 5: - return "CRITICAL" - if count >= 3: - return "HIGH" - if count >= 2: - return "MEDIUM" - return "LOW" - - -# Patterns for deterministic breakdown classification. -# These override the LLM's STACK_LAYER, because step names and error -# signatures are deterministic while STACK_LAYER varies across runs. 
-INFRA_STEP_PATTERNS = ("infra-aws", "infra-gcp", "infra-setup") -BUILD_STEP_PATTERNS = ("update-origin", "build-image", "iso-build") -BUILD_SIGNATURE_PATTERNS = ("update-origin", "build-image") - - -def classify_breakdown(stack_layer, step_name="", error_signature=""): - lower_step = step_name.lower() - lower_sig = error_signature.lower() - - # Step-name overrides — more reliable than LLM's STACK_LAYER - if any(k in lower_step for k in INFRA_STEP_PATTERNS): - return "infrastructure" - if any(k in lower_step for k in BUILD_STEP_PATTERNS): - return "build" - - # Error-signature overrides — catches build operations that run - # inside a test step (e.g. "make update-origin" in e2e-metal-tests) - if any(k in lower_sig for k in BUILD_SIGNATURE_PATTERNS): - return "build" - - # Fall back to LLM's classification - lower = stack_layer.lower() - if lower in INFRA_LAYERS: - return "infrastructure" - if lower in BUILD_LAYERS: - return "build" - return "test" - - -# --------------------------------------------------------------------------- -# JSON generation -# --------------------------------------------------------------------------- - -def build_release_json(release, jobs, timestamp): - """Build the release summary as a dict (ready for json.dump).""" - issues, breakdown = _build_issues_from_jobs(jobs) - - return { - "release": release, - "total_failed": len(jobs), - "date": timestamp.strftime("%Y-%m-%d"), - "breakdown": breakdown, - "issues": issues, - } - - -def _build_issues_from_jobs(jobs): - """Group jobs by error signature and return (issues list, breakdown dict). - - Shared by both release and PR builders. 
- """ - groups = group_by_signature(jobs) - groups.sort(key=lambda g: (-max(j["severity"] for j in g), -len(g), g[0].get("error_signature", ""))) - - breakdown = {"build": 0, "test": 0, "infrastructure": 0} - for job in jobs: - breakdown[classify_breakdown( - job["stack_layer"], - job.get("step_name", ""), - job.get("error_signature", ""), - )] += 1 - - issues = [] - for i, group in enumerate(groups, 1): - rep = max(group, key=lambda j: (j["severity"], j.get("job_name", ""))) - failure_type = classify_breakdown( - rep["stack_layer"], - rep.get("step_name", ""), - rep.get("error_signature", ""), - ) - issues.append({ - "number": i, - "title": rep["error_signature"], - "job_count": len(group), - "severity": classify_severity(group), - "failure_type": failure_type, - "root_cause": rep.get("error_text", ""), - "next_steps": rep.get("remediation_text", ""), - "affected_jobs": [ - {"name": j["job_name"], "date": j["finished"], "url": j["job_url"]} - for j in group - ], - }) - - return issues, breakdown - - -def build_pr_json(pr_jobs, timestamp): - """Build the PR summary as a dict (ready for json.dump). - - pr_jobs: dict mapping pr_number to list of job dicts. 
- """ - total_failed = sum(len(jobs) for jobs in pr_jobs.values()) - - prs = [] - for pr_number, jobs in sorted(pr_jobs.items()): - if not jobs: - continue - first = jobs[0] - issues, breakdown = _build_issues_from_jobs(jobs) - prs.append({ - "number": pr_number, - "title": first.get("pr_title", ""), - "url": first.get("pr_url", ""), - "failed": len(jobs), - "breakdown": breakdown, - "issues": issues, - }) - - return { - "total_prs": len(pr_jobs), - "prs_with_failures": len(prs), - "total_failed": total_failed, - "date": timestamp.strftime("%Y-%m-%d"), - "has_content": total_failed > 0, - "prs": prs, - } - - -# --------------------------------------------------------------------------- -# File discovery -# --------------------------------------------------------------------------- - -def find_release_job_files(workdir, release): - pattern = os.path.join(workdir, f"analyze-ci-release-{release}-job-*.txt") - return sorted(glob_mod.glob(pattern)) - - -def find_pr_job_files(workdir): - pattern = os.path.join(workdir, "analyze-ci-prs-job-*.txt") - return sorted(glob_mod.glob(pattern)) - - -# --------------------------------------------------------------------------- -# Main -# --------------------------------------------------------------------------- - -def main(): - workdir = None - release = None - mode = None - - args = sys.argv[1:] - i = 0 - while i < len(args): - if args[i] == "--workdir": - if i + 1 >= len(args): - print("Error: --workdir requires an argument", file=sys.stderr) - sys.exit(1) - workdir = args[i + 1] - i += 2 - elif args[i] == "--release": - if i + 1 >= len(args): - print("Error: --release requires a version", file=sys.stderr) - sys.exit(1) - mode = "release" - release = args[i + 1] - i += 2 - elif args[i] == "--prs": - mode = "prs" - i += 1 - elif args[i].startswith("-"): - print(f"Unknown option: {args[i]}", file=sys.stderr) - sys.exit(1) - else: - print(f"Unknown argument: {args[i]}", file=sys.stderr) - sys.exit(1) - - if mode is not None and 
args.count("--release") + args.count("--prs") > 1: - print("Error: --release and --prs are mutually exclusive", file=sys.stderr) - sys.exit(1) - - if mode is None: - print( - "Usage:\n" - " analyze-ci-aggregate.py --release [--workdir DIR]\n" - " analyze-ci-aggregate.py --prs [--workdir DIR]", - file=sys.stderr, - ) - sys.exit(1) - - if workdir is None: - workdir = f"/tmp/analyze-ci-claude-workdir.{datetime.now().strftime('%y%m%d')}" - - if not os.path.isdir(workdir): - print(f"Error: work directory does not exist: {workdir}", file=sys.stderr) - sys.exit(1) - - timestamp = datetime.now(timezone.utc) - - if mode == "release": - files = find_release_job_files(workdir, release) - if not files: - print(f"No job files found for release {release}", file=sys.stderr) - sys.exit(1) - - print(f"Found {len(files)} job files for release {release}", file=sys.stderr) - jobs = [] - for filepath in files: - summary = parse_structured_summary(filepath) - if summary is None: - print(f" WARNING: no STRUCTURED SUMMARY in {os.path.basename(filepath)}", file=sys.stderr) - continue - error_text, remediation_text = parse_prose_fields(filepath) - summary["error_text"] = error_text - summary["remediation_text"] = remediation_text - jobs.append(summary) - - if not jobs: - print("No valid job reports found", file=sys.stderr) - sys.exit(1) - - result = build_release_json(release, jobs, timestamp) - output_path = os.path.join(workdir, f"analyze-ci-release-{release}-summary.json") - with open(output_path, "w") as f: - json.dump(result, f, indent=2) - print(f"Written: {output_path}", file=sys.stderr) - print(json.dumps(result, indent=2)) - - elif mode == "prs": - files = find_pr_job_files(workdir) - if not files: - print("No PR job files found", file=sys.stderr) - result = build_pr_json({}, timestamp) - else: - print(f"Found {len(files)} PR job files", file=sys.stderr) - pr_jobs = {} - for filepath in files: - summary = parse_structured_summary(filepath) - if summary is None: - print(f" WARNING: 
no STRUCTURED SUMMARY in {os.path.basename(filepath)}", file=sys.stderr) - continue - error_text, remediation_text = parse_prose_fields(filepath) - summary["error_text"] = error_text - summary["remediation_text"] = remediation_text - summary["pr_title"] = "" - summary["pr_url"] = "" - - m = re.search(r"-pr(\d+)-", os.path.basename(filepath)) - pr_number = int(m.group(1)) if m else 0 - pr_jobs.setdefault(pr_number, []).append(summary) - - result = build_pr_json(pr_jobs, timestamp) - - output_path = os.path.join(workdir, "analyze-ci-prs-summary.json") - with open(output_path, "w") as f: - json.dump(result, f, indent=2) - print(f"Written: {output_path}", file=sys.stderr) - print(json.dumps(result, indent=2)) - - -if __name__ == "__main__": - main() diff --git a/.claude/scripts/analyze-ci-create-report.py b/.claude/scripts/analyze-ci-create-report.py deleted file mode 100755 index 534bd0580e..0000000000 --- a/.claude/scripts/analyze-ci-create-report.py +++ /dev/null @@ -1,746 +0,0 @@ -#!/usr/bin/env python3 -""" -Generate an HTML report from analyze-ci JSON files. - -Reads JSON summary files (from analyze-ci-aggregate.py) and JSON bug mapping -files (from analyze-ci:create-bugs) to produce a consolidated HTML report. - -Usage: - analyze-ci-create-report.py [--workdir DIR] -""" - -import json -import sys -import os -import re -import html as html_mod -import glob as glob_mod -from datetime import datetime, timezone - - -# --------------------------------------------------------------------------- -# Constants -# --------------------------------------------------------------------------- - -# Threshold for fuzzy matching issue titles to bug candidate signatures. -# Uses asymmetric formula: overlap / len(sig_tokens) — measures what fraction -# of the bug candidate's signature is covered by the issue title. This differs -# from the symmetric min-based formula in aggregate.py/search-bugs.py because -# issue titles are short summaries while signatures are detailed. 
-MATCH_THRESHOLD = 0.50 - -STOP_WORDS = frozenset({ - "the", "a", "an", "in", "on", "at", "to", "for", "of", "with", "by", - "is", "was", "are", "were", "be", "been", "and", "or", "not", "no", - "but", "from", "that", "this", "all", "has", "have", "had", "do", - "does", "did", "will", "would", "could", "should", "may", "might", -}) - -CSS = """\ - body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif; margin: 0; padding: 20px; background: #f5f5f5; color: #333; } - .container { max-width: 1200px; margin: 0 auto; } - h1 { color: #1a1a2e; border-bottom: 3px solid #e94560; padding-bottom: 8px; font-size: 1.4em; margin: 10px 0; } - h2 { font-size: 1.15em; margin: 0; } - h3 { font-size: 1.05em; margin: 0 0 8px 0; } - .release-section { background: white; border-radius: 8px; padding: 15px; margin: 15px 0; box-shadow: 0 2px 4px rgba(0,0,0,0.1); } - .release-header { display: flex; justify-content: space-between; align-items: center; } - .release-header h2 { color: #16213e; margin: 0; } - .badge { padding: 4px 12px; border-radius: 12px; font-size: 0.85em; font-weight: 600; } - .badge-ok { background: #d4edda; color: #155724; } - .badge-issues { background: #fff3cd; color: #856404; } - .badge-critical { background: #f8d7da; color: #721c24; } - .badge-nodata { background: #e2e3e5; color: #383d41; } - .root-cause { background: #fff8e1; border-left: 3px solid #ffc107; padding: 8px 12px; margin: 8px 0; font-size: 0.9em; } - .status-pass { color: #28a745; } - .status-fail { color: #dc3545; } - .overview-grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(200px, 1fr)); gap: 10px; margin: 15px 0; } - .overview-card { background: white; border-radius: 8px; padding: 12px; text-align: center; box-shadow: 0 2px 4px rgba(0,0,0,0.1); } - .overview-card .number { font-size: 1.6em; font-weight: 700; } - .overview-card .label { color: #6c757d; font-size: 0.9em; } - .job-date { font-weight: 400; color: #6c757d; font-size: 0.85em; } - 
.issues-table { width: 100%; border-collapse: collapse; margin: 15px 0; } - .issues-table td { padding: 5px 6px; vertical-align: middle; } - .issues-table .col-num { width: 30px; text-align: right; font-weight: 700; color: #495057; padding-right: 10px; } - .issues-table .col-sev { width: 78px; } - .issues-table .col-ftype { width: 58px; } - .issues-table .col-title { cursor: pointer; user-select: none; } - .issues-table .col-title::before { content: '\\25B6 '; font-size: 0.7em; color: #6c757d; } - .issues-table .col-title.active::before { content: '\\25BC '; } - .issues-table .col-jobs { width: 70px; text-align: center; color: #6c757d; font-size: 0.85em; white-space: nowrap; } - .issues-table .detail-row td { padding: 0 6px 12px 40px; } - .issues-table .detail-row { display: none; } - .issues-table .detail-row.show { display: table-row; } - .issues-table tr.issue-row { border-top: 1px solid #eee; } - .issues-table tr.issue-row:first-child { border-top: none; } - .bug-links { margin: 8px 0; padding: 8px 12px; background: #f0f4ff; border-left: 3px solid #0366d6; font-size: 0.9em; } - .bug-links .bug-tag { display: inline-block; padding: 2px 8px; border-radius: 4px; font-size: 0.85em; font-weight: 600; margin: 2px 4px 2px 0; text-decoration: none; } - .bug-tag-open { background: #fff3cd; color: #856404; border: 1px solid #ffc107; } - .bug-tag-regression { background: #f8d7da; color: #721c24; border: 1px solid #f5c6cb; } - .no-bugs { color: #6c757d; font-style: italic; font-size: 0.85em; } - .toc { background: white; border-radius: 8px; padding: 15px; margin: 15px 0; box-shadow: 0 2px 4px rgba(0,0,0,0.1); } - .toc ul { list-style: none; padding-left: 0; } - .toc li { padding: 5px 0; } - .toc a { color: #0366d6; text-decoration: none; } - .toc a:hover { text-decoration: underline; } - .timestamp { color: #6c757d; font-size: 0.9em; } - a { color: #0366d6; } - .tab-bar { display: flex; gap: 0; margin: 20px 0 0 0; border-bottom: 2px solid #dee2e6; } - .tab-btn { padding: 
12px 24px; border: none; background: transparent; font-size: 1em; font-weight: 600; - color: #6c757d; cursor: pointer; border-bottom: 3px solid transparent; - margin-bottom: -2px; transition: color 0.2s, border-color 0.2s; } - .tab-btn:hover { color: #333; } - .tab-btn.active { color: #e94560; border-bottom-color: #e94560; } - .tab-content { display: none; } - .tab-content.active { display: block; } - .breakdown { display: flex; gap: 15px; margin: 10px 0; flex-wrap: wrap; } - .breakdown-item { font-size: 0.9em; color: #495057; } - .breakdown-item strong { color: #333; } - .severity-badge { display: inline-block; padding: 2px 8px; border-radius: 4px; font-size: 0.75em; font-weight: 700; text-transform: uppercase; } - .severity-high { background: #f8d7da; color: #721c24; } - .severity-medium { background: #fff3cd; color: #856404; } - .severity-low { background: #d4edda; color: #155724; } - .severity-critical { background: #721c24; color: #fff; } - .ftype-badge { display: inline-block; padding: 2px 8px; border-radius: 4px; font-size: 0.75em; font-weight: 700; text-transform: uppercase; } - .ftype-test { background: #cce5ff; color: #004085; } - .ftype-build { background: #e2d5f1; color: #4a235a; } - .ftype-infra { background: #fde2cc; color: #7d4e24; }""" - -JS = """\ -function showTab(e, name) { - document.querySelectorAll('.tab-content').forEach(function(el) { - el.classList.remove('active'); - }); - document.querySelectorAll('.tab-btn').forEach(function(el) { - el.classList.remove('active'); - }); - document.getElementById('tab-' + name).classList.add('active'); - e.target.classList.add('active'); -} -document.querySelectorAll('.col-title').forEach(function(el) { - el.addEventListener('click', function() { - this.classList.toggle('active'); - var row = this.closest('tr').nextElementSibling; - if (row && row.classList.contains('detail-row')) { - row.classList.toggle('show'); - } - }); -});""" - - -# 
--------------------------------------------------------------------------- -# File discovery -# --------------------------------------------------------------------------- - -def discover_files(workdir, releases): - result = {"releases": {}, "prs": {"summary": None, "status": None, "bugs": []}} - - for version in releases: - entry = {"summary": None, "bugs": None, "jobs": None} - path = os.path.join(workdir, f"analyze-ci-release-{version}-summary.json") - if os.path.exists(path): - entry["summary"] = path - path = os.path.join(workdir, f"analyze-ci-bugs-{version}.json") - if os.path.exists(path): - entry["bugs"] = path - path = os.path.join(workdir, f"analyze-ci-release-{version}-jobs.json") - if os.path.exists(path): - entry["jobs"] = path - result["releases"][version] = entry - - path = os.path.join(workdir, "analyze-ci-prs-summary.json") - if os.path.exists(path): - result["prs"]["summary"] = path - - path = os.path.join(workdir, "analyze-ci-prs-status.json") - if os.path.exists(path): - result["prs"]["status"] = path - - for path in glob_mod.glob(os.path.join(workdir, "analyze-ci-bugs-rebase-release-*.json")): - result["prs"]["bugs"].append(path) - - return result - - -# --------------------------------------------------------------------------- -# JSON loading (replaces all text parsers) -# --------------------------------------------------------------------------- - -def load_json(filepath): - if not filepath or not os.path.exists(filepath): - return None - try: - with open(filepath, "r") as f: - return json.load(f) - except (json.JSONDecodeError, IOError) as exc: - print(f"WARNING: failed to load {filepath}: {exc}", file=sys.stderr) - return None - - -def load_bug_candidates(filepath): - data = load_json(filepath) - if not data: - return [] - return data.get("candidates", []) - - -# --------------------------------------------------------------------------- -# Fuzzy matching -# --------------------------------------------------------------------------- - 
-def _tokenize(text): - words = re.findall(r"[a-z0-9][a-z0-9_.-]*[a-z0-9]|[a-z0-9]", text.lower()) - return {w for w in words if w not in STOP_WORDS and len(w) >= 2} - - -def match_issue_to_bugs(issue_title, bug_candidates): - if not bug_candidates: - return None - issue_tokens = _tokenize(issue_title) - if not issue_tokens: - return None - best = None - best_score = 0.0 - for cand in bug_candidates: - sig_tokens = _tokenize(cand["error_signature"]) - if not sig_tokens: - continue - score = len(issue_tokens & sig_tokens) / len(sig_tokens) - if score > best_score: - best_score = score - best = cand - return best if best_score >= MATCH_THRESHOLD else None - - -def _extract_pr_numbers(candidate): - """Extract PR numbers from a bug candidate's job names/URLs. - - Handles two patterns: - - File-derived job names: "-pr123-" (from analyze-ci-prs-job-*-pr-*.txt) - - Prow URLs: ".../pull/openshift_microshift/123/..." - """ - pr_nums = set() - for job in candidate.get("jobs", []): - url = job.get("job_url", "") - m = re.search(r"/pull/[^/]+/(\d+)/", url) - if m: - pr_nums.add(int(m.group(1))) - name = job.get("job_name", "") - m = re.search(r"-pr(\d+)-", name) - if m: - pr_nums.add(int(m.group(1))) - return pr_nums - - -def _index_pr_bugs(bug_paths): - """Load PR bug candidates and index them by PR number. - - Returns a dict mapping PR number (int) to list of bug candidates. - Candidates affecting multiple PRs appear under each PR. 
- """ - by_pr = {} - for path in bug_paths: - for cand in load_bug_candidates(path): - pr_nums = _extract_pr_numbers(cand) - for num in pr_nums: - by_pr.setdefault(num, []).append(cand) - return by_pr - - -# --------------------------------------------------------------------------- -# HTML helpers -# --------------------------------------------------------------------------- - -def _e(text): - return html_mod.escape(str(text)) if text else "" - - -def _badge_class(total_failed, has_critical=False): - if total_failed == 0: - return "badge-ok" - if total_failed >= 5 or has_critical: - return "badge-critical" - return "badge-issues" - - -def _render_bug_links(bug_match): - if not bug_match: - return 'No tracked bugs' - has_dups = bool(bug_match.get("duplicates")) - has_regs = bool(bug_match.get("regressions")) - if not has_dups and not has_regs: - return 'No tracked bugs' - - parts = [] - if has_dups: - parts.append("Bugs:
") - for d in bug_match["duplicates"]: - parts.append( - f'{_e(d["key"])} ' - f'{_e(d["summary"])} ({_e(d["status"])})
' - ) - if has_regs: - parts.append("Regressions:
") - for r in bug_match["regressions"]: - parts.append( - f'{_e(r["key"])} ⟲ ' - f'{_e(r["summary"])} ({_e(r["status"])})
' - ) - return "".join(parts) - - -# --------------------------------------------------------------------------- -# HTML rendering -# --------------------------------------------------------------------------- - -def render_release_section(version, rdata, bug_candidates): - if rdata is None: - return ( - f'
\n' - '
\n' - f'

Release {_e(version)}

\n' - ' no data\n' - '
\n' - "

Analysis failed to produce results.

\n" - "
" - ) - - total = rdata["total_failed"] - has_critical = any(i.get("severity", "").upper() == "CRITICAL" for i in rdata["issues"]) - badge = _badge_class(total, has_critical) - b = rdata["breakdown"] - - lines = [] - lines.append(f'
') - lines.append('
') - lines.append(f"

Release {_e(version)}

") - label = "failure" if total == 1 else "failures" - lines.append(f' {total} {label}') - lines.append("
") - lines.append('
') - lines.append(f' {b["build"]} Build') - lines.append(f' {b["test"]} Test') - lines.append(f' {b["infrastructure"]} Infrastructure') - lines.append("
") - - lines.append(' ') - for issue in rdata["issues"]: - bug_match = match_issue_to_bugs(issue["title"], bug_candidates) - jc = issue["job_count"] - sev = issue.get("severity", "UNKNOWN").upper() - sev_css = f"severity-{sev.lower()}" if sev in ("HIGH", "MEDIUM", "LOW", "CRITICAL") else "" - ftype = issue.get("failure_type", "test") - ftype_label = "INFRA" if ftype == "infrastructure" else ftype.upper() - ftype_css = "ftype-infra" if ftype == "infrastructure" else f"ftype-{ftype}" - jobs_label = f'{jc} {"job" if jc == 1 else "jobs"}' - - lines.append(' ') - lines.append(f' ') - lines.append(f' ') - lines.append(f' ') - lines.append(f' ') - lines.append(f' ') - lines.append(' ') - lines.append(' ") - lines.append('
{issue["number"]}.{sev}{ftype_label}{_e(issue["title"])}{jobs_label}
') - if issue.get("root_cause"): - lines.append(f'
Root Cause: {_e(issue["root_cause"])}
') - lines.append(f' ') - if issue.get("affected_jobs"): - lines.append("

Affected Jobs:

    ") - for job in issue["affected_jobs"]: - if job.get("url"): - lines.append(f'
  • [{_e(job["date"])}] {_e(job["name"])}
  • ') - else: - lines.append(f'
  • [{_e(job["date"])}] {_e(job["name"])}
  • ') - lines.append("
") - if issue.get("next_steps"): - lines.append(f"

Next Steps: {_e(issue['next_steps'])}

") - lines.append("
') - - lines.append("
") - return "\n".join(lines) - - -def render_pr_section(pr_data, all_pr_bugs, pr_status): - """Render the Pull Requests tab. - - pr_data: analyzed PR summary (from aggregate), may be None. - all_pr_bugs: dict mapping PR number (int) to list of bug candidates. - pr_status: list of all PR status snapshots (from prepare), may be None. - """ - # Build a lookup of analyzed PRs by number - analyzed = {} - if pr_data and pr_data.get("has_content"): - for pr in pr_data["prs"]: - analyzed[pr["number"]] = pr - - # Build the full PR list: all PRs from status, merged with analysis - all_prs = [] - if pr_status: - for s in pr_status: - num = s["pr_number"] - entry = { - "number": num, - "title": s.get("title", ""), - "url": s.get("url", ""), - "passed": s.get("passed", 0), - "failed": s.get("failed", 0), - "pending": s.get("pending", 0), - "total": s.get("total", 0), - } - if num in analyzed: - entry["analysis"] = analyzed[num] - all_prs.append(entry) - elif analyzed: - # No status file — fall back to analyzed data only - for pr in pr_data["prs"]: - all_prs.append({ - "number": pr["number"], - "title": pr.get("title", ""), - "url": pr.get("url", ""), - "passed": 0, - "failed": pr.get("failed", 0), - "pending": 0, - "total": pr.get("failed", 0), - "analysis": pr, - }) - - if not all_prs: - return ( - '
\n' - '
\n' - "

Rebase Pull Requests

\n" - ' 0 failures\n' - "
\n" - "

No open rebase pull requests found.

\n" - "
" - ) - - # TOC - toc_lines = [] - toc_lines.append('
') - toc_lines.append('

Table of Contents

') - toc_lines.append('
    ') - for pr in all_prs: - analysis = pr.get("analysis") - if analysis: - b = analysis.get("breakdown", {}) - else: - b = {"build": 0, "test": 0, "infrastructure": 0} - pending = pr.get("pending", 0) - suffix = f' — {pending} running' if pending else '' - toc_lines.append( - f'
  • PR# {pr["number"]}' - f' — {pr["failed"]} failures ({b.get("build", 0)} build, {b.get("test", 0)} test, {b.get("infrastructure", 0)} infra){suffix}
  • ' - ) - toc_lines.append('
') - toc_lines.append('
') - - # Sections - lines = [] - for pr in all_prs: - analysis = pr.get("analysis") - total_failed = pr["failed"] - badge = _badge_class(total_failed) - - lines.append(f'
') - lines.append('
') - pr_link = f'PR# {pr["number"]}' if pr.get("url") else f'PR# {pr["number"]}' - pr_release_m = re.search(r"rebase-(release-\d+\.\d+|main)", pr.get("title", "")) - pr_release_label = f' (rebase {pr_release_m.group(1)})' if pr_release_m else f': {_e(pr["title"])}' if pr.get("title") else '' - lines.append(f'

{pr_link}{pr_release_label}

') - label = "failure" if total_failed == 1 else "failures" - lines.append(f' {total_failed} {label}') - - lines.append("
") - - # Breakdown: same format as periodics (Build/Test/Infrastructure) - # Plus job status (passed/running) when available - pending = pr.get("pending", 0) - if analysis and analysis.get("breakdown"): - b = analysis["breakdown"] - else: - b = {"build": 0, "test": 0, "infrastructure": 0} - lines.append('
') - lines.append(f' {b.get("build", 0)} Build') - lines.append(f' {b.get("test", 0)} Test') - lines.append(f' {b.get("infrastructure", 0)} Infrastructure') - if pr["passed"]: - lines.append(f' {pr["passed"]} Passed') - if pending: - lines.append(f' {pending} Running') - lines.append("
") - - pr_bugs = all_pr_bugs.get(pr["number"], []) - if analysis and analysis.get("issues"): - - lines.append(' ') - for issue in analysis["issues"]: - bug_match = match_issue_to_bugs(issue.get("title", ""), pr_bugs) - jc = issue["job_count"] - sev = issue.get("severity", "UNKNOWN").upper() - sev_css = f"severity-{sev.lower()}" if sev in ("HIGH", "MEDIUM", "LOW", "CRITICAL") else "" - ftype = issue.get("failure_type", "test") - ftype_label = "INFRA" if ftype == "infrastructure" else ftype.upper() - ftype_css = "ftype-infra" if ftype == "infrastructure" else f"ftype-{ftype}" - jobs_label = f'{jc} {"job" if jc == 1 else "jobs"}' - - lines.append(' ') - lines.append(f' ') - lines.append(f' ') - lines.append(f' ') - lines.append(f' ') - lines.append(f' ') - lines.append(' ') - lines.append(' ") - lines.append('
{issue["number"]}.{sev}{ftype_label}{_e(issue["title"])}{jobs_label}
') - if issue.get("root_cause"): - lines.append(f'
Root Cause: {_e(issue["root_cause"])}
') - lines.append(f' ') - if issue.get("affected_jobs"): - lines.append("

Affected Jobs:

    ") - for job in issue["affected_jobs"]: - if job.get("url"): - lines.append(f'
  • [{_e(job["date"])}] {_e(job["name"])}
  • ') - else: - lines.append(f'
  • [{_e(job["date"])}] {_e(job["name"])}
  • ') - lines.append("
") - if issue.get("next_steps"): - lines.append(f"

Next Steps: {_e(issue['next_steps'])}

") - lines.append("
') - - lines.append("
") - return "\n".join(toc_lines) + "\n\n" + "\n".join(lines) - - -def generate_html(releases_data, bug_data, pr_data, all_pr_bugs, pr_status, timestamp): - date_str = timestamp.strftime("%Y-%m-%d") - time_str = timestamp.strftime("%Y-%m-%d %H:%M:%S") - - cards = [] - for version, rdata in releases_data.items(): - count = rdata["total_failed"] if rdata else "?" - css = "status-fail" if rdata and rdata["total_failed"] > 0 else ("status-pass" if rdata else "") - cards.append( - '
\n' - f'
{count}
\n' - f'
Release {_e(version)}
\n' - "
" - ) - # PR overview: count failures from status (all PRs) or analysis - if pr_status: - pr_failed = sum(p.get("failed", 0) for p in pr_status) - elif pr_data: - pr_failed = pr_data.get("total_failed", 0) - else: - pr_failed = 0 - pr_css = "status-fail" if pr_failed > 0 else "status-pass" - cards.append( - '
\n' - f'
{pr_failed}
\n' - f'
Rebase PRs
\n' - "
" - ) - - toc = [] - for version, rdata in releases_data.items(): - if rdata: - b = rdata["breakdown"] - toc.append( - f'
  • Release {_e(version)} — ' - f'{rdata["total_failed"]} failures ({b["build"]} build, {b["test"]} test, {b["infrastructure"]} infra)
  • ' - ) - else: - toc.append(f'
  • Release {_e(version)} — no data
  • ') - - sections = [] - for version, rdata in releases_data.items(): - bugs = bug_data.get(version, []) - sections.append(render_release_section(version, rdata, bugs)) - - pr_section = render_pr_section(pr_data, all_pr_bugs, pr_status) - - return f"""\ - - - - - MicroShift CI Doctor Report - {date_str} - - - -
    -

    MicroShift CI Doctor Report

    -

    Generated: {time_str} UTC

    - -
    -{chr(10).join(cards)} -
    - -
    - - -
    - -
    -
    -

    Table of Contents

    -
      -{chr(10).join(toc)} -
    -
    - -{chr(10).join(sections)} -
    - -
    -{pr_section} -
    - -

     

     

     

     

    -
    - - - -""" - - -# --------------------------------------------------------------------------- -# Main -# --------------------------------------------------------------------------- - -def main(): - workdir = None - releases_arg = None - - args = sys.argv[1:] - i = 0 - while i < len(args): - if args[i] == "--workdir": - if i + 1 >= len(args): - print("Error: --workdir requires an argument", file=sys.stderr) - sys.exit(1) - workdir = args[i + 1] - i += 2 - elif args[i].startswith("-"): - print(f"Unknown option: {args[i]}", file=sys.stderr) - sys.exit(1) - else: - releases_arg = args[i] - i += 1 - - if not releases_arg: - print("Usage: analyze-ci-create-report.py [--workdir DIR] ", file=sys.stderr) - sys.exit(1) - - releases = [v.strip() for v in releases_arg.split(",") if v.strip()] - if not releases: - print("Error: at least one release version is required", file=sys.stderr) - sys.exit(1) - - if workdir is None: - workdir = f"/tmp/analyze-ci-claude-workdir.{datetime.now().strftime('%y%m%d')}" - - if not os.path.isdir(workdir): - print(f"Error: work directory does not exist: {workdir}", file=sys.stderr) - sys.exit(1) - - files = discover_files(workdir, releases) - - # Report discovery - print("Files discovered:") - found_any = False - for version in releases: - entry = files["releases"][version] - parts = [] - if entry["summary"]: - parts.append("summary found") - found_any = True - else: - parts.append("summary MISSING") - parts.append("bug mapping found" if entry["bugs"] else "no bug mapping") - print(f" Release {version}: {', '.join(parts)}") - - pr_entry = files["prs"] - if pr_entry["summary"] or pr_entry["status"]: - found_any = True - parts = [] - if pr_entry["summary"]: - parts.append("summary found") - if pr_entry["status"]: - parts.append("status found") - parts.append(f'{len(pr_entry["bugs"])} bug mapping files') - print(f" PRs: {', '.join(parts)}") - else: - print(" PRs: no data") - - if not found_any: - print(f"\nError: no analysis files found in 
{workdir}", file=sys.stderr) - sys.exit(1) - - # Load everything via json.load - releases_data = {} - bug_data = {} - _EMPTY_BREAKDOWN = {"build": 0, "test": 0, "infrastructure": 0} - for version in releases: - entry = files["releases"][version] - rdata = load_json(entry["summary"]) - if rdata is None: - # Distinguish "no failures" from "analysis failed" by checking the jobs file - jobs = load_json(entry["jobs"]) - if jobs is not None and len(jobs) == 0: - rdata = { - "total_failed": 0, - "issues": [], - "breakdown": _EMPTY_BREAKDOWN, - } - releases_data[version] = rdata - bug_data[version] = load_bug_candidates(entry["bugs"]) - - pr_data = load_json(pr_entry["summary"]) - pr_status = load_json(pr_entry["status"]) - - all_pr_bugs = _index_pr_bugs(pr_entry["bugs"]) - - # Generate HTML - timestamp = datetime.now(timezone.utc) - html_content = generate_html(releases_data, bug_data, pr_data, all_pr_bugs, pr_status, timestamp) - - output_path = os.path.join(workdir, "microshift-ci-doctor-report.html") - with open(output_path, "w") as f: - f.write(html_content) - - # Summary - print("\nSummary:") - print(" Periodics:") - for version in releases: - rdata = releases_data[version] - if rdata: - print(f" Release {version}: {rdata['total_failed']} failed periodic jobs") - else: - print(f" Release {version}: no data") - print(" Pull Requests:") - if pr_status: - pr_total_failed = sum(p.get("failed", 0) for p in pr_status) - pr_total_pending = sum(p.get("pending", 0) for p in pr_status) - parts = [f"{len(pr_status)} rebase PRs", f"{pr_total_failed} failed jobs"] - if pr_total_pending: - parts.append(f"{pr_total_pending} running") - print(f" {', '.join(parts)}") - elif pr_data and pr_data.get("has_content"): - print(f" {len(pr_data['prs'])} rebase PRs with {pr_data['total_failed']} total failed jobs") - else: - print(" No PR data") - print(f"\nHTML report generated: {output_path}") - - -if __name__ == "__main__": - main() diff --git a/.claude/scripts/analyze-ci-doctor.sh 
b/.claude/scripts/analyze-ci-doctor.sh deleted file mode 100755 index c0e3643044..0000000000 --- a/.claude/scripts/analyze-ci-doctor.sh +++ /dev/null @@ -1,277 +0,0 @@ -#!/bin/bash -set -euo pipefail - -# Deterministic orchestration for analyze-ci:doctor. -# -# Two phases called by the doctor skill with LLM steps in between: -# -# analyze-ci-doctor.sh prepare --workdir DIR [--rebase] -# - Collects failed jobs for each release and rebase PRs -# - Downloads all artifacts in parallel -# - Writes per-release and PR jobs JSON files -# -# analyze-ci-doctor.sh finalize --workdir DIR -# - Runs analyze-ci-aggregate.py for each release and PRs -# - Runs analyze-ci-create-report.py to generate HTML -# -# Usage from doctor skill: -# 1. analyze-ci-doctor.sh prepare --workdir $WORKDIR 4.18,4.19,4.20,main --rebase -# 2. (LLM launches prow-job agents for all jobs) -# 3. (LLM launches create-bugs agents for Jira search) -# 4. analyze-ci-doctor.sh finalize --workdir $WORKDIR 4.18,4.19,4.20,main - -SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" -WORKDIR="" - -# --------------------------------------------------------------------------- -# prepare -# --------------------------------------------------------------------------- - -cmd_prepare() { - local releases_arg="" - local do_rebase=false - - while [[ ${#} -gt 0 ]]; do - case "${1}" in - --workdir) WORKDIR="${2}"; shift 2 ;; - --rebase) do_rebase=true; shift ;; - -*) echo "Unknown option: ${1}" >&2; return 1 ;; - *) releases_arg="${1}"; shift ;; - esac - done - - WORKDIR="${WORKDIR:-/tmp/analyze-ci-claude-workdir.$(date +%y%m%d)}" - - if [[ -z "${releases_arg}" ]]; then - echo "Error: releases argument required" >&2 - echo "Usage: $(basename "$0") prepare [--workdir DIR] [--rebase]" >&2 - return 1 - fi - - mkdir -p "${WORKDIR}" - - IFS=',' read -ra RELEASES <<< "${releases_arg}" - local total_jobs=0 - - # Collect and download for each release - for release in "${RELEASES[@]}"; do - release=$(echo "${release}" | xargs) # trim whitespace 
- echo "=== Release ${release} ===" >&2 - - local jobs_file="${WORKDIR}/analyze-ci-release-${release}-jobs.json" - - echo " Collecting failed periodic jobs..." >&2 - local raw_json raw_err - raw_err=$(mktemp) - if ! raw_json=$(bash "${SCRIPT_DIR}/microshift-prow-jobs-for-release.sh" "${release}" 2>"${raw_err}"); then - echo " ERROR: failed to collect jobs for release ${release}:" >&2 - cat "${raw_err}" >&2 - rm -f "${raw_err}" - echo "[]" > "${jobs_file}" - continue - fi - rm -f "${raw_err}" - - local filtered_json - filtered_json=$(echo "${raw_json}" | jq '[.[] | select(.type == "periodic")]') - - local count - count=$(echo "${filtered_json}" | jq 'length') - - if [[ "${count}" -eq 0 ]]; then - echo " No failed periodic jobs found" >&2 - echo "[]" > "${jobs_file}" - continue - fi - - echo " Found ${count} failed periodic jobs, downloading artifacts..." >&2 - echo "${filtered_json}" | \ - bash "${SCRIPT_DIR}/analyze-ci-download-jobs.sh" --workdir "${WORKDIR}" 2>/dev/null \ - > "${jobs_file}" - - total_jobs=$((total_jobs + count)) - echo " Done: ${jobs_file}" >&2 - done - - # Collect and download for rebase PRs - if ${do_rebase}; then - echo "=== Rebase Pull Requests ===" >&2 - - local prs_file="${WORKDIR}/analyze-ci-prs-jobs.json" - local prs_status_file="${WORKDIR}/analyze-ci-prs-status.json" - - echo " Collecting rebase PRs..." >&2 - local pr_json pr_err - pr_err=$(mktemp) - if ! 
pr_json=$(bash "${SCRIPT_DIR}/microshift-prow-jobs-for-pull-requests.sh" \ - --mode detail --author "microshift-rebase-script[bot]" 2>"${pr_err}"); then - echo " ERROR: failed to collect rebase PRs:" >&2 - cat "${pr_err}" >&2 - rm -f "${pr_err}" - echo "[]" > "${prs_file}" - echo "[]" > "${prs_status_file}" - else - rm -f "${pr_err}" - - local pr_count - pr_count=$(echo "${pr_json}" | jq 'length') - - if [[ "${pr_count}" -eq 0 ]]; then - echo " No rebase PRs found" >&2 - echo "[]" > "${prs_file}" - echo "[]" > "${prs_status_file}" - else - # Save job status snapshot for all PRs (used by HTML report) - echo "${pr_json}" | jq '[.[] | { - pr_number, title, url, - passed: [.jobs[] | select(.status == "SUCCESS")] | length, - failed: [.jobs[] | select(.status == "FAILURE")] | length, - pending: [.jobs[] | select(.status != "SUCCESS" and .status != "FAILURE")] | length, - total: (.jobs | length) - }]' > "${prs_status_file}" - echo " Saved status for ${pr_count} rebase PRs" >&2 - - # Filter to PRs with failed jobs for artifact download - local failed_prs - failed_prs=$(echo "${pr_json}" | \ - jq '[.[] | select(.jobs | map(select(.status == "FAILURE")) | length > 0)]') - - local failed_pr_count - failed_pr_count=$(echo "${failed_prs}" | jq 'length') - - if [[ "${failed_pr_count}" -eq 0 ]]; then - echo " No PRs with failures to investigate" >&2 - echo "[]" > "${prs_file}" - else - local job_count - job_count=$(echo "${failed_prs}" | jq '[.[].jobs[] | select(.status == "FAILURE")] | length') - - echo " Downloading artifacts for ${job_count} failed jobs across ${failed_pr_count} PRs..." 
>&2 - echo "${failed_prs}" | \ - bash "${SCRIPT_DIR}/analyze-ci-download-jobs.sh" --workdir "${WORKDIR}" 2>/dev/null \ - > "${prs_file}" - - total_jobs=$((total_jobs + job_count)) - echo " Done: ${prs_file}" >&2 - fi - fi - fi - fi - - echo "" >&2 - echo "Prepare complete: ${total_jobs} total jobs ready for analysis in ${WORKDIR}" >&2 - - # Output a JSON summary for the LLM to consume - local releases_json="[]" - for release in "${RELEASES[@]}"; do - release=$(echo "${release}" | xargs) - local jobs_file="${WORKDIR}/analyze-ci-release-${release}-jobs.json" - local count=0 - if [[ -f "${jobs_file}" ]]; then - count=$(jq 'length' "${jobs_file}") - fi - releases_json=$(echo "${releases_json}" | jq \ - --arg r "${release}" --argjson c "${count}" --arg f "${jobs_file}" \ - '. + [{release: $r, jobs: $c, jobs_file: $f}]') - done - - local result - result=$(jq -n --arg w "${WORKDIR}" --argjson rel "${releases_json}" \ - '{workdir: $w, releases: $rel}') - - if ${do_rebase}; then - local prs_file="${WORKDIR}/analyze-ci-prs-jobs.json" - local pr_job_count=0 - if [[ -f "${prs_file}" ]]; then - pr_job_count=$(jq 'length' "${prs_file}") - fi - result=$(echo "${result}" | jq \ - --argjson c "${pr_job_count}" --arg f "${prs_file}" \ - '. 
+ {prs: {jobs: $c, jobs_file: $f}}') - fi - - echo "${result}" -} - -# --------------------------------------------------------------------------- -# finalize -# --------------------------------------------------------------------------- - -cmd_finalize() { - local releases_arg="" - - while [[ ${#} -gt 0 ]]; do - case "${1}" in - --workdir) WORKDIR="${2}"; shift 2 ;; - -*) echo "Unknown option: ${1}" >&2; return 1 ;; - *) releases_arg="${1}"; shift ;; - esac - done - - WORKDIR="${WORKDIR:-/tmp/analyze-ci-claude-workdir.$(date +%y%m%d)}" - - if [[ -z "${releases_arg}" ]]; then - echo "Error: releases argument required" >&2 - echo "Usage: $(basename "$0") finalize [--workdir DIR] " >&2 - return 1 - fi - - IFS=',' read -ra RELEASES <<< "${releases_arg}" - - # Aggregate each release - for release in "${RELEASES[@]}"; do - release=$(echo "${release}" | xargs) - echo "=== Aggregating release ${release} ===" >&2 - python3 "${SCRIPT_DIR}/analyze-ci-aggregate.py" \ - --release "${release}" --workdir "${WORKDIR}" >/dev/null 2>&1 || \ - echo " WARNING: aggregation failed for ${release}" >&2 - done - - # Aggregate PRs (if job files exist) - local pr_files - pr_files=$(find "${WORKDIR}" -name 'analyze-ci-prs-job-*.txt' 2>/dev/null | head -1) - if [[ -n "${pr_files}" ]]; then - echo "=== Aggregating PRs ===" >&2 - python3 "${SCRIPT_DIR}/analyze-ci-aggregate.py" \ - --prs --workdir "${WORKDIR}" >/dev/null 2>&1 || \ - echo " WARNING: PR aggregation failed" >&2 - fi - - # Generate HTML report - echo "=== Generating HTML report ===" >&2 - python3 "${SCRIPT_DIR}/analyze-ci-create-report.py" \ - --workdir "${WORKDIR}" "${releases_arg}" -} - -# --------------------------------------------------------------------------- -# main -# --------------------------------------------------------------------------- - -usage() { - echo "Usage: $(basename "$0") [--workdir DIR] [options]" >&2 - echo "" >&2 - echo "Commands:" >&2 - echo " prepare [--workdir DIR] [--rebase] Collect jobs and download 
artifacts" >&2 - echo " finalize [--workdir DIR] Aggregate results and generate HTML" >&2 - echo "" >&2 - echo " : comma-separated release versions (e.g., 4.18,4.19,4.20,main)" >&2 - echo " --workdir DIR: work directory (default: /tmp/analyze-ci-claude-workdir.YYMMDD)" >&2 - exit 1 -} - -main() { - if [[ ${#} -lt 1 ]]; then - usage - fi - - local cmd="${1}" - shift - - case "${cmd}" in - prepare) cmd_prepare "${@}" ;; - finalize) cmd_finalize "${@}" ;; - *) echo "Unknown command: ${cmd}" >&2; usage ;; - esac -} - -main "${@}" diff --git a/.claude/scripts/analyze-ci-download-jobs.sh b/.claude/scripts/analyze-ci-download-jobs.sh deleted file mode 100755 index b85c5f272f..0000000000 --- a/.claude/scripts/analyze-ci-download-jobs.sh +++ /dev/null @@ -1,162 +0,0 @@ -#!/bin/bash -set -euo pipefail - -# Download Prow job artifacts for analysis. -# -# Accepts JSON on stdin — either flat job array (from microshift-prow-jobs-for-release.sh) -# or nested PR array (from microshift-prow-jobs-for-pull-requests.sh --mode detail). -# Downloads artifacts into WORKDIR/artifacts/BUILD_ID/ with parallel workers. -# Skips already-downloaded jobs. Outputs JSON job list with local paths on stdout. -# -# Usage: -# microshift-prow-jobs-for-release.sh 4.22 | analyze-ci-download-jobs.sh --workdir DIR -# microshift-prow-jobs-for-release.sh 4.22 | analyze-ci-download-jobs.sh --workdir DIR --parallel 4 -# microshift-prow-jobs-for-pull-requests.sh --mode detail | analyze-ci-download-jobs.sh --workdir DIR -# -# Output (stdout): JSON array of job objects with "artifacts_dir" added: -# [{"job":"...","url":"...","build_id":"...","artifacts_dir":"/tmp/.../artifacts/BUILD_ID"}, ...] 
-# -# Progress/errors: stderr - -WORKDIR="" - -# Convert a Prow view URL to a GCS path -url_to_gcs() { - echo "$1" | sed \ - -e 's|https://prow.ci.openshift.org/view/gs/|gs://|' \ - -e 's|https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/|gs://|' -} - -# Download a single job's artifacts -# Args: build_id url -# Returns: 0 on success (cached or downloaded), 1 on failure -# -# gcloud storage cp -r gs://bucket/.../BUILD_ID/ dest/ creates dest/BUILD_ID/... -# so the final layout is: ${WORKDIR}/artifacts/${BUILD_ID}/finished.json etc. -download_job() { - local build_id="$1" - local url="$2" - local dest="${WORKDIR}/artifacts/${build_id}" - - if [[ -d "${dest}" ]] && [[ -f "${dest}/finished.json" ]]; then - echo " cached: ${build_id}" >&2 - return 0 - fi - - local gcs_path - gcs_path=$(url_to_gcs "${url}") - - # gcloud cp -r .../BUILD_ID/ parent/ → parent/BUILD_ID/... - # so we download into the parent and let gcloud create the BUILD_ID dir - local parent="${WORKDIR}/artifacts" - mkdir -p "${parent}" - if gcloud storage cp -r "${gcs_path}/" "${parent}/" >/dev/null 2>&1; then - echo " downloaded: ${build_id}" >&2 - return 0 - else - echo " FAILED: ${build_id}" >&2 - return 1 - fi -} - -usage() { - echo "Usage: | ${0} --workdir DIR [--parallel N]" >&2 - echo " --workdir DIR: work directory (required)" >&2 - echo " --parallel N: number of parallel downloads (default: 6)" >&2 - echo "" >&2 - echo "Accepts JSON on stdin from:" >&2 - echo " microshift-prow-jobs-for-release.sh (flat job array)" >&2 - echo " microshift-prow-jobs-for-pull-requests.sh --mode detail (nested PR array)" >&2 - exit 1 -} - -main() { - local parallel=6 - - while [[ ${#} -gt 0 ]]; do - case "${1}" in - --workdir) - [[ ${#} -lt 2 ]] && { echo "Error: --workdir requires a directory" >&2; usage; } - WORKDIR="${2}"; shift 2 ;; - --parallel) - [[ ${#} -lt 2 ]] && { echo "Error: --parallel requires a number" >&2; usage; } - parallel="${2}"; shift 2 ;; - -h|--help) usage ;; - -*) echo "Unknown option: 
${1}" >&2; usage ;; - *) echo "Unknown argument: ${1}" >&2; usage ;; - esac - done - - if [[ -z "${WORKDIR}" ]]; then - echo "Error: --workdir is required" >&2 - usage - fi - - mkdir -p "${WORKDIR}/artifacts" - - # Read all stdin into a variable - local input - input=$(cat) - - # Normalize input: detect format and extract flat job list - # Release format: [{"job":...,"url":...,"build_id":...}, ...] - # PR format: [{"pr_number":...,"jobs":[{"job":...,"url":...,"build_id":...}, ...]}, ...] - local jobs_json - if echo "${input}" | jq -e '.[0].jobs' >/dev/null 2>&1; then - # PR format — flatten nested jobs, carry pr_number into each job - jobs_json=$(echo "${input}" | jq '[.[] | .pr_number as $pr | .jobs[] | . + {pr_number: $pr}]') - else - # Release format — use as-is - jobs_json="${input}" - fi - - local total - total=$(echo "${jobs_json}" | jq 'length') - - if [[ "${total}" -eq 0 ]]; then - echo "No jobs to download." >&2 - echo "[]" - return 0 - fi - - echo "Downloading artifacts for ${total} jobs (${parallel} parallel)..." >&2 - - # Export functions and vars for subshells - export WORKDIR - export -f download_job url_to_gcs - - # Download all jobs in parallel - local status_file - status_file=$(mktemp) - - while IFS=$'\t' read -r build_id url; do - ( - if download_job "${build_id}" "${url}"; then - echo "${build_id}:ok" >> "${status_file}" - else - echo "${build_id}:fail" >> "${status_file}" - fi - ) & - - # Limit parallelism - while [[ $(jobs -rp | wc -l) -ge ${parallel} ]]; do - wait -n 2>/dev/null || true - done - done < <(echo "${jobs_json}" | jq -r '.[] | [.build_id, .url] | @tsv') - wait - - # Count results - local ok=0 fail=0 - if [[ -f "${status_file}" ]]; then - ok=$(grep -c ':ok$' "${status_file}" 2>/dev/null || true) - fail=$(grep -c ':fail$' "${status_file}" 2>/dev/null || true) - fi - rm -f "${status_file}" - - echo "Done: ${ok} downloaded/cached, ${fail} failed." 
>&2 - - # Output enriched JSON with artifacts_dir added - echo "${jobs_json}" | jq --arg workdir "${WORKDIR}" '[.[] | . + {artifacts_dir: ($workdir + "/artifacts/" + .build_id)}]' -} - -main "${@}" diff --git a/.claude/scripts/analyze-ci-search-bugs.py b/.claude/scripts/analyze-ci-search-bugs.py deleted file mode 100755 index 962ea150f3..0000000000 --- a/.claude/scripts/analyze-ci-search-bugs.py +++ /dev/null @@ -1,387 +0,0 @@ -#!/usr/bin/env python3 -""" -Prepare bug candidates from per-job analysis reports. - -Parses STRUCTURED SUMMARY blocks, groups by ERROR_SIGNATURE similarity, -extracts Jira search keywords, and writes a candidates JSON file for -the create-bugs skill to search Jira against. - -Usage: - analyze-ci-search-bugs.py [--workdir DIR] - - is one of: - - Release version: 4.22, main - - PR number: pr-6396, pr6396 - - Rebase shorthand: rebase-release-4.22 - -Output: - ${WORKDIR}/analyze-ci-bug-candidates-.json -""" - -import json -import sys -import os -import re -import glob as glob_mod -from datetime import datetime, timezone - - -# --------------------------------------------------------------------------- -# Constants -# --------------------------------------------------------------------------- - -STOP_WORDS = frozenset({ - "the", "a", "an", "in", "on", "at", "to", "for", "of", "with", "by", - "is", "was", "are", "were", "be", "been", "and", "or", "not", "no", - "but", "from", "that", "this", "all", "has", "have", "had", "do", - "does", "did", "will", "would", "could", "should", "may", "might", -}) - -# Additional stop words filtered only during keyword extraction for Jira search, -# not during signature grouping (which must match aggregate.py's tokenization). 
-KEYWORD_STOP_WORDS = STOP_WORDS | frozenset({ - "ci", "microshift", "failure", "failed", "error", "test", "tests", - "job", "jobs", "step", "periodic", -}) - -SIMILARITY_THRESHOLD = 0.50 - - -# --------------------------------------------------------------------------- -# Parsing -# --------------------------------------------------------------------------- - -def parse_structured_summary(filepath): - """Extract STRUCTURED SUMMARY block from a per-job report file.""" - with open(filepath, "r") as f: - content = f.read() - - m = re.search( - r"--- STRUCTURED SUMMARY ---\n(.+?)\n--- END STRUCTURED SUMMARY ---", - content, re.DOTALL, - ) - if not m: - return None - - data = {} - for line in m.group(1).strip().split("\n"): - if ":" in line: - key, val = line.split(":", 1) - data[key.strip()] = val.strip() - - try: - severity = int(data.get("SEVERITY", "3")) - except ValueError: - severity = 3 - - # Get the analysis text (everything before STRUCTURED SUMMARY) - analysis_text = content.split("--- STRUCTURED SUMMARY ---")[0].strip() - - return { - "severity": severity, - "stack_layer": data.get("STACK_LAYER", ""), - "step_name": data.get("STEP_NAME", ""), - "error_signature": data.get("ERROR_SIGNATURE", ""), - "raw_error": data.get("RAW_ERROR", ""), - "infrastructure_failure": data.get("INFRASTRUCTURE_FAILURE", "false").lower() == "true", - "job_url": data.get("JOB_URL", ""), - "job_name": data.get("JOB_NAME", ""), - "release": data.get("RELEASE", ""), - "finished": data.get("FINISHED", ""), - "analysis_text": analysis_text, - "source_file": filepath, - } - - -# --------------------------------------------------------------------------- -# Grouping -# --------------------------------------------------------------------------- - -def _normalize_step_name(step_name): - """Extract the step ref from a fully-qualified Prow step name. - - Prow step names follow ``-`` where the - step ref typically starts with ``openshift-microshift-``. 
- """ - m = re.search(r"(openshift-microshift-\S+)", step_name) - return m.group(1) if m else step_name - - -def _tokenize(text): - words = re.findall(r"[a-z0-9][a-z0-9_.-]*[a-z0-9]|[a-z0-9]", text.lower()) - return {w for w in words if w not in STOP_WORDS and len(w) >= 2} - - -def _signature_similarity(sig_a, sig_b): - tokens_a = _tokenize(sig_a) - tokens_b = _tokenize(sig_b) - if not tokens_a or not tokens_b: - return 0.0 - return len(tokens_a & tokens_b) / min(len(tokens_a), len(tokens_b)) - - -def _grouping_text(job): - """Return the text used for similarity grouping. - - Prefers RAW_ERROR (verbatim log text, deterministic) over - ERROR_SIGNATURE (LLM-paraphrased, variable across runs). - """ - return job.get("raw_error") or job.get("error_signature", "") - - -def _group_by_similarity(jobs): - """Group jobs by similarity of their grouping text. - - Uses RAW_ERROR when available (deterministic log text), - falling back to ERROR_SIGNATURE for older reports. - - A new job is compared against ALL existing members of each group, - not just the first. If any member exceeds the similarity threshold - the job joins that group. - """ - groups = [] - for job in jobs: - sig = _grouping_text(job) - placed = False - for group in groups: - if any( - _signature_similarity(sig, _grouping_text(member)) >= SIMILARITY_THRESHOLD - for member in group - ): - group.append(job) - placed = True - break - if not placed: - groups.append([job]) - return groups - - -def group_by_signature(jobs): - """Two-pass grouping: first by step_name, then by signature similarity. - - Grouping by step_name first prevents jobs from different CI steps - from being merged together even when their error signatures share - enough tokens to exceed the similarity threshold. 
- """ - # Pass 1: bucket by normalized step_name - by_step = {} - for job in jobs: - step = _normalize_step_name(job.get("step_name", "")) - by_step.setdefault(step, []).append(job) - - # Pass 2: within each step bucket, group by signature similarity - all_groups = [] - for step_jobs in by_step.values(): - all_groups.extend(_group_by_similarity(step_jobs)) - return all_groups - - -# --------------------------------------------------------------------------- -# Keyword extraction -# --------------------------------------------------------------------------- - -def _tokenize_for_keywords(text): - """Tokenize with extra stop words filtered for Jira keyword extraction.""" - words = re.findall(r"[a-z0-9][a-z0-9_.-]*[a-z0-9]|[a-z0-9]", text.lower()) - return {w for w in words if w not in KEYWORD_STOP_WORDS and len(w) >= 2} - - -def extract_keywords(error_signature): - """Extract distinctive search keywords from an error signature. - - Returns a list of 2-4 keywords ranked by specificity. - Uses KEYWORD_STOP_WORDS (broader filtering) so generic CI terms - like "test", "failed", "microshift" don't pollute Jira searches. - """ - tokens = _tokenize_for_keywords(error_signature) - if not tokens: - return [] - - def specificity(token): - score = len(token) - if "-" in token or "." 
in token: - score += 10 - if any(c.isdigit() for c in token): - score += 5 - return score - - ranked = sorted(tokens, key=lambda t: (-specificity(t), t)) - return ranked[:4] - - -def extract_test_ids(error_signature): - """Extract numeric test case IDs (4-6 digits) from error signature.""" - return re.findall(r"\b(\d{4,6})\b", error_signature) - - -# --------------------------------------------------------------------------- -# Candidate building -# --------------------------------------------------------------------------- - -def build_candidates(groups): - """Build bug candidate list from grouped jobs.""" - candidates = [] - - for group in groups: - rep = max(group, key=lambda j: (j["severity"], j.get("job_name", ""))) - keywords = extract_keywords(rep["error_signature"]) - test_ids = extract_test_ids(rep["error_signature"]) - - step_names = sorted({j["step_name"] for j in group if j["step_name"]}) - - candidates.append({ - "error_signature": rep["error_signature"], - "severity": max(j["severity"] for j in group), - "step_name": ", ".join(step_names), - "affected_jobs": len(group), - "keywords": keywords, - "test_ids": test_ids, - "jobs": [ - { - "job_name": j["job_name"], - "job_url": j["job_url"], - "finished": j["finished"], - } - for j in group - ], - "analysis_text": rep["analysis_text"], - }) - - # Sort by severity desc, then job count desc - candidates.sort(key=lambda c: (-c["severity"], -c["affected_jobs"], c["error_signature"])) - return candidates - - -# --------------------------------------------------------------------------- -# File discovery -# --------------------------------------------------------------------------- - -def find_job_files(workdir, source): - """Find per-job report files for a given source. - - Returns (files, source_label) tuple. 
- """ - # Release version - if re.match(r"^(\d+\.\d+|main)$", source): - pattern = os.path.join(workdir, f"analyze-ci-release-{source}-job-*.txt") - files = sorted(glob_mod.glob(pattern)) - return files, f"release {source}" - - # PR number - m = re.match(r"^pr-?(\d+)$", source) - if m: - pr_num = m.group(1) - pattern = os.path.join(workdir, f"analyze-ci-prs-job-*-pr{pr_num}-*.txt") - files = sorted(glob_mod.glob(pattern)) - return files, f"PR #{pr_num}" - - # Rebase PR shorthand - m = re.match(r"^rebase-release-(.+)$", source) - if m: - release = m.group(1) - # Scan all PR job files for ones matching this release - pattern = os.path.join(workdir, "analyze-ci-prs-job-*.txt") - all_files = sorted(glob_mod.glob(pattern)) - files = [] - for filepath in all_files: - summary = parse_structured_summary(filepath) - if summary and ( - f"release-{release}" in summary.get("job_name", "") - or summary.get("release", "") == release - ): - files.append(filepath) - return files, f"rebase PR for {release}" - - return [], source - - -# --------------------------------------------------------------------------- -# Main -# --------------------------------------------------------------------------- - -def main(): - workdir = None - source = None - - args = sys.argv[1:] - i = 0 - while i < len(args): - if args[i] == "--workdir": - if i + 1 >= len(args): - print("Error: --workdir requires an argument", file=sys.stderr) - sys.exit(1) - workdir = args[i + 1] - i += 2 - elif args[i].startswith("-"): - print(f"Unknown option: {args[i]}", file=sys.stderr) - sys.exit(1) - else: - source = args[i] - i += 1 - - if not source: - print( - "Usage: analyze-ci-search-bugs.py [--workdir DIR]\n" - " : release version (4.22), PR (pr-6396), or rebase (rebase-release-4.22)", - file=sys.stderr, - ) - sys.exit(1) - - if workdir is None: - workdir = f"/tmp/analyze-ci-claude-workdir.{datetime.now().strftime('%y%m%d')}" - - if not os.path.isdir(workdir): - print(f"Error: work directory does not exist: 
{workdir}", file=sys.stderr) - sys.exit(1) - - files, source_label = find_job_files(workdir, source) - if not files: - print(f"No job files found for {source_label} in {workdir}", file=sys.stderr) - sys.exit(1) - - print(f"Found {len(files)} job files for {source_label}", file=sys.stderr) - - # Parse all files - jobs = [] - skipped = 0 - for filepath in files: - summary = parse_structured_summary(filepath) - if summary is None: - print(f" WARNING: no STRUCTURED SUMMARY in {os.path.basename(filepath)}", file=sys.stderr) - skipped += 1 - continue - jobs.append(summary) - - if not jobs: - print("No valid job reports found", file=sys.stderr) - sys.exit(1) - - print(f"Parsed {len(jobs)} jobs ({skipped} skipped)", file=sys.stderr) - - # Group and build candidates - groups = group_by_signature(jobs) - candidates = build_candidates(groups) - - print(f"Deduplicated to {len(candidates)} bug candidates", file=sys.stderr) - - # Build output - result = { - "source": source, - "source_label": source_label, - "date": datetime.now(timezone.utc).strftime("%Y-%m-%d"), - "job_files_found": len(files), - "job_files_parsed": len(jobs), - "job_files_skipped": skipped, - "candidates": candidates, - } - - output_path = os.path.join(workdir, f"analyze-ci-bug-candidates-{source}.json") - with open(output_path, "w") as f: - json.dump(result, f, indent=2) - - print(f"Written: {output_path}", file=sys.stderr) - print(json.dumps(result, indent=2)) - - -if __name__ == "__main__": - main() diff --git a/.claude/scripts/extract_microshift_version.py b/.claude/scripts/extract_microshift_version.py deleted file mode 100644 index f78a361b13..0000000000 --- a/.claude/scripts/extract_microshift_version.py +++ /dev/null @@ -1,302 +0,0 @@ -#!/usr/bin/env python3 -""" -Extract MicroShift version from Prow CI journal logs. - -This script fetches the journal log from a Prow CI job and extracts the -exact MicroShift version being tested from the systemd journal output. 
- -Usage: - python3 extract_microshift_version.py - -Arguments: - prow_url: The Prow CI job URL - e.g.: "https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_microshift/5703/pull-ci-openshift-microshift-main-e2e-aws-tests-bootc-release/1990417342856695808" - scenario: Scenario name - e.g.: "el96-lrel@backups" - -Output: - JSON object with: - { - "version": "4.20.0-202511160032.p0.gb46fe41.assembly.stream.el9", - "build_type": "nightly", - "success": true, - "error": null - } -""" - -import sys -import json -import re -import urllib.request -import urllib.error -import urllib.parse -import ssl -from html.parser import HTMLParser - - -class GCSWebLinkParser(HTMLParser): - """Parse GCSWeb HTML to extract file links.""" - - def __init__(self): - super().__init__() - self.links = [] - - def handle_starttag(self, tag, attrs): - if tag == 'a': - for attr, value in attrs: - if attr == 'href' and value and not value.startswith('?'): - self.links.append(value) - - -def construct_journal_log_dir_url(job_id, version, job_type, pr_number=None, scenario="el96-lrel@ipv6"): - """Construct the URL to the journal log directory for a given job.""" - base_url = "https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results" - - if pr_number: - # PR job URL pattern - job_name = f"pull-ci-openshift-microshift-release-{version}-{job_type}" - path = f"pr-logs/pull/openshift_microshift/{pr_number}/{job_name}/{job_id}" - else: - # Periodic job URL pattern - job_name = f"periodic-ci-openshift-microshift-release-{version}-periodics-{job_type}" - path = f"logs/{job_name}/{job_id}" - - artifact_path = f"artifacts/{job_type}/openshift-microshift-e2e-metal-tests/artifacts/scenario-info/{scenario}/vms/host1/sos" - - return f"{base_url}/{path}/{artifact_path}/" - - -def fetch_url(url): - """Fetch content from the given URL.""" - try: - # Validate URL scheme - if not url.startswith('https://'): - return None, "Invalid URL scheme. 
Only HTTPS URLs are supported." - - # Create SSL context that doesn't verify certificates - ssl_context = ssl.create_default_context() - ssl_context.check_hostname = False - ssl_context.verify_mode = ssl.CERT_NONE - - with urllib.request.urlopen(url, timeout=30, context=ssl_context) as response: - content = response.read().decode('utf-8') - return content, None - except urllib.error.URLError as e: - return None, f"Failed to fetch URL: {e}" - except (UnicodeDecodeError, OSError) as e: - return None, f"Error reading URL: {e}" - - -def find_journal_log_file(dir_url): - """Find the journal log file in the directory listing.""" - html_content, error = fetch_url(dir_url) - if error: - return None, error - - # Parse HTML to find links - parser = GCSWebLinkParser() - parser.feed(html_content) - - # Find journal_*.log files - journal_files = [] - for link in parser.links: - # Extract just the filename from the full path - filename = link.split('/')[-1] - # URL-decode the filename - decoded_filename = urllib.parse.unquote(filename) - if decoded_filename.startswith('journal_') and decoded_filename.endswith('.log'): - # Keep the URL-encoded filename for URL construction - journal_files.append(filename) - - if not journal_files: - return None, "No journal log files found in directory" - - # Return the first journal log file (there should typically be only one) - return journal_files[0], None - - -def extract_version_from_journal(log_content): - """ - Extract MicroShift version from journal log content. 
- - Looks for pattern: "Version" microshift="4.20.0-202511160032.p0.gb46fe41.assembly.stream.el9" - """ - # Pattern to match: "Version" microshift="" - pattern = r'"Version"\s+microshift="([^"]+)"' - - matches = re.findall(pattern, log_content) - if matches: - version_string = matches[-1].strip() - return version_string, None - - return None, "Could not find MicroShift version in journal log" - - -def determine_build_type(version_string): - """ - Determine the build type from the version string. - - Returns one of: "nightly", "ec", "rc", "zstream" - """ - if "nightly" in version_string.lower(): - return "nightly" - elif "-ec." in version_string: - return "ec" - elif "-rc." in version_string: - return "rc" - else: - # Check if it's a date-based build (likely nightly/stream) - if re.search(r'\d{12}\.p\d+\.g[a-f0-9]+', version_string): - return "nightly" - return "zstream" - - -def parse_prow_url(prow_url): - """ - Parse a Prow CI URL to extract job information. - - Supported URL formats: - - PR jobs: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_microshift/{pr_number}/{job_name}/{job_id} - - Periodic jobs: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/{job_name}/{job_id} - - Returns: - Tuple of (job_id, version, job_type, pr_number) or (None, None, None, None) on error - """ - # Remove the prow.ci.openshift.org prefix and normalize - url_parts = prow_url.replace("https://prow.ci.openshift.org/view/gs/test-platform-results/", "") - - # Split by '/' - parts = url_parts.split('/') - - if len(parts) < 2: - return None, None, None, None, "Invalid URL format" - - # Check if it's a PR job or periodic job - if parts[0] == "pr-logs" and len(parts) >= 6: - # PR job format: pr-logs/pull/openshift_microshift/{pr_number}/{job_name}/{job_id} - pr_number = parts[3] - job_name = parts[4] - job_id = parts[5] - - # Extract version and job_type from job_name - # Format: pull-ci-openshift-microshift-release-{version}-{job_type} - # 
or: pull-ci-openshift-microshift-release-{version}-{job_type}-{arch} - if job_name.startswith("pull-ci-openshift-microshift-release-"): - name_parts = job_name.replace("pull-ci-openshift-microshift-release-", "").split("-", 1) - if len(name_parts) >= 2: - version = name_parts[0] - job_type = name_parts[1] - return job_id, version, job_type, pr_number, None - - return None, None, None, None, f"Could not parse PR job name: {job_name}" - - elif parts[0] == "logs" and len(parts) >= 3: - # Periodic job format: logs/{job_name}/{job_id} - job_name = parts[1] - job_id = parts[2] - - # Extract version and job_type from job_name - # Format: periodic-ci-openshift-microshift-release-{version}-periodics-{job_type} - if job_name.startswith("periodic-ci-openshift-microshift-release-"): - name_parts = job_name.replace("periodic-ci-openshift-microshift-release-", "") - # Split on "-periodics-" to separate version from job_type - if "-periodics-" in name_parts: - version, job_type = name_parts.split("-periodics-", 1) - return job_id, version, job_type, None, None - - return None, None, None, None, f"Could not parse periodic job name: {job_name}" - - return None, None, None, None, "Unsupported URL format" - - -def main(): - """Main entry point.""" - if len(sys.argv) != 3: - print(json.dumps({ - "success": False, - "error": "Usage: extract_microshift_version.py " - })) - sys.exit(1) - - prow_url = sys.argv[1] - scenario = sys.argv[2] - - # Validate inputs - if not prow_url or not prow_url.strip(): - print(json.dumps({ - "success": False, - "error": "prow_url cannot be empty" - })) - sys.exit(1) - - if not scenario or not scenario.strip(): - print(json.dumps({ - "success": False, - "error": "scenario cannot be empty" - })) - sys.exit(1) - - # Parse the Prow URL - job_id, version, job_type, pr_number, parse_error = parse_prow_url(prow_url) - if parse_error: - print(json.dumps({ - "success": False, - "error": parse_error, - "prow_url": prow_url - })) - sys.exit(1) - - # Construct journal 
log directory URL - dir_url = construct_journal_log_dir_url(job_id, version, job_type, pr_number, scenario) - - # Find journal log file - journal_file, error = find_journal_log_file(dir_url) - if error: - print(json.dumps({ - "success": False, - "error": error, - "url": dir_url - })) - sys.exit(1) - - # Construct full journal log URL - log_url = dir_url + journal_file - - # Fetch journal log - log_content, error = fetch_url(log_url) - if error: - print(json.dumps({ - "success": False, - "error": error, - "url": log_url - })) - sys.exit(1) - - # Extract version - microshift_version, error = extract_version_from_journal(log_content) - if error: - print(json.dumps({ - "success": False, - "error": error, - "url": log_url - })) - sys.exit(1) - - # Determine build type - build_type = determine_build_type(microshift_version) - - # Output result - result = { - "success": True, - "version": microshift_version, - "build_type": build_type, - "url": log_url, - "error": None - } - - print(json.dumps(result, indent=2)) - sys.exit(0) - - -if __name__ == "__main__": - main() diff --git a/.claude/scripts/microshift-prow-jobs-for-pull-requests.sh b/.claude/scripts/microshift-prow-jobs-for-pull-requests.sh deleted file mode 100755 index 3459cf9cd0..0000000000 --- a/.claude/scripts/microshift-prow-jobs-for-pull-requests.sh +++ /dev/null @@ -1,341 +0,0 @@ -#!/bin/bash -set -euo pipefail - -# Prow Jobs for Pull Requests -# Data modes (summary, detail): JSON on stdout -# Action modes (approve, restart): text on stdout -# Progress/errors: stderr - -GCS_API="https://storage.googleapis.com/storage/v1/b/test-platform-results/o" -GCS_BASE="https://storage.googleapis.com/test-platform-results" -PROW_VIEW="https://prow.ci.openshift.org/view/gs/test-platform-results" -GH_REPO="openshift/microshift" -GCS_PR_PREFIX="pr-logs/pull/openshift_microshift" -SIGNATURE=$'\n'"*Added by $(basename "${0}")* :robot:"$'\n' - -# Get open PRs as JSON array -fetch_open_prs() { - local filter="${1:-}" - local 
author="${2:-}" - local -a gh_args=(--repo "${GH_REPO}" --state open --limit 100 --json "number,title,url") - - [[ -n "${author}" ]] && gh_args+=(--author "${author}") - - local pr_data - pr_data=$(gh pr list "${gh_args[@]}") - - if [[ -n "${filter}" ]]; then - echo "${pr_data}" | jq -c --arg f "${filter}" '[.[] | select(.title | contains($f))]' - else - echo "${pr_data}" - fi -} - -# List job names for a PR from GCS -list_pr_jobs() { - local pr="${1}" - curl -s --max-time 30 "${GCS_API}?prefix=${GCS_PR_PREFIX}/${pr}/&delimiter=/" | \ - jq -r '.prefixes[]? // empty' | \ - sed "s|${GCS_PR_PREFIX}/${pr}/||; s|/$||" -} - -# Get latest build result as a JSON object -# Returns PENDING status for jobs still running (no finished.json). -get_latest_build() { - local pr="${1}" job="${2}" - local build_id finished_json - - build_id=$(curl -s --max-time 10 "${GCS_BASE}/${GCS_PR_PREFIX}/${pr}/${job}/latest-build.txt" 2>/dev/null) || return 1 - [[ -z "${build_id}" || "${build_id}" == *"<"* ]] && return 1 - - local url="${PROW_VIEW}/pr-logs/pull/openshift_microshift/${pr}/${job}/${build_id}" - - finished_json=$(curl -s --max-time 10 "${GCS_BASE}/${GCS_PR_PREFIX}/${pr}/${job}/${build_id}/finished.json" 2>/dev/null) - - if [[ -z "${finished_json}" || "${finished_json}" == *"NoSuchKey"* || "${finished_json}" == *"<"* ]]; then - # Job still running — no finished.json yet - jq -nc --arg job "${job}" --arg url "${url}" --arg build_id "${build_id}" \ - '{job: $job, status: "PENDING", url: $url, build_id: $build_id, finished: null}' - return 0 - fi - - echo "${finished_json}" | jq -c \ - --arg job "${job}" --arg url "${url}" --arg build_id "${build_id}" \ - '{ - job: $job, - status: (.result // "PENDING"), - url: $url, - build_id: $build_id, - finished: (if (.timestamp // 0) > 0 then .timestamp | todate else null end) - }' -} - -# Fetch job results for a single PR into temp dir (parallelized) -# Skips individual jobs that fail to fetch (e.g. no latest-build.txt). 
-fetch_pr_results() { - local pr="${1}" - local tmpdir="${2}" - local jobs - - jobs=$(list_pr_jobs "${pr}") || return 1 - [[ -z "${jobs}" ]] && return 0 - - while IFS= read -r job; do - ( - result=$(get_latest_build "${pr}" "${job}" 2>/dev/null) || exit 0 - if [[ -n "${result}" ]]; then - echo "${result}" > "${tmpdir}/${job}.json" - fi - ) & - done <<< "${jobs}" - wait - return 0 -} - -# Collect per-job JSON files into a single JSON array -collect_jobs_json() { - local tmpdir="${1}" - local files=("${tmpdir}"/*.json) - if [[ ! -f "${files[0]}" ]]; then - echo "[]" - return - fi - cat "${files[@]}" | jq -s '.' -} - -# Summary mode: JSON array of PRs with pass/fail counts -mode_summary() { - local filter="${1:-}" author="${2:-}" - local pr_data output_tmp - - echo "Fetching open PRs..." >&2 - pr_data=$(fetch_open_prs "${filter}" "${author}") - [[ "$(echo "${pr_data}" | jq 'length')" -eq 0 ]] && { echo "[]"; return; } - - echo "Fetching job results..." >&2 - output_tmp=$(mktemp) - - while IFS=$'\t' read -r pr_number pr_title pr_url; do - local tmpdir - tmpdir=$(mktemp -d) - - if ! fetch_pr_results "${pr_number}" "${tmpdir}"; then - echo "PR #${pr_number}: incomplete job results, skipping" >&2 - rm -rf "${tmpdir}" - continue - fi - - local passed=0 failed=0 other=0 total=0 - for f in "${tmpdir}"/*.json; do - [[ -f "${f}" ]] || continue - local status - status=$(jq -r '.status' "${f}") - total=$((total + 1)) - case "${status}" in - SUCCESS) passed=$((passed + 1)) ;; - FAILURE) failed=$((failed + 1)) ;; - *) other=$((other + 1)) ;; - esac - done - rm -rf "${tmpdir}" - - jq -nc --argjson n "${pr_number}" --arg t "${pr_title}" --arg u "${pr_url}" \ - --argjson p "${passed}" --argjson f "${failed}" \ - --argjson o "${other}" --argjson to "${total}" \ - '{pr_number: $n, title: $t, url: $u, passed: $p, failed: $f, other: $o, total: $to}' \ - >> "${output_tmp}" - done < <(echo "${pr_data}" | jq -r '.[] | [.number, .title, .url] | @tsv') - - jq -s '.' 
"${output_tmp}" - rm -f "${output_tmp}" -} - -# Detail mode: JSON array of PRs with full job lists -mode_detail() { - local filter="${1:-}" author="${2:-}" - local pr_data output_tmp - - echo "Fetching open PRs..." >&2 - pr_data=$(fetch_open_prs "${filter}" "${author}") - [[ "$(echo "${pr_data}" | jq 'length')" -eq 0 ]] && { echo "[]"; return; } - - echo "Fetching job results..." >&2 - output_tmp=$(mktemp) - - while IFS=$'\t' read -r pr_number pr_title pr_url; do - local tmpdir - tmpdir=$(mktemp -d) - - if ! fetch_pr_results "${pr_number}" "${tmpdir}"; then - echo "PR #${pr_number}: incomplete job results, skipping" >&2 - rm -rf "${tmpdir}" - continue - fi - - local jobs_json - jobs_json=$(collect_jobs_json "${tmpdir}") - rm -rf "${tmpdir}" - - jq -nc --argjson n "${pr_number}" --arg t "${pr_title}" --arg u "${pr_url}" \ - --argjson jobs "${jobs_json}" \ - '{pr_number: $n, title: $t, url: $u, jobs: $jobs}' >> "${output_tmp}" - done < <(echo "${pr_data}" | jq -r '.[] | [.number, .title, .url] | @tsv') - - jq -s '.' "${output_tmp}" - rm -f "${output_tmp}" -} - -# Approve mode: add /lgtm to PRs where all jobs pass -mode_approve() { - local filter="${1:-}" author="${2:-}" - local pr_data - - echo "Fetching open PRs..." >&2 - pr_data=$(fetch_open_prs "${filter}" "${author}") - [[ "$(echo "${pr_data}" | jq 'length')" -eq 0 ]] && { echo "No open pull requests found."; return; } - - echo "Fetching job results..." >&2 - - while IFS=$'\t' read -r pr_number pr_title pr_url; do - local tmpdir - tmpdir=$(mktemp -d) - - if ! 
fetch_pr_results "${pr_number}" "${tmpdir}"; then - echo "PR #${pr_number}: incomplete job results, skipping" - rm -rf "${tmpdir}" - continue - fi - - local total=0 success=0 - for f in "${tmpdir}"/*.json; do - [[ -f "${f}" ]] || continue - local status - status=$(jq -r '.status' "${f}") - total=$((total + 1)) - [[ "${status}" == "SUCCESS" ]] && success=$((success + 1)) - done - rm -rf "${tmpdir}" - - if [[ "${total}" -eq 0 ]]; then - echo "PR #${pr_number}: No jobs found, skipping" - continue - fi - - if [[ "${success}" -eq "${total}" ]]; then - local comment=$'/lgtm\n/verified by ci\n' - comment+="${SIGNATURE}" - - echo "PR #${pr_number}: All ${total} jobs passed, approving..." - gh pr comment "${pr_number}" --repo "${GH_REPO}" --body "${comment}" - echo "PR #${pr_number}: Approved" - else - echo "PR #${pr_number}: ${success}/${total} jobs passed, skipping" - fi - done < <(echo "${pr_data}" | jq -r '.[] | [.number, .title, .url] | @tsv') -} - -# Restart mode: comment /test for each failed job -mode_restart() { - local filter="${1:-}" author="${2:-}" - local pr_data - - echo "Fetching open PRs..." >&2 - pr_data=$(fetch_open_prs "${filter}" "${author}") - [[ "$(echo "${pr_data}" | jq 'length')" -eq 0 ]] && { echo "No open pull requests found."; return; } - - echo "Fetching job results..." >&2 - - while IFS=$'\t' read -r pr_number pr_title pr_url; do - local tmpdir - tmpdir=$(mktemp -d) - - if ! 
fetch_pr_results "${pr_number}" "${tmpdir}"; then - echo "PR #${pr_number}: incomplete job results, skipping" - rm -rf "${tmpdir}" - continue - fi - - local failed_jobs=() - for f in "${tmpdir}"/*.json; do - [[ -f "${f}" ]] || continue - local job status - job=$(jq -r '.job' "${f}") - status=$(jq -r '.status' "${f}") - [[ "${status}" == "FAILURE" ]] && failed_jobs+=("${job}") - done - - if [[ ${#failed_jobs[@]} -eq 0 ]]; then - rm -rf "${tmpdir}" - echo "PR #${pr_number}: No failed jobs, skipping" - continue - fi - - rm -rf "${tmpdir}" - - # Fetch short /test names from prowjob.json for each failed job - local comment="" - for job in "${failed_jobs[@]}"; do - local build_id short_name - build_id=$(curl -s --max-time 10 "${GCS_BASE}/${GCS_PR_PREFIX}/${pr_number}/${job}/latest-build.txt" 2>/dev/null) || continue - short_name=$(curl -s --max-time 10 "${GCS_BASE}/${GCS_PR_PREFIX}/${pr_number}/${job}/${build_id}/prowjob.json" 2>/dev/null | \ - jq -r '.spec.rerun_command // empty' 2>/dev/null | sed 's|^/test ||') || short_name="" - short_name=$(echo "${short_name}" | xargs) - [[ -z "${short_name}" ]] && continue - comment+="/test ${short_name}"$'\n' - done - - if [[ -z "${comment}" ]]; then - echo "PR #${pr_number}: Could not resolve rerun commands for failed job(s), skipping" - continue - fi - comment+="${SIGNATURE}" - - echo "PR #${pr_number}: Restarting ${#failed_jobs[@]} failed job(s): ${failed_jobs[*]}" - gh pr comment "${pr_number}" --repo "${GH_REPO}" --body "${comment}" - echo "PR #${pr_number}: Restart comment posted" - done < <(echo "${pr_data}" | jq -r '.[] | [.number, .title, .url] | @tsv') -} - -usage() { - echo "Usage: ${0} [--mode MODE] [--filter STRING] [--author USER]" >&2 - echo " --mode MODE: Operation mode (default: summary)" >&2 - echo " summary: JSON array of PRs with pass/fail counts" >&2 - echo " detail: JSON array of PRs with full job lists" >&2 - echo " approve: Approve PRs where ALL test jobs passed" >&2 - echo " restart: Restart failed test 
jobs by commenting /test" >&2 - echo " --filter STRING: Only include PRs whose title contains STRING" >&2 - echo " --author USER: Only include PRs authored by USER" >&2 - exit 1 -} - -main() { - local mode="summary" - local filter="" - local author="" - - while [[ ${#} -gt 0 ]]; do - case "${1}" in - --mode) - [[ ${#} -lt 2 ]] && { echo "Error: mode requires an argument" >&2; usage; } - mode="${2}"; shift 2 ;; - --filter) - [[ ${#} -lt 2 ]] && { echo "Error: filter requires an argument" >&2; usage; } - filter="${2}"; shift 2 ;; - --author) - [[ ${#} -lt 2 ]] && { echo "Error: author requires an argument" >&2; usage; } - author="${2}"; shift 2 ;; - -*) echo "Unknown option: ${1}" >&2; usage ;; - *) echo "Unknown argument: ${1}" >&2; usage ;; - esac - done - - case "${mode}" in - summary) mode_summary "${filter}" "${author}" ;; - detail) mode_detail "${filter}" "${author}" ;; - approve) mode_approve "${filter}" "${author}" ;; - restart) mode_restart "${filter}" "${author}" ;; - *) echo "Error: Unknown mode '${mode}'" >&2; usage ;; - esac -} - -main "${@}" diff --git a/.claude/scripts/microshift-prow-jobs-for-release.sh b/.claude/scripts/microshift-prow-jobs-for-release.sh deleted file mode 100755 index f3cb91ea7c..0000000000 --- a/.claude/scripts/microshift-prow-jobs-for-release.sh +++ /dev/null @@ -1,62 +0,0 @@ -#!/bin/bash -set -euo pipefail - -# Prow Jobs Analyzer for MicroShift -# Output: JSON array of job objects on stdout -# Progress/errors: stderr - -PROW_URL="https://prow.ci.openshift.org/data.js" - -# Fetch all MicroShift jobs for a release, return latest run per job as JSON -fetch_latest_per_job() { - local release="${1}" - curl -s --max-time 60 "${PROW_URL}" | jq --arg release "${release}" ' - [.[] | select((.job | contains("microshift")) and (.job | contains($release)) - and .finished and .finished != "")] | - group_by(.job) | - map(sort_by(.started | tonumber) | reverse | first) | - [.[] | { - job: .job, - type: .type, - status: .state, - finished: 
.finished, - duration: .duration, - url: .url, - build_id: .build_id - }] - ' -} - -usage() { - echo "Usage: ${0} [--mode MODE] " >&2 - echo " --mode MODE: Operation mode (default: failed)" >&2 - echo " status: Latest run status for each job" >&2 - echo " failed: Only jobs with failure status" >&2 - echo " release: OpenShift release version (e.g., 4.22, main)" >&2 - exit 1 -} - -main() { - local mode="failed" - local release="" - - while [[ ${#} -gt 0 ]]; do - case "${1}" in - --mode) - [[ ${#} -lt 2 ]] && { echo "Error: mode requires an argument" >&2; usage; } - mode="${2}"; shift 2 ;; - -*) echo "Unknown option: ${1}" >&2; usage ;; - *) release="${1}"; shift ;; - esac - done - - [[ -z "${release}" ]] && { echo "Error: release argument is required" >&2; usage; } - - case "${mode}" in - status) fetch_latest_per_job "${release}" ;; - failed) fetch_latest_per_job "${release}" | jq '[.[] | select(.status == "failure")]' ;; - *) echo "Error: Unknown mode '${mode}'" >&2; usage ;; - esac -} - -main "${@}" diff --git a/.claude/settings.json b/.claude/settings.json index 992a563107..30cc311f65 100644 --- a/.claude/settings.json +++ b/.claude/settings.json @@ -5,19 +5,7 @@ "Write(//tmp/**)", "Bash(bash .claude/scripts/*)", "Bash(python3 .claude/scripts/*)", - "Bash(curl:*)", - "Bash(date:*)", - "Bash(cat:*)", - "Bash(echo:*)", - "Bash(wc:*)", - "Bash(ls:*)", - "Bash(jq:*)", - "Bash(gh pr list:*)", - "Bash(gh auth status:*)", - "WebFetch(domain:prow.ci.openshift.org)", - "Skill(analyze-ci:create-bugs)", - "Skill(analyze-ci:prow-job)", - "Skill(analyze-ci:doctor)" + "WebFetch(domain:prow.ci.openshift.org)" ], "deny": [], "ask": []