Skip to content

USHIFT-6933: MicroShift CI Doctor: Add fix-test-bugs command in dry-run mode#79951

Merged
openshift-merge-bot[bot] merged 4 commits into
openshift:mainfrom
ggiguash:ci-doctor-fix-bugs-dry-run
Jun 2, 2026
Merged

USHIFT-6933: MicroShift CI Doctor: Add fix-test-bugs command in dry-run mode#79951
openshift-merge-bot[bot] merged 4 commits into
openshift:mainfrom
ggiguash:ci-doctor-fix-bugs-dry-run

Conversation

@ggiguash

@ggiguash ggiguash commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Summary by CodeRabbit

This PR updates OpenShift CI configuration and the MicroShift CI Doctor step to add a dry-run mode for the "fix-test-bugs" command, improve logging/notifications, and increase the job timeout for the doctor step. The changes affect the edge-tooling CI jobs and the microshift-ci-doctor step implementation in the openshift/release repository.

What changed in practical terms

  • CI job notifications: The microshift-ci-doctor job's Slack report template (ci-operator config) was converted to a multi-line pipe-style template, appending an explicit newline after the "ended with …" message and placing artifact links (including a dedicated "View logs" line) on separate lines to improve readability in the team notification channel.
  • MicroShift CI Doctor script behavior:
    • Adds a dry-run "fix test bugs" step that runs Claude in non-destructive mode and tees its output to a new CLAUDE_FIX_TEST_BUGS_LOG so results can be inspected without mutating PRs or CI state.
    • Integrates CLAUDE_FIX_TEST_BUGS_LOG into the atexit_handler validation so the script verifies a Claude "result" event and a success status for that run.
    • Introduces check_claude_rc(rc, session, timeout_min) to treat Claude timeout (exit code 124) and other non-zero exits as hard failures and to emit clearer timeout/failure messages.
    • Broadens Claude permission configuration by replacing an explicit microshift-ci skills list with a wildcard Skill(microshift-ci:*).
    • Reorders workflow steps: duplicate rebase PRs are closed prior to analysis; the earlier automatic create-bugs invocation was modified (removed --auto); a later duplicate-closing block was removed so the script proceeds to restart failed rebase PR tests after the dry-run step.
    • Improves logging/messages around Claude timeouts and failures.

Job runtime/configuration change

  • The doctor job/container timeout was increased from 1h0m0s to 1h30m0s (ci-operator step-ref YAML) to accommodate the added steps and longer runs.

Files changed (high-level)

  • ci-operator/config/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main.yaml
    • Converted report_template to multi-line pipe format and split artifact links across lines.
  • ci-operator/step-registry/openshift/edge-tooling/microshift-ci/doctor/openshift-edge-tooling-microshift-ci-doctor-commands.sh
    • Added CLAUDE_FIX_TEST_BUGS_LOG, check_claude_rc(), dry-run fix-test-bugs invocation, adjusted Claude permissions and workflow ordering, and integrated exit validation changes.
  • ci-operator/step-registry/openshift/edge-tooling/microshift-ci/doctor/openshift-edge-tooling-microshift-ci-doctor-ref.yaml
    • Increased job/container timeout from 1h to 1.5h.

Why this matters

  • Maintainers can preview "fix-test-bugs" output safely using dry-run before making changes to PRs or CI state.
  • Slack reports are more readable.
  • Timeouts and Claude failures are surfaced with clearer messages, aiding troubleshooting.

Additional notes

  • Commit message mentions "Update job runtime limits", which aligns with the job timeout increase in the step-ref YAML.

@openshift-ci-robot

openshift-ci-robot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

@ggiguash: This pull request references USHIFT-6933 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 2, 2026
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 2, 2026
@openshift-ci

openshift-ci Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai

coderabbitai Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: d2c77e87-2457-4529-9f79-174400288b5c

📥 Commits

Reviewing files that changed from the base of the PR and between c24510d and b2865c9.

📒 Files selected for processing (2)
  • ci-operator/step-registry/openshift/edge-tooling/microshift-ci/doctor/openshift-edge-tooling-microshift-ci-doctor-commands.sh
  • ci-operator/step-registry/openshift/edge-tooling/microshift-ci/doctor/openshift-edge-tooling-microshift-ci-doctor-ref.yaml
✅ Files skipped from review due to trivial changes (1)
  • ci-operator/step-registry/openshift/edge-tooling/microshift-ci/doctor/openshift-edge-tooling-microshift-ci-doctor-ref.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • ci-operator/step-registry/openshift/edge-tooling/microshift-ci/doctor/openshift-edge-tooling-microshift-ci-doctor-commands.sh

Walkthrough

Reformats the microshift-ci-doctor report template, adds a new Claude "fix test bugs" dry-run (with log and exit-code checks), broadens Claude permissions, reorders script steps to close duplicate rebase PRs earlier and proceed directly to restarting failed rebase tests, and extends the job timeout.

Changes

Microshift CI Doctor

Layer / File(s) Summary
Report template formatting
ci-operator/config/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main.yaml
The microshift-ci-doctor job's report_template is reformatted from a single-line pipe to a multiline pipe block, adding an explicit newline after the status message and placing artifact links (report, bugs, PRs) and View logs on separate lines.
Claude logging, checks, and permission
ci-operator/step-registry/openshift/edge-tooling/microshift-ci/doctor/openshift-edge-tooling-microshift-ci-doctor-commands.sh
Adds CLAUDE_FIX_TEST_BUGS_LOG, extends atexit_handler to validate that log, introduces check_claude_rc to fail on timeout (124) or non-zero exits, and replaces enumerated microshift-ci:* skills with Skill(microshift-ci:*).
Fix test bugs step and workflow reorder
ci-operator/step-registry/openshift/edge-tooling/microshift-ci/doctor/openshift-edge-tooling-microshift-ci-doctor-commands.sh
Inserts an early duplicate-rebase PR closing step before analysis, removes --auto from the create-bugs call, adds a dry-run fix test bugs Claude invocation that opens fixes and tees to CLAUDE_FIX_TEST_BUGS_LOG, and removes the later duplicate-closure so the script continues to restart failed rebase tests.
Job timeout
ci-operator/step-registry/openshift/edge-tooling/microshift-ci/doctor/openshift-edge-tooling-microshift-ci-doctor-ref.yaml
Increases the step timeout from 1h0m0s to 1h30m0s.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

lgtm, rehearsals-ack

Suggested reviewers

  • bfournie
  • smg247
  • neisw

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

Check name Status Explanation Resolution
No-Sensitive-Data-In-Logs ❌ Error Script passes JIRA_API_TOKEN and JIRA_USERNAME to Claude MCP with MCP_VERBOSE=true enabled, and all logs are copied to artifacts, risking credential exposure in public logs. Disable MCP_VERBOSE or use sensitive redaction filters; exclude or redact MCP logs from artifacts, or route MCP logs to a secure location not copied to public artifacts.
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (13 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: adding a dry-run mode for the fix-test-bugs command in MicroShift CI Doctor, which is the primary feature introduced in the PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed PR modifies only CI configuration YAML and bash script files; no Ginkgo tests are present in modified files, so check is not applicable.
Test Structure And Quality ✅ Passed The custom check for Ginkgo test code quality is not applicable to this PR, which contains only YAML CI configuration files and shell scripts with no test code.
Microshift Test Compatibility ✅ Passed No new Ginkgo e2e tests are added in this PR. The changes are CI infrastructure modifications only (YAML configs and bash scripts), not test code requiring MicroShift compatibility review.
Single Node Openshift (Sno) Test Compatibility ✅ Passed This PR contains no Ginkgo e2e tests. All changes are to CI/CD infrastructure files (YAML configs, bash scripts for job automation). The SNO compatibility check does not apply.
Topology-Aware Scheduling Compatibility ✅ Passed PR contains only CI/CD configuration and tooling changes (job configs, step registry refs, shell scripts), not deployment manifests, operator code, or controllers—check is not applicable.
Ote Binary Stdout Contract ✅ Passed PR contains only YAML CI config and shell scripts; no OTE binaries or Go code with Ginkgo tests subject to the stdout contract check.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed PR does not add any Ginkgo e2e tests; all changes are CI configuration and shell scripts. Custom check for IPv6/disconnected network compatibility is not applicable.
No-Weak-Crypto ✅ Passed No weak cryptographic algorithms (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or unsafe secret/token comparisons detected in the three modified files.
Container-Privileges ✅ Passed No privileged container settings (privileged: true, hostPID, hostNetwork, hostIPC, SYS_ADMIN, allowPrivilegeEscalation, or root without justification) found in any modified files.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 2, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
ci-operator/step-registry/openshift/edge-tooling/microshift-ci/doctor/openshift-edge-tooling-microshift-ci-doctor-commands.sh (1)

183-183: 💤 Low value

Consider documenting the intent behind wildcard permissions.

The change from explicit skill allowlisting to Skill(microshift-ci:*) broadens what the Claude agent can invoke. While scoped to the microshift-ci: namespace, explicit allowlisting provides better visibility into expected capabilities and aligns with least-privilege principles.

If this is intentional to simplify future skill additions, a brief inline comment explaining the rationale would help future reviewers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/openshift/edge-tooling/microshift-ci/doctor/openshift-edge-tooling-microshift-ci-doctor-commands.sh`
at line 183, The wildcard permission "Skill(microshift-ci:*)" broadens allowed
Claude agent skills; either replace it with an explicit allowlist of intended
skills (e.g., list each "Skill(microshift-ci:someAction)" currently required) to
maintain least-privilege, or keep the wildcard but add a brief inline comment
immediately above the "Skill(microshift-ci:*)" entry explaining the intentional
scope, why wildcarding is needed for future additions, and acknowledging the
security tradeoff so future reviewers understand the rationale.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In
`@ci-operator/step-registry/openshift/edge-tooling/microshift-ci/doctor/openshift-edge-tooling-microshift-ci-doctor-commands.sh`:
- Line 183: The wildcard permission "Skill(microshift-ci:*)" broadens allowed
Claude agent skills; either replace it with an explicit allowlist of intended
skills (e.g., list each "Skill(microshift-ci:someAction)" currently required) to
maintain least-privilege, or keep the wildcard but add a brief inline comment
immediately above the "Skill(microshift-ci:*)" entry explaining the intentional
scope, why wildcarding is needed for future additions, and acknowledging the
security tradeoff so future reviewers understand the rationale.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: d2a0cd94-9583-4216-ae3a-794dec11c750

📥 Commits

Reviewing files that changed from the base of the PR and between 98d1e72 and 5ba4902.

⛔ Files ignored due to path filters (1)
  • ci-operator/jobs/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main-periodics.yaml is excluded by !ci-operator/jobs/**
📒 Files selected for processing (2)
  • ci-operator/config/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main.yaml
  • ci-operator/step-registry/openshift/edge-tooling/microshift-ci/doctor/openshift-edge-tooling-microshift-ci-doctor-commands.sh

@ggiguash

ggiguash commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

/pj-rehearse periodic-ci-openshift-eng-edge-tooling-main-microshift-ci-doctor

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

@ggiguash: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@ggiguash

ggiguash commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

/pj-rehearse periodic-ci-openshift-eng-edge-tooling-main-microshift-ci-doctor

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

@ggiguash: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@ggiguash

ggiguash commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

/pj-rehearse periodic-ci-openshift-eng-edge-tooling-main-microshift-ci-doctor

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

@ggiguash: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@ggiguash ggiguash marked this pull request as ready for review June 2, 2026 07:31
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 2, 2026
@openshift-ci openshift-ci Bot requested review from copejon and jeff-roche June 2, 2026 07:32
@ggiguash

ggiguash commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

/pj-rehearse periodic-ci-openshift-eng-edge-tooling-main-microshift-ci-doctor

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

@ggiguash: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

[REHEARSALNOTIFIER]
@ggiguash: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

Test name Repo Type Reason
periodic-ci-openshift-eng-edge-tooling-main-microshift-ci-doctor N/A periodic Ci-operator config changed
Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@pmtk

pmtk commented Jun 2, 2026

Copy link
Copy Markdown
Member

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 2, 2026
@openshift-ci

openshift-ci Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ggiguash, pmtk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ggiguash

ggiguash commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

/pj-rehearse ack

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

@ggiguash: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-merge-bot openshift-merge-bot Bot added the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label Jun 2, 2026
@openshift-ci

openshift-ci Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

@ggiguash: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot Bot merged commit 9e6aa4e into openshift:main Jun 2, 2026
19 checks passed
@ggiguash ggiguash deleted the ci-doctor-fix-bugs-dry-run branch June 2, 2026 09:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. rehearsals-ack Signifies that rehearsal jobs have been acknowledged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants