Skip to content

ci-config: integrate tls-scanner into fips-check-node-scan in parallel#79625

Closed
redhat-chai-bot wants to merge 2 commits into
openshift:mainfrom
redhat-chai-bot:chai-bot/fips-node-scan-tls-scanner-parallel
Closed

ci-config: integrate tls-scanner into fips-check-node-scan in parallel#79625
redhat-chai-bot wants to merge 2 commits into
openshift:mainfrom
redhat-chai-bot:chai-bot/fips-node-scan-tls-scanner-parallel

Conversation

@redhat-chai-bot
Copy link
Copy Markdown
Contributor

Summary

Integrate the TLS scanner directly into the fips-check-node-scan step so it runs in parallel with the existing FIPS node scan on the same cluster.

Supersedes #79624 (closed).

Changes

fips-check-node-scan-commands.sh

  • Adds start_tls_scanner() — launches a privileged TLS scanner pod (namespace tls-scanner, with cluster-admin and privileged SCC) before the existing node scan begins.
  • Adds collect_tls_scanner_results() — waits for the scanner to finish, copies artifacts (results.json, results.csv, junit_tls_scan.xml, scan.log) into ${ARTIFACT_DIR}/tls-scanner/, and cleans up the namespace.
  • The existing FIPS node scan logic is unchanged — it runs between the start and collect calls.
  • If PULL_SPEC_TLS_SCANNER_TOOL is not set, the TLS scan is skipped gracefully.
  • The TLS scanner exit code is intentionally ignored (informational only), matching the existing tls-scanner-run step behavior.

fips-check-node-scan-ref.yaml

  • Adds tls-scanner-tool dependency (PULL_SPEC_TLS_SCANNER_TOOL env var).
  • Increases timeout from default to 4h30m to accommodate the parallel TLS scan.
  • Updates documentation.

openshift-release-main__nightly-4.22.yaml

  • Adds tls-scanner-tool base image (tls-scanner/tls-scanner:tls-scanner-tool).

Context

Requested in Slack thread.

/cc @openshift/tls-scanner-maintainers

Modify the fips-check-node-scan step to launch the TLS scanner pod
early and run it in parallel with the existing FIPS node scan. The
TLS scanner deploys a privileged pod that enumerates all cluster pod
TLS ports and validates cipher suites and protocol versions.

Changes:
- fips-check-node-scan-commands.sh: add start_tls_scanner() and
  collect_tls_scanner_results() functions that bracket the existing
  node scan logic. The scanner pod is launched before the node scan
  and its artifacts are collected after the node scan completes.
- fips-check-node-scan-ref.yaml: add tls-scanner-tool dependency
  (PULL_SPEC_TLS_SCANNER_TOOL), increase timeout to 4h30m, update
  documentation.
- openshift-release-main__nightly-4.22.yaml: add tls-scanner-tool
  base image from tls-scanner/tls-scanner.

The TLS scanner exit code is intentionally ignored (informational
only), matching the existing tls-scanner-run step behavior. If
PULL_SPEC_TLS_SCANNER_TOOL is not set, the TLS scan is skipped
gracefully.
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 21, 2026

@redhat-chai-bot: GitHub didn't allow me to request PR reviews from the following users: openshift/tls-scanner-maintainers.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.

Details

In response to this:

Summary

Integrate the TLS scanner directly into the fips-check-node-scan step so it runs in parallel with the existing FIPS node scan on the same cluster.

Supersedes #79624 (closed).

Changes

fips-check-node-scan-commands.sh

  • Adds start_tls_scanner() — launches a privileged TLS scanner pod (namespace tls-scanner, with cluster-admin and privileged SCC) before the existing node scan begins.
  • Adds collect_tls_scanner_results() — waits for the scanner to finish, copies artifacts (results.json, results.csv, junit_tls_scan.xml, scan.log) into ${ARTIFACT_DIR}/tls-scanner/, and cleans up the namespace.
  • The existing FIPS node scan logic is unchanged — it runs between the start and collect calls.
  • If PULL_SPEC_TLS_SCANNER_TOOL is not set, the TLS scan is skipped gracefully.
  • The TLS scanner exit code is intentionally ignored (informational only), matching the existing tls-scanner-run step behavior.

fips-check-node-scan-ref.yaml

  • Adds tls-scanner-tool dependency (PULL_SPEC_TLS_SCANNER_TOOL env var).
  • Increases timeout from default to 4h30m to accommodate the parallel TLS scan.
  • Updates documentation.

openshift-release-main__nightly-4.22.yaml

  • Adds tls-scanner-tool base image (tls-scanner/tls-scanner:tls-scanner-tool).

Context

Requested in Slack thread.

/cc @openshift/tls-scanner-maintainers

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 21, 2026

Warning

Rate limit exceeded

@redhat-chai-bot has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 14 minutes and 15 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: e62b778d-0a7a-4888-8026-8a234c8386a7

📥 Commits

Reviewing files that changed from the base of the PR and between 3a84881 and 14e2b1e.

📒 Files selected for processing (1)
  • ci-operator/step-registry/fips-check/node-scan/fips-check-node-scan-commands.sh

Walkthrough

This PR adds parallel TLS scanner execution to the FIPS node scan workflow. A new base image entry registers the scanner container, the step definition declares the dependency and extends timeout to 4.5 hours, and the script implements scanner pod provisioning and asynchronous artifact collection across all exit paths.

Changes

TLS Scanner Parallel Execution

Layer / File(s) Summary
Step definition and base image configuration
ci-operator/step-registry/fips-check/node-scan/fips-check-node-scan-ref.yaml, ci-operator/config/openshift/release/openshift-release-main__nightly-4.22.yaml
Step metadata declares the tls-scanner-tool dependency, sets an explicit 4h30m0s timeout, and expands documentation to describe parallel TLS scanner behavior. Base image entry registers the tls-scanner-tool container from the tls-scanner namespace.
TLS scanner initialization and pod deployment
ci-operator/step-registry/fips-check/node-scan/fips-check-node-scan-commands.sh
The start_tls_scanner function creates a dedicated scanner namespace, binds cluster-admin and privileged SCC permissions, and deploys a privileged host-network/hostPID pod that runs the TLS scanner and drops a completion marker. Main flow calls this function early before architecture filtering.
TLS scanner result collection and main flow integration
ci-operator/step-registry/fips-check/node-scan/fips-check-node-scan-commands.sh
The collect_tls_scanner_results function polls for scanner pod completion (up to 4 hours), copies result artifacts and optional JUnit output to the artifact directory, lists collected artifacts, and deletes the scanner namespace. The main script calls this function at multiple exit points (unsupported architecture, disconnected environment, and after node scan execution) to ensure TLS artifacts are always collected before terminating.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

lgtm, rehearsals-ack

Suggested reviewers

  • hongkailiu
  • fao89
  • wking
🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (11 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main change: integrating the TLS scanner into the fips-check-node-scan step to run in parallel, which is the primary objective of this pull request.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed The PR contains no Ginkgo tests. All modified files are shell scripts and YAML CI configuration files, not test code. The check for stable test names is not applicable.
Test Structure And Quality ✅ Passed PR contains no Ginkgo test code; custom check for "Test Structure and Quality" is not applicable to shell scripts, YAML configs, and non-test Go files.
Microshift Test Compatibility ✅ Passed This PR does not add any new Ginkgo e2e tests. It modifies CI/CD configuration files (YAML) and a shell script for running FIPS checks, not test code. The custom check is not applicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed This PR adds no Ginkgo e2e tests; it only modifies CI configuration and step registry shell scripts. The SNO compatibility check is not applicable.
Topology-Aware Scheduling Compatibility ✅ Passed The TLS scanner pod manifest has no topology-specific scheduling constraints (no nodeSelector, affinity, or spread constraints) and will work on all OpenShift topologies.
Ote Binary Stdout Contract ✅ Passed PR contains no Go code; only YAML configs and bash scripts. OTE Binary Stdout Contract applies to Go test binaries, not CI scripts or configuration files.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed PR does not add any Ginkgo e2e tests (It, Describe, Context, When); it only adds CI infrastructure shell scripts and YAML configurations, making this check not applicable.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 21, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 21, 2026

Hi @redhat-chai-bot. Thanks for your PR.

I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 21, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: redhat-chai-bot
Once this PR has been reviewed and has the lgtm label, please assign neisw for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@ci-operator/step-registry/fips-check/node-scan/fips-check-node-scan-commands.sh`:
- Around line 79-81: The script currently touches /results/scan.done then
unconditionally sleeps 120 (touch /results/scan.done and sleep 120), which can
let the container exit before the external collector runs oc cp; replace the
fixed sleep with a wait loop that keeps the container running until an external
signal file is present (e.g. replace sleep 120 with: while [ ! -f
/results/collect.done ]; do sleep 5; done) so the pod remains Running for oc cp;
ensure the collector writes /results/collect.done after copying so the loop can
exit.
- Around line 43-47: The startup currently aborts on early `oc` failures and
skips cleanup when the pod never becomes Ready; fix start_tls_scanner() by
making namespace creation best-effort: after running `oc create namespace ... |
oc apply -f -` set a new flag (e.g., TLS_SCANNER_NS_CREATED=true) immediately
even if later `oc` commands fail, avoid letting `oc` failures before this point
trigger an overall exit, and keep setting TLS_SCANNER_STARTED only when the pod
is Ready; add an EXIT trap that always runs cleanup to delete ${TLS_SCANNER_NS}
(and optionally remove the cluster-admin role and privileged SCC bindings)
regardless of TLS_SCANNER_STARTED, and update collect_tls_scanner_results() to
check TLS_SCANNER_NS_CREATED (not TLS_SCANNER_STARTED) to ensure namespace
deletion and RBAC/SCC rollback happen even if the scanner never became Ready.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 1d988ccb-de1e-4428-921c-abea843b7d38

📥 Commits

Reviewing files that changed from the base of the PR and between 96c4691 and 3a84881.

📒 Files selected for processing (3)
  • ci-operator/config/openshift/release/openshift-release-main__nightly-4.22.yaml
  • ci-operator/step-registry/fips-check/node-scan/fips-check-node-scan-commands.sh
  • ci-operator/step-registry/fips-check/node-scan/fips-check-node-scan-ref.yaml

Comment thread ci-operator/step-registry/fips-check/node-scan/fips-check-node-scan-commands.sh Outdated
Comment thread ci-operator/step-registry/fips-check/node-scan/fips-check-node-scan-commands.sh Outdated
1. Cleanup leak: Introduce TLS_SCANNER_NS_CREATED flag and an EXIT
   trap (cleanup_tls_scanner) that always deletes the namespace,
   regardless of whether the pod became Ready. Make namespace creation
   and RBAC/SCC grants best-effort so failures skip the TLS scan
   instead of aborting the entire step.

2. oc cp flakiness: Replace the fixed 'sleep 120' in the scanner pod
   with a wait loop for /results/collect.done. The collector signals
   the pod after oc cp completes, so the container stays Running for
   as long as needed and exits cleanly afterward.
@openshift-merge-bot
Copy link
Copy Markdown
Contributor

[REHEARSALNOTIFIER]
@redhat-chai-bot: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

Test name Repo Type Reason
pull-ci-redhat-chaos-prow-scripts-main-4.20-nightly-gcp-fipsetcd-krkn-hub-node-tests redhat-chaos/prow-scripts presubmit Registry content changed
pull-ci-redhat-chaos-prow-scripts-main-4.18-nightly-gcp-fipsetcd-krkn-hub-node-tests redhat-chaos/prow-scripts presubmit Registry content changed
pull-ci-redhat-chaos-prow-scripts-main-4.21-nightly-gcp-fipsetcd-krkn-hub-node-tests redhat-chaos/prow-scripts presubmit Registry content changed
pull-ci-redhat-chaos-prow-scripts-main-4.19-nightly-upgrade-chaos-gcp-fipsetcd-loaded-upgrade-418to419-node-scenarios redhat-chaos/prow-scripts presubmit Registry content changed
pull-ci-redhat-chaos-prow-scripts-main-4.19-nightly-upgrade-chaos-gcp-fipsetcd-loaded-upgrade-418to419-pod-scenarios redhat-chaos/prow-scripts presubmit Registry content changed
pull-ci-redhat-chaos-prow-scripts-main-4.19-nightly-gcp-fipsetcd-krkn-hub-node-tests redhat-chaos/prow-scripts presubmit Registry content changed
pull-ci-redhat-chaos-prow-scripts-main-4.18-nightly-gcp-fipsetcd-krkn-hub-tests redhat-chaos/prow-scripts presubmit Registry content changed
pull-ci-redhat-chaos-prow-scripts-main-4.19-nightly-gcp-fipsetcd-krkn-hub-tests redhat-chaos/prow-scripts presubmit Registry content changed
pull-ci-redhat-chaos-prow-scripts-main-4.18-nightly-x86-cpou-upgrade-4.16-loaded-upgrade-416to418-gcp redhat-chaos/prow-scripts presubmit Registry content changed
pull-ci-rh-ecosystem-edge-recert-main-e2e-aws-ovn-single-node-recert-openshift-e2e-test-qe rh-ecosystem-edge/recert presubmit Registry content changed
pull-ci-rh-ecosystem-edge-recert-release-4.22-e2e-aws-ovn-single-node-recert-openshift-e2e-test-qe rh-ecosystem-edge/recert presubmit Registry content changed
pull-ci-rh-ecosystem-edge-recert-release-4.21-e2e-aws-ovn-single-node-recert-openshift-e2e-test-qe rh-ecosystem-edge/recert presubmit Registry content changed
pull-ci-rh-ecosystem-edge-recert-release-4.20-e2e-aws-ovn-single-node-recert-openshift-e2e-test-qe rh-ecosystem-edge/recert presubmit Registry content changed
pull-ci-rh-ecosystem-edge-recert-release-4.19-e2e-aws-ovn-single-node-recert-openshift-e2e-test-qe rh-ecosystem-edge/recert presubmit Registry content changed
pull-ci-rh-ecosystem-edge-recert-release-4.18-e2e-aws-ovn-single-node-recert-openshift-e2e-test-qe rh-ecosystem-edge/recert presubmit Registry content changed
pull-ci-rh-ecosystem-edge-recert-release-4.16-e2e-aws-ovn-single-node-recert-openshift-e2e-test-qe rh-ecosystem-edge/recert presubmit Registry content changed
pull-ci-rh-ecosystem-edge-recert-release-4.14-e2e-aws-ovn-single-node-recert-openshift-e2e-test-qe rh-ecosystem-edge/recert presubmit Registry content changed
pull-ci-rh-ecosystem-edge-recert-v0-e2e-aws-ovn-single-node-recert-openshift-e2e-test-qe rh-ecosystem-edge/recert presubmit Registry content changed
pull-ci-openshift-cluster-etcd-operator-release-4.12-e2e-gcp-qe-no-capabilities openshift/cluster-etcd-operator presubmit Registry content changed
pull-ci-openshift-cluster-etcd-operator-release-4.11-e2e-gcp-qe-no-capabilities openshift/cluster-etcd-operator presubmit Registry content changed
pull-ci-openshift-eng-ocp-qe-perfscale-ci-main-aws-4.21-nightly-x86-control-plane-fips-120nodes openshift-eng/ocp-qe-perfscale-ci presubmit Registry content changed
pull-ci-openshift-eng-ocp-qe-perfscale-ci-main-aws-5.0-nightly-x86-control-plane-fips-120nodes openshift-eng/ocp-qe-perfscale-ci presubmit Registry content changed
pull-ci-openshift-eng-ocp-qe-perfscale-ci-main-aws-4.22-nightly-x86-control-plane-fips-120nodes openshift-eng/ocp-qe-perfscale-ci presubmit Registry content changed
pull-ci-openshift-eng-ocp-qe-perfscale-ci-main-aws-4.23-nightly-x86-control-plane-fips-120nodes openshift-eng/ocp-qe-perfscale-ci presubmit Registry content changed
pull-ci-openshift-eng-ocp-qe-perfscale-ci-main-gcp-5.0-nightly-x86-control-plane-etcd-fips-24nodes openshift-eng/ocp-qe-perfscale-ci presubmit Registry content changed

A total of 2693 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs.

A full list of affected jobs can be found here

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@redhat-chai-bot
Copy link
Copy Markdown
Contributor Author

Closing in favor of a new approach: creating a dedicated release controller informing job using the existing tls-scanner-run step ref, per feedback from the requester.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant