
sandboxed containers: Add wait loop for CatalogSource readiness in env-cm step #79608

Open
tbuskey wants to merge 1 commit into openshift:main from tbuskey:env-cm-wait



@tbuskey
Contributor

@tbuskey tbuskey commented May 21, 2026

env-cm creates the brew-catalog CatalogSource but doesn't wait for it to become ready. If the CatalogSource isn't ready, subscriptions can fail.

This adds a wait_for_catsrc() function that polls for the READY state (120 s timeout, 5 s intervals).

Summary by CodeRabbit

This PR improves the reliability of the sandboxed-containers-operator's CI testing pipeline by ensuring that a CatalogSource is fully ready before dependent operations attempt to use it.

What changed

The env-cm step in the sandboxed-containers-operator CI configuration adds a new wait_for_catsrc() function that:

  • Polls the CatalogSource resource in the openshift-marketplace namespace until its .status.connectionState.lastObservedState field indicates READY
  • Retries for up to 120 seconds (24 iterations with 5-second intervals between checks)
  • On timeout, prints a warning and outputs the CatalogSource YAML for debugging, then exits with status code 1
  • Prevents subsequent steps from proceeding until the CatalogSource is actually ready to serve operators
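For reference, a function matching this description might look roughly like the minimal Bash sketch below. It is not the PR's actual code: the CATSRC_RETRIES and CATSRC_INTERVAL overrides are hypothetical additions (defaulting to the described 24 attempts and 5-second interval), the --request-timeout=10s bound follows the review suggestion further down, and the function returns 1 rather than calling exit so that the caller decides whether to abort.

```bash
#!/bin/bash
# Minimal sketch, not the PR's code. Polls a CatalogSource in
# openshift-marketplace until .status.connectionState.lastObservedState is READY.
# CATSRC_RETRIES and CATSRC_INTERVAL are hypothetical overrides; the defaults
# mirror the PR description (24 attempts, 5 s apart).
wait_for_catsrc() {
  local catsrc_name="$1"
  local retries="${CATSRC_RETRIES:-24}"
  local interval="${CATSRC_INTERVAL:-5}"
  local state i

  for ((i = 1; i <= retries; i++)); do
    # Bound each API call; an empty state means "not ready yet" (or not found).
    state="$(oc --request-timeout=10s get catalogsource -n openshift-marketplace "${catsrc_name}" \
      -o jsonpath='{.status.connectionState.lastObservedState}' 2>/dev/null || echo "")"
    if [[ "${state}" == "READY" ]]; then
      echo "CatalogSource ${catsrc_name} is READY"
      return 0
    fi
    echo "Waiting for CatalogSource ${catsrc_name} (attempt ${i}/${retries}, state: '${state}')"
    # Sleep only between checks, not after the last one.
    if ((i < retries)); then
      sleep "${interval}"
    fi
  done

  # Timed out: warn and dump the object to help debugging, then signal failure.
  echo "WARNING: CatalogSource ${catsrc_name} not READY after ${retries} attempts" >&2
  oc --request-timeout=10s get catalogsource -n openshift-marketplace "${catsrc_name}" -o yaml >&2 || true
  return 1
}
```

Because this version returns instead of exiting, the call site has to gate the step itself, for example `wait_for_catsrc "${CATALOG_SOURCE_NAME}" || exit 1`, or rely on set -euo pipefail as the review below discusses.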

The step now invokes this wait function immediately after creating the CatalogSource in the Pre-GA test flow, closing a race condition where subscriptions could fail if they were created before the CatalogSource finished initializing.

Impact

This affects CI testing of the sandboxed-containers-operator in Pre-GA release scenarios. The change makes the test environment setup more robust by explicitly gating subsequent steps on the CatalogSource readiness, rather than relying on implicit timing assumptions or downstream retry mechanisms.


Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Tom Buskey <tbuskey@redhat.com>
@coderabbitai
Contributor

coderabbitai Bot commented May 21, 2026

Walkthrough

A new wait_for_catsrc Bash function polls an OpenShift CatalogSource resource until its status reaches READY. It checks .status.connectionState.lastObservedState using oc get, retries up to 24 times with 5-second intervals, logs warnings and YAML output on timeout, then exits with status 1. The Pre-GA workflow now calls this function after creating the CatalogSource to ensure readiness before proceeding.

Changes

CatalogSource readiness polling

Layer: CatalogSource readiness polling function and integration
File: ci-operator/step-registry/sandboxed-containers-operator/env-cm/sandboxed-containers-operator-env-cm-commands.sh
Summary: A new wait_for_catsrc(catsrc_name) function is defined to poll the CatalogSource status until READY, with 24 retry attempts and 5-second intervals. On timeout, it logs details and exits with status 1. The function is then invoked in the Pre-GA flow immediately after CatalogSource creation.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (11 passed)

  • Description Check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately describes the main change: adding a wait loop for CatalogSource readiness in the env-cm step, which is the primary purpose of the pull request.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Stable And Deterministic Test Names: ✅ Passed. PR modifies a Bash shell script for CI operations, not Ginkgo tests. The custom check for stable/deterministic test names is not applicable to operational scripts.
  • Test Structure And Quality: ✅ Passed. Check not applicable: PR modifies a Bash shell script, not Ginkgo test code. Check requires reviewing Ginkgo test structure (It blocks, BeforeEach/AfterEach, Eventually timeouts).
  • Microshift Test Compatibility: ✅ Passed. This PR modifies only a bash shell script for CI infrastructure, not Ginkgo e2e tests. The custom check applies only to new e2e tests, so it is not applicable here.
  • Single Node Openshift (Sno) Test Compatibility: ✅ Passed. Custom check is for Ginkgo e2e test SNO compatibility. This PR only modifies a Bash CI operator step script with no e2e test code, so the check is not applicable.
  • Topology-Aware Scheduling Compatibility: ✅ Passed. PR adds a Bash polling function in a CI script. No deployment manifests, operators, controllers, or topology-aware scheduling constraints introduced.
  • Ote Binary Stdout Contract: ✅ Passed. PR modifies a Bash shell script in CI step-registry, not Go code. OTE Binary Stdout Contract applies only to Go binaries, not shell scripts.
  • Ipv6 And Disconnected Network Test Compatibility: ✅ Passed. This PR modifies a Bash CI script, not Ginkgo e2e tests. The custom check targets only new Ginkgo test files; it does not apply here.


@tbuskey
Contributor Author

tbuskey commented May 21, 2026

/assign @wainersm
/assign @ldoktor
/assign @vvoronko

@openshift-ci
Contributor

openshift-ci Bot commented May 21, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tbuskey

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 21, 2026
@openshift-ci openshift-ci Bot requested review from vvoronko and wainersm May 21, 2026 14:31
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ci-operator/step-registry/sandboxed-containers-operator/env-cm/sandboxed-containers-operator-env-cm-commands.sh (1)

1-2: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

wait_for_catsrc failure is not guaranteed to stop the step.

If readiness times out, wait_for_catsrc returns non-zero, but without set -euo pipefail (or an explicit || return/exit), the script can continue and still run downstream steps.

Suggested fix
 #!/bin/bash
+set -euo pipefail
@@
-  wait_for_catsrc "${CATALOG_SOURCE_NAME}"
+  wait_for_catsrc "${CATALOG_SOURCE_NAME}"

(If you prefer not to rely on -e, then use:)

-  wait_for_catsrc "${CATALOG_SOURCE_NAME}"
+  wait_for_catsrc "${CATALOG_SOURCE_NAME}" || exit 1

As per coding guidelines: "Step registry script files must use set -euo pipefail (without -x) as default and only enable -x when actively debugging".

Also applies to: 125-125
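The failure mode described in this comment can be reproduced with a small self-contained Bash sketch. wait_stub is a hypothetical stand-in for a readiness helper that always times out, and the step functions are hypothetical too; each step body runs in a subshell so that it can exit without ending the calling shell.

```bash
#!/bin/bash
# Hypothetical stand-in for a readiness helper that always times out.
wait_stub() { return 1; }

# No -e and no explicit check: the failure is ignored and the step continues.
step_without_guard() (
  wait_stub "brew-catalog"
  echo "downstream steps ran"
)

# With set -euo pipefail, the unchecked non-zero status aborts the step.
step_with_errexit() (
  set -euo pipefail
  wait_stub "brew-catalog"
  echo "downstream steps ran"
)

# Explicit fail-fast that works whether or not -e is set.
step_with_explicit_exit() (
  wait_stub "brew-catalog" || exit 1
  echo "downstream steps ran"
)
```

Run as plain top-level commands, step_without_guard prints "downstream steps ran" and returns 0, while the other two exit with status 1 before printing anything. One reason to keep the explicit `|| exit 1` even with -e enabled: Bash ignores -e for commands inside a function or compound command that runs as part of an `if` test or on the left of `&&`/`||`.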

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/sandboxed-containers-operator/env-cm/sandboxed-containers-operator-env-cm-commands.sh`
around lines 1 - 2, Add strict shell failure handling to this step script by
enabling "set -euo pipefail" at the top (immediately after the shebang) so that
a non-zero return from wait_for_catsrc will stop the step; if you prefer to
avoid global -e, ensure every call to wait_for_catsrc (and similar
readiness/check helpers) is followed with an explicit "|| exit 1" to fail fast.
Update the top of the script and any invocation sites of wait_for_catsrc to
guarantee the step aborts on timeout/failure.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 0364a09b-5e01-4aa9-a5cd-220db599dd6b

📥 Commits

Reviewing files that changed from the base of the PR and between 942220c and 739d494.

📒 Files selected for processing (1)
  • ci-operator/step-registry/sandboxed-containers-operator/env-cm/sandboxed-containers-operator-env-cm-commands.sh


  local ready=false
  for i in {1..24}; do
    local state=$(oc get catalogsource -n openshift-marketplace "${catsrc_name}" -o jsonpath='{.status.connectionState.lastObservedState}' 2>/dev/null || echo "")
Contributor


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add a request timeout to oc get inside the polling loop.

A stuck API call can block forever and bypass the intended 120s retry budget. Add --request-timeout so each poll attempt is bounded.

Suggested fix
-    local state=$(oc get catalogsource -n openshift-marketplace "${catsrc_name}" -o jsonpath='{.status.connectionState.lastObservedState}' 2>/dev/null || echo "")
+    local state
+    state="$(oc --request-timeout=10s get catalogsource -n openshift-marketplace "${catsrc_name}" -o jsonpath='{.status.connectionState.lastObservedState}' 2>/dev/null || echo "")"
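Bounding each request does not by itself keep the whole wait within 120 s: with a 10 s request timeout, 24 attempts can take roughly 24 × (10 s + 5 s) = 360 s in the worst case. If the overall budget should be a hard limit, one option (not part of the PR or the review) is to loop against a wall-clock deadline using Bash's built-in SECONDS counter. A minimal sketch, with a hypothetical function name and optional budget and interval arguments:

```bash
#!/bin/bash
# Hypothetical variant: stop after a wall-clock budget instead of a fixed
# number of attempts. SECONDS is Bash's count of seconds since shell start.
wait_for_catsrc_deadline() {
  local catsrc_name="$1"
  local budget="${2:-120}"   # total seconds to wait
  local interval="${3:-5}"   # pause between polls
  local deadline=$((SECONDS + budget))
  local state

  while ((SECONDS < deadline)); do
    state="$(oc --request-timeout=10s get catalogsource -n openshift-marketplace "${catsrc_name}" \
      -o jsonpath='{.status.connectionState.lastObservedState}' 2>/dev/null || echo "")"
    if [[ "${state}" == "READY" ]]; then
      return 0
    fi
    sleep "${interval}"
  done
  return 1
}
```

Because the deadline is only checked between polls, this loop can still overrun by up to about one request timeout plus one interval.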
🧰 Tools
🪛 Shellcheck (0.11.0)

[warning] 91-91: Declare and assign separately to avoid masking return values.

(SC2155)
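SC2155 fires because `local` reports its own exit status, which replaces the status of the command substitution. A minimal illustration (the function names are hypothetical):

```bash
#!/bin/bash
# Declare and assign in one statement: $? is the status of `local` (0),
# so the failure of the command substitution is lost.
masked() {
  local out=$(false)
  echo "$?"
}

# Declare first, then assign: $? is the status of the command
# substitution itself (1 here).
unmasked() {
  local out
  out=$(false)
  echo "$?"
}
```

masked prints 0 and unmasked prints 1. In the line flagged here, the trailing `|| echo ""` inside the substitution already forces a zero status, so splitting the declaration mainly satisfies the linter; the split matters whenever the status is checked, for example under set -e.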

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/sandboxed-containers-operator/env-cm/sandboxed-containers-operator-env-cm-commands.sh`
at line 91, The oc get call used to populate the local variable state (local
state=$(oc get catalogsource -n openshift-marketplace "${catsrc_name}" -o
jsonpath='{.status.connectionState.lastObservedState}' 2>/dev/null || echo ""))
can hang and bypass the overall 120s retry budget; modify that oc get invocation
to include a per-request timeout (e.g. --request-timeout=10s) so each poll
attempt is bounded and will fail fast within the loop's retry budget.

@openshift-merge-bot
Contributor

[REHEARSALNOTIFIER]
@tbuskey: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

Every job listed has Repo: N/A, Type: periodic, Reason: Registry content changed.

  • periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-aro-ipi-coco
  • periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate419-aro-ipi-coco
  • periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-aws-ipi-coco
  • periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate421-azure-ipi-peerpods
  • periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-azure-ipi-peerpods
  • periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate417-azure-ipi-coco
  • periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate419-azure-ipi-coco
  • periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate421-aro-ipi-coco
  • periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-aro-ipi-peerpods
  • periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate421-aws-ipi-peerpods
  • periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate420-azure-ipi-peerpods
  • periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate418-azure-ipi-peerpods
  • periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate419-aws-ipi-peerpods
  • periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate419-aws-ipi-coco
  • periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate418-aro-ipi-peerpods
  • periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate417-azure-ipi-kata
  • periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate417-aro-ipi-peerpods
  • periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate417-azure-ipi-peerpods
  • periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate420-azure-ipi-coco
  • periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-azure-ipi-coco
  • periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-release-aws-ipi-coco
  • periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate418-azure-ipi-coco
  • periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate420-aws-ipi-coco
  • periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-aws-ipi-peerpods
  • periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate417-aro-ipi-coco

A total of 49 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs.

A full list of affected jobs can be found here
Prior to this PR being merged, you will need to either run and acknowledge or opt to skip these rehearsals.

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@openshift-ci
Contributor

openshift-ci Bot commented May 21, 2026

@tbuskey: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

  • ci/prow/step-registry-shellcheck (commit 739d494, required: true). Rerun command: /test step-registry-shellcheck

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files.


4 participants