
NE-2032: e2e: Signal service deprovisioning issues during Gateway DNS test#1220

Open
alebedev87 wants to merge 1 commit into openshift:master from alebedev87:dnsrecord-listener-update


@alebedev87
Contributor

@alebedev87 alebedev87 commented Apr 15, 2025

This PR adds a check for the service associated with a deleted gateway. The test now waits for the service to be fully removed and logs a warning if it persists beyond a specified timeout. This helps identify delays in deprovisioning of cloud load balancers backing gateway services.

@openshift-ci openshift-ci Bot requested review from grzpiotrowski and knobunc April 15, 2025 00:01
@alebedev87 alebedev87 changed the title e2e: Signal service deprovisioning problems during Gateway DNS test [WIP] e2e: Signal service deprovisioning problems during Gateway DNS test Apr 15, 2025
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 15, 2025
@alebedev87 alebedev87 force-pushed the dnsrecord-listener-update branch from 301e0bc to a079a5a Compare April 15, 2025 09:25
@melvinjoseph86

/retest-required

1 similar comment
@melvinjoseph86

/retest-required

@alebedev87 alebedev87 force-pushed the dnsrecord-listener-update branch from a079a5a to 849cdc8 Compare May 13, 2025 15:15
@alebedev87 alebedev87 changed the title [WIP] e2e: Signal service deprovisioning problems during Gateway DNS test e2e: Signal service deprovisioning issues during Gateway DNS test May 13, 2025
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 13, 2025
@alebedev87 alebedev87 changed the title e2e: Signal service deprovisioning issues during Gateway DNS test NE-2032: e2e: Signal service deprovisioning issues during Gateway DNS test May 13, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 13, 2025
@openshift-ci-robot
Contributor

openshift-ci-robot commented May 13, 2025

@alebedev87: This pull request references NE-2032 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.


In response to this:

This PR adds a check for the service associated with a deleted gateway. The test now waits for the service to be fully removed and logs a warning if it persists beyond a specified timeout. This helps identify delays in deprovisioning of cloud load balancers backing gateway services.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@alebedev87
Contributor Author

/assign @grzpiotrowski

@alebedev87
Contributor Author

/retest

1 similar comment
@alebedev87
Contributor Author

/retest

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci Bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 15, 2025
@alebedev87
Contributor Author

/remove-lifecycle stale

@openshift-ci openshift-ci Bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 20, 2025
Comment thread test/e2e/gateway_api_test.go Outdated

if err := kclient.Delete(context.TODO(), gateway); err != nil {
t.Errorf("failed to delete gateway %q: %v", gateway.Name, err)
t.Fatalf("Failed to delete gateway %q: %v", gateway.Name, err)
Member

@rikatz rikatz Aug 21, 2025


Do we care about the same networking problems here? E.g., do you want to retry the delete as well?

Additionally, if you add this delete to the retry function below, it is worth ignoring the error when the object does not exist, as a previous iteration may already have deleted it.

One more comment, outside this change but above, on line 646: I would wrap that update/patch in a RetryOnConflict. Other controllers (GatewayAPI/OSSM) may be changing the object, so the Update may fail with a conflict; it would be good to ignore the conflict and retry the update.
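In client-go this pattern is `retry.RetryOnConflict(retry.DefaultRetry, func() error { ... })` from `k8s.io/client-go/util/retry`, where the closure re-reads the Gateway, applies the change and calls Update. The sketch below models the same idea with the standard library only; the toy `store`, `object` and `retryOnConflict` names are hypothetical and it omits the backoff the real helper applies.

```go
package main

import "errors"

// errConflict is a hypothetical stand-in for a 409 Conflict error
// (apierrors.IsConflict in real code).
var errConflict = errors.New("conflict: the object has been modified")

// object is a toy stand-in for a Gateway carrying a resourceVersion.
type object struct {
	ResourceVersion int
	Listeners       []string
}

// store simulates an API server that rejects writes based on a stale version.
type store struct{ current object }

// get returns a copy of the current object, as a fresh GET would.
func (s *store) get() object {
	o := s.current
	o.Listeners = append([]string(nil), s.current.Listeners...)
	return o
}

// update succeeds only if the caller read the latest version.
func (s *store) update(o object) error {
	if o.ResourceVersion != s.current.ResourceVersion {
		return errConflict
	}
	o.ResourceVersion++
	s.current = o
	return nil
}

// retryOnConflict reruns fn (read, mutate, write) while it fails with a
// conflict, giving up after maxAttempts. Any other error is returned at once.
func retryOnConflict(maxAttempts int, fn func() error) error {
	var err error
	for i := 0; i < maxAttempts; i++ {
		err = fn()
		if !errors.Is(err, errConflict) {
			return err
		}
	}
	return err
}
```

The important detail is that the read happens inside the retried closure: retrying only the Update with the same stale object would keep hitting the conflict.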

Edit: OTOH, this may lead to a false negative if you try to get a service whose name doesn't match the gateway name, as it will not be found.

Contributor Author


Fixed with a rebase from master.

Comment thread test/e2e/gateway_api_test.go Outdated

// The load balancer deprovisioning can take some time.
// Signal a long deprovisioning to help distinguish it from DNS management problems.
gtwSvcName := types.NamespacedName{Namespace: "openshift-ingress", Name: "test-gateway-update-openshift-default"}
Member


IIRC, the service name is derived from the Gateway name, so instead of hardcoding "test-gateway-update-openshift-default", do you want to compose the name from the gateway name? That way, a change in the naming logic will not break the test.
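A minimal sketch of composing the name instead of hardcoding it, assuming the `<gateway.Name>-<gatewayClass.Name>` pattern described later in the walkthrough. `gatewayServiceName` is a hypothetical helper; in the test its result would go into `types.NamespacedName{Namespace: operatorcontroller.DefaultOperandNamespace, Name: ...}`.

```go
package main

import "fmt"

// gatewayServiceName (hypothetical) derives the Service name from the Gateway
// and GatewayClass names using the "<gateway.Name>-<gatewayClass.Name>"
// pattern. Keeping the derivation in one place means a naming change only
// needs one fix instead of hunting down hardcoded strings.
func gatewayServiceName(gatewayName, gatewayClassName string) string {
	return fmt.Sprintf("%s-%s", gatewayName, gatewayClassName)
}
```

For a Gateway named "test-gateway-update" using the "openshift-default" class, this yields the same "test-gateway-update-openshift-default" string the diff currently hardcodes.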

Contributor Author


Done.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 26, 2025
@openshift-merge-robot
Contributor

PR needs rebase.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

1 similar comment
@openshift-merge-robot
Contributor

PR needs rebase.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci Bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 1, 2026
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci Bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 1, 2026
@melvinjoseph86

/remove-lifecycle rotten

@openshift-ci openshift-ci Bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Feb 2, 2026
@alebedev87 alebedev87 force-pushed the dnsrecord-listener-update branch from 849cdc8 to f424fcc Compare April 3, 2026 12:10
@coderabbitai

coderabbitai Bot commented Apr 3, 2026

📝 Walkthrough

In testGatewayAPIDNSListenerUpdate the test now logs successful Gateway deletion and then polls (every 3s, up to 3 minutes) to verify the associated corev1.Service named <gateway.Name>-<gatewayClass.Name> in operatorcontroller.DefaultOperandNamespace is removed. During polling, Kubernetes NotFound is treated as success; other errors or continued existence cause retries with log messages. If polling times out the test logs a timeout message. After the service-deletion wait, the test continues to validate that the remaining DNSRecord entry is removed.

Changes

Cohort: Test Cleanup Enhancement
File(s): test/e2e/gateway_api_test.go
Summary: After calling deleteWithRetryOnError for the Gateway, the test now logs successful deletion and adds polling (3s interval, 3-minute timeout) to wait for the associated corev1.Service (<gateway.Name>-<gatewayClass.Name> in operatorcontroller.DefaultOperandNamespace) to be fully removed. Polling treats NotFound as success; other transient errors or an existing Service cause retries and log messages; a timeout is logged if the Service still exists.

Estimated code review effort

⚠️ Medium | ⏱️ ~20–30 minutes


Comment @coderabbitai help to get the list of available commands and usage tips.

@alebedev87 alebedev87 force-pushed the dnsrecord-listener-update branch from f424fcc to 36c7332 Compare April 3, 2026 12:12
This adds a check for the service associated with a deleted gateway.
The test now waits for the service to be fully removed and
logs a warning if it persists beyond a specified timeout.
This helps identify delays in deprovisioning of cloud load balancers
backing gateway services.
@alebedev87 alebedev87 force-pushed the dnsrecord-listener-update branch from 36c7332 to 6352f79 Compare April 3, 2026 12:16
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 3, 2026

@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (1)
test/e2e/gateway_api_test.go (1)

1159-1166: ⚠️ Potential issue | 🟠 Major

Use the controller naming helper to prevent false-positive service-deletion checks.

On line 1159, manually composing the Service name can drift from controller logic. Since NotFound is treated as success, a naming mismatch would make this poll pass immediately and hide the deprovisioning issue this test is meant to detect.

Proposed fix
-	gtwSvcName := types.NamespacedName{Namespace: operatorcontroller.DefaultOperandNamespace, Name: gateway.Name + "-" + gatewayClass.Name}
+	gtwSvcName := operatorcontroller.LoadBalancerServiceNameFromGatewayName(gateway.Name)
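One complementary safeguard against this vacuous pass, not part of the PR and not requested by the reviewers, is to confirm the watched Service exists before the Gateway is deleted. The sketch below is stdlib-only; `requireExists`, `errNotFound` and the `get` callback are hypothetical stand-ins for a client Get and `apierrors.IsNotFound`.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for a Kubernetes NotFound error.
var errNotFound = errors.New("not found")

// requireExists (hypothetical) is a pre-deletion guard: if the computed
// Service name has drifted from the controller's naming scheme, it fails
// loudly here instead of letting the later NotFound check pass immediately.
func requireExists(name string, get func(string) error) error {
	if err := get(name); err != nil {
		if errors.Is(err, errNotFound) {
			return fmt.Errorf("service %q not found before gateway deletion; is the name derived correctly?", name)
		}
		return fmt.Errorf("failed to get service %q: %w", name, err)
	}
	return nil
}
```

This turns a naming mismatch into an explicit error rather than a silently successful poll, whichever way the name is computed.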
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/gateway_api_test.go` around lines 1159 - 1166, The test builds
gtwSvcName by concatenating gateway.Name + "-" + gatewayClass.Name which can
diverge from the controller's naming scheme and cause false-positive deletion
success in the wait.PollUntilContextTimeout check; replace the manual
concatenation with the controller's canonical naming helper (the function used
by the controller to compute the operand Service name) to construct gtwSvcName
so the poll checks the real service the controller creates (update the code that
sets gtwSvcName and keep the rest of the polling logic unchanged).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: de36da67-963a-4141-8054-1bdcf9774b8c

📥 Commits

Reviewing files that changed from the base of the PR and between 36c7332 and 6352f79.

📒 Files selected for processing (1)
  • test/e2e/gateway_api_test.go

@alebedev87
Contributor Author

/retest-required

Contributor

@grzpiotrowski grzpiotrowski left a comment


/lgtm
/approve

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Apr 8, 2026
@openshift-ci
Contributor

openshift-ci Bot commented Apr 8, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: grzpiotrowski

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 8, 2026
@grzpiotrowski
Contributor

The failures in e2e-aws-operator-techpreview are unrelated to this PR. A fix is under way in PR #1408, I believe.

@alebedev87
Contributor Author

/verified by CI

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Apr 8, 2026
@openshift-ci-robot
Contributor

@alebedev87: This PR has been marked as verified by CI.


In response to this:

/verified by CI

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-merge-bot
Contributor

/retest-required

Remaining retests: 0 against base HEAD b408b27 and 2 for PR HEAD 6352f79 in total

@openshift-merge-bot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 2c5b4ef and 1 for PR HEAD 6352f79 in total

@openshift-merge-bot
Contributor

/retest-required

Remaining retests: 0 against base HEAD f40071e and 0 for PR HEAD 6352f79 in total

@openshift-ci
Contributor

openshift-ci Bot commented Apr 20, 2026

@alebedev87: The following tests failed. Say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
--- | --- | --- | --- | ---
ci/prow/e2e-aws-ovn-serial | 849cdc8 | link | true | /test e2e-aws-ovn-serial
ci/prow/e2e-aws-operator-techpreview | 6352f79 | link | false | /test e2e-aws-operator-techpreview
ci/prow/e2e-aws-ovn | 6352f79 | link | true | /test e2e-aws-ovn

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot
Contributor

/hold

Revision 6352f79 was retested 3 times: holding

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 20, 2026

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • verified: Signifies that the PR passed pre-merge verification criteria.


7 participants