NO-JIRA: E2E: Preserve CVO state when bypassing VAP#1368
NO-JIRA: E2E: Preserve CVO state when bypassing VAP#1368gcs278 wants to merge 1 commit intoopenshift:masterfrom
Conversation
|
@gcs278: This pull request explicitly references no jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a helper to read current CVO replica count and extends vapManager with 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/util_gatewayapi_test.go`:
- Around line 1227-1231: When getCVOReplicas returns an error we currently
return (err, nil) which causes callers like bypassVAP to call defer recoverFn()
and panic; change the error return to provide a no-op recover function instead
of nil so callers can safely defer it. Update the call site in
test/e2e/util_gatewayapi_test.go where getCVOReplicas is invoked (the variable
names replicas, err, recoverFn) to return (err, func(){}) on error, matching the
other error path at line 1235 and avoiding a nil deferred function.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (1)
test/e2e/util_gatewayapi_test.go
489d5f7 to
821191b
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
test/e2e/util_gatewayapi_test.go (1)
1228-1233:⚠️ Potential issue | 🔴 CriticalFix panic path: do not return a nil recover function on error.
At Line 1231,
return err, nilcan still panic becausebypassVAPdefersrecoverFn()whenerr != nil.🐛 Proposed fix
replicas, err := getCVOReplicas(m.t) if err != nil { - return err, nil + return err, func() {} }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/util_gatewayapi_test.go` around lines 1228 - 1233, The panic occurs because bypassVAP defers recoverFn() even when getCVOReplicas returns an error and the code returns a nil recover function; change the error return to supply a non-nil no-op recover function instead of nil (e.g. return err, func() {}), or initialize a default no-op recover function before calling getCVOReplicas and only replace it on success; reference getCVOReplicas and m.initialCVOReplicas to locate the code to update.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/util_gatewayapi_test.go`:
- Around line 1271-1286: The restored VAP object (m.savedVAP) must be
deep-copied and have all server-managed metadata cleared before calling
kclient.Create to avoid API rejections; replace the current single-field
mutation of ResourceVersion with a DeepCopy() of m.savedVAP and set metadata
fields such as UID, ResourceVersion, CreationTimestamp, Generation, SelfLink and
ManagedFields to empty/zero values (preserving any required labels/annotations
you intend to keep) then pass that sanitized copy to kclient.Create; ensure you
still log on success/failure as currently done.
---
Duplicate comments:
In `@test/e2e/util_gatewayapi_test.go`:
- Around line 1228-1233: The panic occurs because bypassVAP defers recoverFn()
even when getCVOReplicas returns an error and the code returns a nil recover
function; change the error return to supply a non-nil no-op recover function
instead of nil (e.g. return err, func() {}), or initialize a default no-op
recover function before calling getCVOReplicas and only replace it on success;
reference getCVOReplicas and m.initialCVOReplicas to locate the code to update.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (1)
test/e2e/util_gatewayapi_test.go
| // Restore VAP if we saved it | ||
| if m.savedVAP != nil { | ||
| // Clear resource version to allow recreation | ||
| m.savedVAP.ResourceVersion = "" | ||
| if err := kclient.Create(context.Background(), m.savedVAP); err != nil { | ||
| if !kerrors.IsAlreadyExists(err) { | ||
| m.t.Errorf("failed to restore vap %q: %v", m.name, err) | ||
| } | ||
| } else { | ||
| m.t.Logf("Restored VAP %q", m.name) | ||
| } | ||
| } | ||
|
|
||
| // Scale CVO back up | ||
| if err := scaleDeployment(m.t, cvoNamespace, cvoDeploymentName, m.initialCVOReplicas); err != nil { | ||
| m.t.Errorf("failed to scale up cvo: %v", err) |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
Kubernetes ValidatingAdmissionPolicy CREATE API requirements server-managed fields uid resourceVersion
💡 Result:
For a CREATE (POST /apis/admissionregistration.k8s.io/v1/validatingadmissionpolicies) request, you should send a ValidatingAdmissionPolicy object that includes only client-settable fields. In particular:
Do not set metadata.uid or metadata.resourceVersion on CREATE
metadata.uid: Populated by the system, read-only (typically generated by the apiserver on successful creation). [3]metadata.resourceVersion: Populated by the system, read-only (used for optimistic concurrency / change detection; treat as opaque). [3][4]
(Practically: omit them from your YAML/JSON; the apiserver returns them in the CREATE response.)
What you do need on CREATE (minimum)
apiVersion: admissionregistration.k8s.io/v1kind: ValidatingAdmissionPolicymetadata.name(cluster-scoped, so no namespace)specwith at least:spec.matchConstraints.resourceRules[](each rule requiresapiGroups,apiVersions,operations,resources) [2]spec.validations[]with at least one item, and each item requiresexpression[2]- Also:
ValidationsandauditAnnotationscannot both be empty (you need at least one of them). [2]
Example shape is shown in the official docs. [1]
When resourceVersion is relevant
You typically supply metadata.resourceVersion when doing a full-object update (PUT) to avoid write conflicts (optimistic concurrency), not when creating a new object. [4]
Sources: [1] [2] [3] [4]
🌐 Web query:
Kubernetes ObjectMeta fields that must be cleared before Create() operation
💡 Result:
Before calling Create() (POST) you should clear/omit the apiserver-managed / read-only ObjectMeta fields (common when you “clone” an object you previously got from the API):
metadata.uid(system-generated; read-only) [2]metadata.resourceVersion(system-populated; read-only) [2]metadata.creationTimestamp(clients may not set; system-populated) [3]metadata.deletionTimestamp(set by server during deletion; not directly settable by clients) [4]metadata.deletionGracePeriodSeconds(read-only; only set along withdeletionTimestamp) [4]metadata.managedFields(tracked/managed by the apiserver; avoid setting) [5]metadata.selfLink(legacy read-only field; no longer populated) [2]
Typically you keep metadata.name (or generateName), metadata.namespace, labels, annotations, finalizers, and ownerReferences as needed.
Sources: [2] [3] [4] [5]
Sanitize restored VAP metadata before Create to avoid API rejection.
At line 1274, only ResourceVersion is cleared. Kubernetes server-managed fields like uid, creationTimestamp, and managedFields must also be cleared before calling Create() on a fetched object—the API will reject them as invalid. This is especially critical when initial CVO replicas are 0, as the CVO won't be available to self-heal.
Use DeepCopy() to avoid mutating the original object, then clear all server-managed metadata:
🔧 Proposed fix
if m.savedVAP != nil {
- // Clear resource version to allow recreation
- m.savedVAP.ResourceVersion = ""
- if err := kclient.Create(context.Background(), m.savedVAP); err != nil {
+ restored := m.savedVAP.DeepCopy()
+ restored.ResourceVersion = ""
+ restored.UID = ""
+ restored.CreationTimestamp = metav1.Time{}
+ restored.ManagedFields = nil
+ if err := kclient.Create(context.Background(), restored); err != nil {
if !kerrors.IsAlreadyExists(err) {
m.t.Errorf("failed to restore vap %q: %v", m.name, err)
}
} else {
m.t.Logf("Restored VAP %q", m.name)
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Restore VAP if we saved it | |
| if m.savedVAP != nil { | |
| // Clear resource version to allow recreation | |
| m.savedVAP.ResourceVersion = "" | |
| if err := kclient.Create(context.Background(), m.savedVAP); err != nil { | |
| if !kerrors.IsAlreadyExists(err) { | |
| m.t.Errorf("failed to restore vap %q: %v", m.name, err) | |
| } | |
| } else { | |
| m.t.Logf("Restored VAP %q", m.name) | |
| } | |
| } | |
| // Scale CVO back up | |
| if err := scaleDeployment(m.t, cvoNamespace, cvoDeploymentName, m.initialCVOReplicas); err != nil { | |
| m.t.Errorf("failed to scale up cvo: %v", err) | |
| // Restore VAP if we saved it | |
| if m.savedVAP != nil { | |
| restored := m.savedVAP.DeepCopy() | |
| restored.ResourceVersion = "" | |
| restored.UID = "" | |
| restored.CreationTimestamp = metav1.Time{} | |
| restored.ManagedFields = nil | |
| if err := kclient.Create(context.Background(), restored); err != nil { | |
| if !kerrors.IsAlreadyExists(err) { | |
| m.t.Errorf("failed to restore vap %q: %v", m.name, err) | |
| } | |
| } else { | |
| m.t.Logf("Restored VAP %q", m.name) | |
| } | |
| } | |
| // Scale CVO back up | |
| if err := scaleDeployment(m.t, cvoNamespace, cvoDeploymentName, m.initialCVOReplicas); err != nil { | |
| m.t.Errorf("failed to scale up cvo: %v", err) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/e2e/util_gatewayapi_test.go` around lines 1271 - 1286, The restored VAP
object (m.savedVAP) must be deep-copied and have all server-managed metadata
cleared before calling kclient.Create to avoid API rejections; replace the
current single-field mutation of ResourceVersion with a DeepCopy() of m.savedVAP
and set metadata fields such as UID, ResourceVersion, CreationTimestamp,
Generation, SelfLink and ManagedFields to empty/zero values (preserving any
required labels/annotations you intend to keep) then pass that sanitized copy to
kclient.Create; ensure you still log on success/failure as currently done.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
test/e2e/util_gatewayapi_test.go (1)
1271-1276:⚠️ Potential issue | 🟠 MajorSanitize full server-managed metadata before VAP
Create(Line 1274).Only clearing
ResourceVersionis not enough when recreating an object fetched from the API. Server-managed metadata (UID,CreationTimestamp,ManagedFields, etc.) can causeCreaterejection. This is especially risky wheninitialCVOReplicas == 0, because CVO won’t self-heal the VAP afterward.🔧 Proposed fix
func (m *vapManager) enable() { // Restore VAP if we saved it if m.savedVAP != nil { - // Clear resource version to allow recreation - m.savedVAP.ResourceVersion = "" - if err := kclient.Create(context.Background(), m.savedVAP); err != nil { + restored := m.savedVAP.DeepCopy() + restored.UID = "" + restored.ResourceVersion = "" + restored.CreationTimestamp = metav1.Time{} + restored.ManagedFields = nil + restored.Generation = 0 + restored.SelfLink = "" + if err := kclient.Create(context.Background(), restored); err != nil { if !kerrors.IsAlreadyExists(err) { m.t.Errorf("failed to restore vap %q: %v", m.name, err) } } else { m.t.Logf("Restored VAP %q", m.name) } }For Kubernetes admissionregistration.k8s.io/v1 ValidatingAdmissionPolicy Create requests, which ObjectMeta fields are server-managed and should be cleared/omitted when recreating an object previously retrieved via GET?🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/util_gatewayapi_test.go` around lines 1271 - 1276, m.savedVAP recreation fails because only ResourceVersion is cleared; you must strip all server-managed ObjectMeta before calling kclient.Create (m.savedVAP) to avoid admission/create rejections. Before the Create call, clear UID, ResourceVersion, SelfLink, Generation, CreationTimestamp, and ManagedFields (and any other server-populated metadata) on m.savedVAP.ObjectMeta or reconstruct ObjectMeta preserving only client-managed bits like Name/Namespace/Labels/Annotations as needed, then call kclient.Create(context.Background(), m.savedVAP).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@test/e2e/util_gatewayapi_test.go`:
- Around line 1271-1276: m.savedVAP recreation fails because only
ResourceVersion is cleared; you must strip all server-managed ObjectMeta before
calling kclient.Create (m.savedVAP) to avoid admission/create rejections. Before
the Create call, clear UID, ResourceVersion, SelfLink, Generation,
CreationTimestamp, and ManagedFields (and any other server-populated metadata)
on m.savedVAP.ObjectMeta or reconstruct ObjectMeta preserving only
client-managed bits like Name/Namespace/Labels/Annotations as needed, then call
kclient.Create(context.Background(), m.savedVAP).
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (1)
test/e2e/util_gatewayapi_test.go
e35f1e3 to
2b7a564
Compare
| return 0, fmt.Errorf("failed to get cvo deployment: %w", err) | ||
| } | ||
|
|
||
| if cvo.Spec.Replicas != nil { |
There was a problem hiding this comment.
| // Clear resource version to allow recreation | ||
| m.savedVAP.ResourceVersion = "" | ||
| if err := kclient.Create(context.Background(), m.savedVAP); err != nil { | ||
| if !kerrors.IsAlreadyExists(err) { |
There was a problem hiding this comment.
if it exists, but you have for some reason updated in place, is the expectation not to restore to the original VAP?
Eg.: you try to create, something exists but it is not the saved VAP. Is this fine?
There was a problem hiding this comment.
Good question. In theory, it should never be re-created after we deleted in the happy path, but there possibilities it might get re-create:
- Someone or something scaled back up the CVO
- Someone or something manually created the VAP
If # 1 happens, that's fine, we don't care if it fails to be created. If # 2 happens, well, that might be problematic, but that's also an existing issue with generally any test (it's problematic to re-create/modify/delete objects under test). So I don't feel strongly either way.
However, I am updating it to only recreate when m.initialCVOReplicas == 0, since we really don't need to re-create if we are re-enabling the CVO.
| name := types.NamespacedName{Name: m.name} | ||
| if err := kclient.Get(context.Background(), name, vap); err != nil { | ||
| return fmt.Errorf("failed to get vap %q: %w", m.name, err), func() { | ||
| if err := scaleDeployment(m.t, cvoNamespace, cvoDeploymentName, m.initialCVOReplicas); err != nil { |
There was a problem hiding this comment.
NIT! :
given the returned function here is used also on the deleteExistingVap, would it make sense to do a common definition of it once after the definition of m.initialCVOReplicas and simply call it?
2b7a564 to
31b7859
Compare
| return 0, fmt.Errorf("failed to get cvo deployment: %w", err) | ||
| } | ||
|
|
||
| if cvo.Spec.Replicas != nil { |
| // Clear resource version to allow recreation | ||
| m.savedVAP.ResourceVersion = "" | ||
| if err := kclient.Create(context.Background(), m.savedVAP); err != nil { | ||
| if !kerrors.IsAlreadyExists(err) { |
There was a problem hiding this comment.
Good question. In theory, it should never be re-created after we deleted in the happy path, but there possibilities it might get re-create:
- Someone or something scaled back up the CVO
- Someone or something manually created the VAP
If # 1 happens, that's fine, we don't care if it fails to be created. If # 2 happens, well, that might be problematic, but that's also an existing issue with generally any test (it's problematic to re-create/modify/delete objects under test). So I don't feel strongly either way.
However, I am updating it to only recreate when m.initialCVOReplicas == 0, since we really don't need to re-create if we are re-enabling the CVO.
| name := types.NamespacedName{Name: m.name} | ||
| if err := kclient.Get(context.Background(), name, vap); err != nil { | ||
| return fmt.Errorf("failed to get vap %q: %w", m.name, err), func() { | ||
| if err := scaleDeployment(m.t, cvoNamespace, cvoDeploymentName, m.initialCVOReplicas); err != nil { |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: rikatz The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Finalizing some discussion in slack on this |
31b7859 to
8069e1e
Compare
|
New changes are detected. LGTM label has been removed. |
Store and restore CVO's initial replica count instead of hardcoding 1. Fixes debugging workflows where CVO is intentionally disabled.
8069e1e to
1de670d
Compare
|
/retest-required |
|
@gcs278: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/assign @alebedev87 |
alebedev87
left a comment
There was a problem hiding this comment.
LGTM just one suggestion.
| vap := &admissionregistrationv1.ValidatingAdmissionPolicy{} | ||
| name := types.NamespacedName{Name: m.name} | ||
| if err := kclient.Get(context.Background(), name, vap); err != nil { | ||
| return fmt.Errorf("failed to get vap %q: %w", m.name, err), recoverCVO | ||
| } |
There was a problem hiding this comment.
CVO's replicas is retrieved with a poll. For consistency and test resiliency we can do a polled Get here too.
Store and restore CVO's initial replica count instead of hardcoding 1. Fixes debugging workflows where CVO is intentionally disabled.
During debugging, I disable CVO so I can run custom builds of the ingress operator. If CVO gets re-enabled, it reverts my custom build and breaks my workflow.