Skip to content

Fix EventSeries starting count discrepancy#112334

Merged
k8s-ci-robot merged 1 commit intokubernetes:masterfrom
dgrisonnet:fix-eventseries-count
Mar 14, 2023
Merged

Fix EventSeries starting count discrepancy#112334
k8s-ci-robot merged 1 commit intokubernetes:masterfrom
dgrisonnet:fix-eventseries-count

Conversation

@dgrisonnet
Copy link
Copy Markdown
Member

@dgrisonnet dgrisonnet commented Sep 8, 2022

What type of PR is this?

/kind bug

What this PR does / why we need it:

The kube-apiserver validation expects the Count of an EventSeries to be at least 2, otherwise, it rejects the Event. There was a discrepancy between the client and the server since the client was initializing an EventSeries to a count of 1.

According to the original KEP, the first event emitted should have an EventSeries set to nil and the second isomorphic event should have an EventSeries with a count of 2. Thus, we should match the behavior defined by the KEP and update the client.

Also, as an effort to make the old clients compatible with the servers, we should allow Events with an EventSeries count of 1 to prevent any unexpected rejections.

Special notes for your reviewer:

I stumbled upon this after finding some weird Server rejected event logs in the kube-scheduler. The error in the logs was in two parts:

'Event "apiserver-7bc6866f9c-2vk6r.17123d9ef072db8c" is invalid: [series.count: Invalid value: "": should be at least 2, eventTime: Invalid value: 2022-09-06 10:09:25.301316 +0000 UTC: field is immutable]' (will not retry!)

One is complaining about the series.count value being lower than 2 and the other about the eventTime value. The latter was recently fixed as part of #111928, but the count issue is still present.

Since series.count returning an empty string was a bit weird I looked into the audit logs to see what was the query sent to the apiserver. I found the following patch request body:

  "requestObject": {
     "series": {
       "count": 1,
       "lastObservedTime": "2022-09-08T19:35:25.972325Z"
     }
   }

Looking at the code, it seems that the empty string was hard coded... https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/core/validation/events.go#L123

So the actual issue was a discrepancy between the client emitting the event and the server validating it.

Looking at the KEP, it seems that the server is correct: https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/383-new-event-api-ga-graduation#short-examples

Does this PR introduce a user-facing change?

Update the Event series starting count when emitting isomorphic events from 1 to 2.

/cc @liggitt

@k8s-ci-robot k8s-ci-robot requested a review from liggitt September 8, 2022 22:17
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. approved Indicates a PR has been approved by an approver from all required OWNERS files. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Sep 8, 2022
@k8s-triage-robot
Copy link
Copy Markdown

Unknown CLA label state. Rechecking for CLA labels.

Send feedback to sig-contributor-experience at kubernetes/community.

/check-cla
/easycla

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 9, 2022
@aojea
Copy link
Copy Markdown
Member

aojea commented Sep 9, 2022

/lgtm
/test pull-kubernetes-e2e-gce-ubuntu-containerd
unrelated failure

/hold

for Jordan review

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 9, 2022
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 9, 2022
@liggitt
Copy link
Copy Markdown
Member

liggitt commented Sep 9, 2022

this needs a test (probably an integration test) that exercises this reporter in a way that would have caught the failure

e.g. walk an event through creation, introduction of a series, and incrementing of the series, and verify the resulting event API object gets updated and looks correct at each stage

@aojea
Copy link
Copy Markdown
Member

aojea commented Sep 10, 2022

/lgtm cancel
Damien ping me in slack if you need help with the test

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 10, 2022
@leilajal
Copy link
Copy Markdown
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 13, 2022
@dgrisonnet dgrisonnet force-pushed the fix-eventseries-count branch from 105dff2 to a12bd72 Compare November 30, 2022 17:26
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Nov 30, 2022
Copy link
Copy Markdown
Member Author

@dgrisonnet dgrisonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added an integration test that is failing because it is hitting the data race reported in #92500, I'll send another PR to fix it.

Comment thread test/integration/events/events_test.go Outdated
@k8s-triage-robot
Copy link
Copy Markdown

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 28, 2023
@dgrisonnet
Copy link
Copy Markdown
Member Author

/test pull-kubernetes-integration
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 13, 2023
The kube-apiserver validation expects the Count of an EventSeries to be
at least 2, otherwise it rejects the Event. There was is discrepancy
between the client and the server since the client was iniatizing an
EventSeries to a count of 1.

According to the original KEP, the first event emitted should have an
EventSeries set to nil and the second isomorphic event should have an
EventSeries with a count of 2. Thus, we should matcht the behavior
define by the KEP and update the client.

Also, as an effort to make the old clients compatible with the servers,
we should allow Events with an EventSeries count of 1 to prevent any
unexpected rejections.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
@dgrisonnet dgrisonnet force-pushed the fix-eventseries-count branch from a12bd72 to d003649 Compare March 13, 2023 12:33
@dgrisonnet
Copy link
Copy Markdown
Member Author

@liggitt can you please have another look at this PR? I added the integration test as you requested.

/cc @wojtek-t

@k8s-ci-robot k8s-ci-robot requested a review from wojtek-t March 13, 2023 13:27
@liggitt
Copy link
Copy Markdown
Member

liggitt commented Mar 13, 2023

I'll defer review of the test to @aojea, the client update lgtm

@wojtek-t
Copy link
Copy Markdown
Member

/lgtm
/approve

Thanks!

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 14, 2023
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 14, 2023
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgrisonnet, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

LGTM label has been added.

DetailsGit tree hash: 17053a94ac2111c582248d2a47337f09d81ee790

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 14, 2023
@k8s-ci-robot k8s-ci-robot merged commit 689fc37 into kubernetes:master Mar 14, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.27 milestone Mar 14, 2023
@alculquicondor
Copy link
Copy Markdown
Member

Can we backport this? Do we still have a chance for 1.24?

@alculquicondor
Copy link
Copy Markdown
Member

k8s-ci-robot added a commit that referenced this pull request Aug 2, 2023
…114237-#114236-#112334-upstream-release-1.25

Automated cherry pick of #114237: tools/events: retry on AlreadyExist for Series
#114236: tools/events: fix data race when emitting series
#112334: events: fix EventSeries starting count discrepancy
k8s-ci-robot added a commit that referenced this pull request Aug 2, 2023
…114237-#114236-#112334-upstream-release-1.26

Automated cherry pick of #114237: tools/events: retry on AlreadyExist for Series
#114236: tools/events: fix data race when emitting series
#112334: events: fix EventSeries starting count discrepancy
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants