[pkg/stanza] [receiver/windowseventlogreceiver]: speed up receiver #43195
ChrsMark merged 6 commits into open-telemetry:main
Conversation
Force-pushed from a248f1f to 5609b94
pjanotti left a comment:
Thanks for your contribution @MrAnno!
I would describe this change as removing the per-poll upper bound and collecting events until there are none left to consume. This can be useful in various scenarios. That said, I'm a bit worried that there is now no upper bound on how many events the receiver reads. Having some default upper bound seems like a good defense; it could be a time limit or a maximum number of events, I'm not sure at this point. Such a limit would avoid cases where something generates a flood of events (probably already using lots of CPU) and the collector never backs off, causing CPU oversubscription on the box.
Consider the case where the collector is also configured to collect all events: it should back off from time to time, even if that means taking longer to read everything. That said, this change will make it catch up much faster in such cases.
@pjanotti Thank you for the quick response. I've added a new config option called `max_events_per_poll`. I think setting the default value to anything other than 0 would be pretty arbitrary, because everything we want to achieve here depends on the given system's resources (for a PC, 1000 is a good default, but on a real collector server, I probably wouldn't go under 20,000). If my speed-up patch is considered a breaking change due to its possible peak resource consumption, we could come up with a fair default. Otherwise, I would prefer 0, because the CPU usage of
pjanotti left a comment:
@MrAnno I'm fine with no rate limit by default, but I think that just counting the overall number of events read in a single call to `readAll`, and letting `readThrottle` return false when that limit is reached, achieves the effect we want. Simply having the limit per poll is easier for users to understand and control.
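The counting approach suggested here can be sketched as follows. This is a hypothetical illustration, not the receiver's actual code: the type and method names (`pollLimiter`, `startPoll`, `allow`) are invented for the example. Only the idea comes from the discussion: count events per `readAll` call, stop at a per-poll maximum, and treat 0 as unlimited.

```go
package main

import "fmt"

// pollLimiter is a hypothetical sketch of a per-poll event cap:
// count every event read during one call to readAll and stop once
// the configured maximum is reached. A limit of 0 means "no limit".
type pollLimiter struct {
	maxEventsPerPoll int
	read             int
}

// startPoll resets the counter; called once per poll interval, so the
// "bucket" refills automatically at the poll interval.
func (l *pollLimiter) startPoll() { l.read = 0 }

// allow records n freshly read events and reports whether readAll
// should keep reading within the current poll.
func (l *pollLimiter) allow(n int) bool {
	l.read += n
	if l.maxEventsPerPoll == 0 {
		return true // 0 disables the cap entirely
	}
	return l.read < l.maxEventsPerPoll
}

func main() {
	l := &pollLimiter{maxEventsPerPoll: 250}
	l.startPoll()
	batches := 0
	for l.allow(100) { // each simulated batch reads 100 events
		batches++
	}
	fmt.Println("full batches allowed this poll:", batches)
}
```

Because the counter resets at the start of each poll, backing off and "refilling the bucket" both fall out of the existing poll interval, with no extra timer needed.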
Force-pushed from 020da97 to 599847c
My thought process was that users shouldn't even need to know about the poll interval under normal circumstances, because it is an implementation detail and the default value should "just work" in 99% of use-cases. The intention with the
I just wanted to share this, but I don't want to be a hindrance. Please let me know your preference, and I will change the PR accordingly.
Thanks for being flexible here @MrAnno
Yes, that seems reasonable to me. I'm asking for the simpler throttling for the few cases where a user gets to the point of configuring it. "It will consume at most X events until the next poll interval" is very straightforward to understand. In this context, just counting has the effect of backing off, and "refilling the bucket" automatically occurs at the poll interval.
@pjanotti Thanks. I've reimplemented the rate limit option; the new implementation has the following two side effects:
Force-pushed from 227a4d6 to 380b9ef
@MrAnno, note that I merged main into this PR branch, so remember to pull before you make any changes. My understanding is that you are looking into adding a basic test for the new feature; it would also be good to have some config tests.
Force-pushed from 9fe67a2 to fb5f0ae
pjanotti left a comment:
Remaining issues:
- https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/18648688527/job/53211077377?pr=43195#step:6:168
- https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/18648688527/job/53211077377?pr=43195#step:8:12
Could you please update the description of the PR so it matches the current (hopefully final) implementation?
Force-pushed from 404fbca to 60c6c9e
@pjanotti Can we give the CI another try, please? I missed a C-API boundary in tests.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Sorry, I forgot about this. I'll try to fix the test soon.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Force-pushed from e88dca6 to 24e8d97
@pjanotti Sorry for the delay. I think the test is finally ready as well.
@MrAnno we are still having issues on CI; if you are too busy, I can help with that so we get this merged soon.
With the default `max_reads: 100` and `poll_interval: 1s` config fields, the receiver could process a maximum of 100 events per second. This commit introduces an interruptible `readAll()` method, which tries to read all the messages in `max_reads`-sized batches and only falls back to polling when it reaches the end of the event log channel. Note that the same performance cannot be achieved by simply adjusting `max_reads`, because `RPC_S_INVALID_BOUND` errors limit the maximum configurable batch size. Alternatively, setting `poll_interval` to a smaller value would also incur some overhead.
…ith max_events_per_poll
It was impossible to implement proper mocks for the `evt*` functions by patching the `LazyProc` calls directly, as `runtime/checkptr.go` has checks that are too strict for that. Now the `evt*` functions themselves are patched, which have the proper type parameters.
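The interruptible `readAll()` strategy from the first commit message above can be sketched as below. This is a simplified illustration under assumptions, not the receiver's real implementation: `readBatch` is an invented stand-in for the Windows `EvtNext` call, and the in-memory slice stands in for the event log channel. The key idea from the PR is real: keep reading `max_reads`-sized batches until a short batch signals the end of the channel, then fall back to the poll timer.

```go
package main

import "fmt"

// readBatch is a hypothetical stand-in for the Windows EvtNext call:
// it returns up to maxReads events from the simulated event log channel.
// In the real receiver, batches larger than the RPC limit fail with
// RPC_S_INVALID_BOUND, which is why max_reads cannot simply be raised.
func readBatch(events *[]int, maxReads int) []int {
	n := maxReads
	if n > len(*events) {
		n = len(*events)
	}
	batch := (*events)[:n]
	*events = (*events)[n:]
	return batch
}

// readAll drains the channel in maxReads-sized batches and only returns
// (falling back to the poll timer) once a batch comes back short, i.e.
// the end of the event log channel was reached.
func readAll(events *[]int, maxReads int) int {
	total := 0
	for {
		batch := readBatch(events, maxReads)
		total += len(batch)
		if len(batch) < maxReads {
			return total // end of channel: resume polling
		}
	}
}

func main() {
	// 1042 pending events, read in batches of 100 within a single poll,
	// instead of 100 events per poll_interval with the old behavior.
	events := make([]int, 1042)
	fmt.Println("events consumed in one poll:", readAll(&events, 100))
}
```

With the old fixed-batch behavior, those 1042 events would have taken eleven poll intervals to drain; the loop above consumes them in one.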
Force-pushed from 24e8d97 to cc778f9
Sorry again. I'm finally on vacation; I'm trying to fix the last remaining issue now.
Hm, it seems that the goleak error about the unexpected goroutine was not introduced by my PR; I can see the same error, with a slightly different stack trace, on an unrelated run as well. @pjanotti Can you take a quick look, please?
@MrAnno I'll take a look at it today.
Fix a failure identified by @MrAnno at #43195 (comment)
@MrAnno the fix for the unrelated failure was merged. I'm updating the branch so we can run the tests with it. If any other changes are needed (likely not), you will have to pull your branch before making them; if CI passes, I will approve and add the "ready to merge" label.
Description

With the default `max_reads: 100` and `poll_interval: 1s` config fields, the receiver could process a maximum of 100 events per second.

This PR improves the performance of the Windows Event Log receiver by introducing an interruptible `readAll()` method, which tries to read all the messages in `max_reads`-sized batches and only falls back to polling when it reaches the end of the event log channel. The timer is started after each read cycle from now on.

Note that the same performance cannot be achieved by simply adjusting `max_reads`, because `RPC_S_INVALID_BOUND` errors limit the maximum configurable batch size. Alternatively, setting `poll_interval` to a smaller value would also incur some overhead.
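Based on the commit title, the new knob discussed in this PR is called `max_events_per_poll`. A configuration using it might look like the sketch below; note that the surrounding field values (the `application` channel) and the exact placement are assumptions for illustration, not taken from the PR diff.

```yaml
receivers:
  windowseventlog:
    channel: application        # assumed example channel
    poll_interval: 1s
    max_reads: 100              # batch size per Windows API read
    # Hypothetical sketch: 0 (the default discussed above) leaves readAll
    # uncapped so it drains the channel each poll; a positive value caps
    # the number of events consumed per poll interval.
    max_events_per_poll: 0
```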