Flush export queue when it reaches max_export_batch_size #1521
codeboten merged 2 commits into open-telemetry:main from max_export_batch_size
Conversation
toumorokoshi
left a comment
Thanks! Code looks good, blocking design question.
    self.queue.appendleft(span)

-   if len(self.queue) >= self.max_queue_size // 2:
+   if len(self.queue) >= self.max_export_batch_size:
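The quoted change can be illustrated with a minimal stdlib-only sketch of the enqueue path (class and attribute names here are assumptions for illustration, not the actual opentelemetry-sdk code):

```python
import collections
import threading

# Minimal sketch of the batch processor's enqueue path; names are
# illustrative, not the real opentelemetry-sdk implementation.
class SketchBatchProcessor:
    def __init__(self, max_queue_size=2048, max_export_batch_size=512):
        self.max_queue_size = max_queue_size
        self.max_export_batch_size = max_export_batch_size
        self.queue = collections.deque(maxlen=max_queue_size)
        self.condition = threading.Condition()

    def on_end(self, span):
        self.queue.appendleft(span)
        # After this PR: wake the worker as soon as one full export batch
        # is available, instead of waiting for max_queue_size // 2 items.
        if len(self.queue) >= self.max_export_batch_size:
            with self.condition:
                self.condition.notify()
```

With the default sizes this notifies the worker at 512 queued spans rather than 1024.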
Thanks for the change! I wonder if it would be better to modify the exporter to flush the queue at something less than the max batch size?
I think this code was written to notify a flush well before the queue filled, to handle race conditions. I haven't looked at the code super closely recently, but IIRC it is possible for multiple spans to end at the same time, and as a result:
- the first span would notify the flush
- the second span would be dropped, since the flush has already been notified but max_export_batch_size has been reached.
So maybe take something a little more cautious, like filling up to 80%, and have that be the condition for both the notify and the flush itself (refactoring both into a single function call would keep the two conditions from diverging later).
Queue overflow and dropped spans: that's exactly the issue I'm addressing here.
There are two different numbers: max_queue_size = 2048 and max_export_batch_size = 512.
I don't think it's a good idea to flush half-full (or 80%-full) requests. That will only make effective_export_batch_size = 0.8 * max_export_batch_size.
The second span in your example won't be dropped as long as max_queue_size > max_export_batch_size, which should always be true.
Maybe as an extra check the code should validate that max_queue_size > 2 * max_export_batch_size or so.
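That suggested sanity check could look like the following sketch (a hypothetical helper for illustration, not a check the SDK currently performs):

```python
# Hypothetical validation sketch: require the queue to hold more than two
# full export batches, so a span that ends while a flush is already
# notified still has room in the queue and is not dropped.
def validate_sizes(max_queue_size: int, max_export_batch_size: int) -> None:
    if max_queue_size <= 2 * max_export_batch_size:
        raise ValueError(
            "max_queue_size must exceed 2 * max_export_batch_size "
            f"(got {max_queue_size} <= {2 * max_export_batch_size})"
        )
```

With the defaults (2048 and 512) this check passes; it would reject, for example, a queue of 1024 with batches of 512.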
Got it! Makes sense. Sorry about that, I misread the code the first time.
I think the checks that are already there are good (though they could maybe use more documentation).
toumorokoshi
left a comment
LGTM! Good catch, thanks for the change.
Description
Currently BatchExportSpanProcessor skips the sleep if it already has max_export_batch_size items in the queue, 512 by default. But it interrupts that sleep only when it collects 1024 items in the queue. It should export immediately when there are max_export_batch_size items in the queue.
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Manually by creating spans and analysing logs
Does This PR Require a Contrib Repo Change?
Answer the following question based on these examples of changes that would require a Contrib Repo Change:
- The OTel specification has changed, which prompted this PR to update the method interfaces of opentelemetry-api/ or opentelemetry-sdk/
- The method interfaces of opentelemetry-instrumentation/ have changed
- The method interfaces of test/util have changed
- Scripts in scripts/ that were copied over to the Contrib repo have changed
- Configuration files that were copied over to the Contrib repo have changed (when consistency between repositories is applicable), such as pyproject.toml, isort.cfg, .flake8
- When a new .github/CODEOWNER is added
- Major changes to project information, such as in README.md or CONTRIBUTING.md
Yes. - Link to PR:
No.
Checklist: