Update transient errors retry timeout and retryable status codes #1842
```diff
@@ -262,14 +262,11 @@ def _translate_data(
         pass

     def _export(self, data: TypingSequence[SDKDataT]) -> ExportResultT:
-        max_value = 64
         # expo returns a generator that yields delay values which grow
         # exponentially. Once delay is greater than max_value, the yielded
         # value will remain constant.
+        # max_value is set to 900 (900 seconds is 15 minutes) to use the same
+        # value as used in the Go implementation.
+        max_value = 900

         for delay in expo(max_value=max_value):

             if delay == max_value:
```
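For context, here is a minimal sketch of how a capped exponential backoff generator like `backoff.expo` behaves (the `expo` body below is our approximation, not the library's code): once the computed delay would exceed `max_value`, the generator yields `max_value` forever, which is why the exporter's loop can use `delay == max_value` as its give-up condition.

```python
def expo(base=2, factor=1, max_value=900):
    """Yield factor * base**n, clamped to max_value once exceeded.

    Approximation of backoff.expo for illustration only.
    """
    n = 0
    while True:
        delay = factor * base ** n
        if delay > max_value:
            # From here on, every yielded value is the constant cap.
            yield max_value
        else:
            yield delay
            n += 1

# Collect delays until the cap is reached, mirroring the exporter's loop.
delays = []
for d in expo(max_value=900):
    delays.append(d)
    if d == 900:
        break
# delays == [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 900]
```

With the old cap of 64 the final retry fired after roughly two minutes of cumulative sleeping; with 900 the sequence keeps doubling up to 512 seconds before the constant 15-minute cap kicks in.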
```diff
@@ -289,8 +286,6 @@ def _export(self, data: TypingSequence[SDKDataT]) -> ExportResultT:
                 if error.code() in [
                     StatusCode.CANCELLED,
                     StatusCode.DEADLINE_EXCEEDED,
```
Contributor
Aren't cancelled and deadline exceeded errors raised by the client itself? I'm not 100% sure about deadline exceeded, but I think cancelled should not be retried: if I understand correctly, it means the client decided to cancel the request, for example because it received some sort of shutdown/exit signal.
Member (Author)
It is typically the client, but I believe this is allowed so that the case where the server cancels the RPC is also covered: https://grpc.io/docs/what-is-grpc/core-concepts/#cancelling-an-rpc.
Member (Author)
And I found there is a table which lists what is retryable: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/otlp.md#otlpgrpc-response.
Contributor
I see. Not a blocker for me then, but I wonder if there is a simple way for the exporter to cancel an in-flight request on shutdown (with or without a timeout)?
Member (Author)
Ah, I understand your concern. Anyway, we have some work to do in making shutdown spec compliant; we can address this there.
```diff
-                    StatusCode.PERMISSION_DENIED,
-                    StatusCode.UNAUTHENTICATED,
                     StatusCode.RESOURCE_EXHAUSTED,
                     StatusCode.ABORTED,
                     StatusCode.OUT_OF_RANGE,
```
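The retry set after this PR can be sketched as a membership check against the OTLP/gRPC retryable table linked above. The `StatusCode` enum below is a stand-in for `grpc.StatusCode`, and `is_retryable` is a hypothetical helper, not part of the exporter's API:

```python
from enum import Enum


class StatusCode(Enum):
    # Stand-in for grpc.StatusCode; values match the gRPC status codes.
    OK = 0
    CANCELLED = 1
    DEADLINE_EXCEEDED = 4
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    ABORTED = 10
    OUT_OF_RANGE = 11
    UNAVAILABLE = 14
    DATA_LOSS = 15
    UNAUTHENTICATED = 16


# Per the OTLP/gRPC response table in the OpenTelemetry spec.
RETRYABLE = {
    StatusCode.CANCELLED,
    StatusCode.DEADLINE_EXCEEDED,
    StatusCode.RESOURCE_EXHAUSTED,
    StatusCode.ABORTED,
    StatusCode.OUT_OF_RANGE,
    StatusCode.UNAVAILABLE,
    StatusCode.DATA_LOSS,
}


def is_retryable(code: StatusCode) -> bool:
    return code in RETRYABLE
```

PERMISSION_DENIED and UNAUTHENTICATED are dropped by this PR because retrying with the same credentials cannot succeed; the spec lists them as non-retryable.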
Would it be possible to make this configurable? For example, as an `export_timeout` ("maximum duration for exporting spans") argument to the exporter? My use case is CLI applications, where having them hang on exit for 64 seconds because a telemetry backend is down is unacceptable. For comparison, the Go implementation has two separate timeouts: `ExportTimeout` for the full operation and `BatchTimeout` for the direct publish.

For now I have a special subclass with a copy of this method to tweak the behavior, but it's a maintenance issue.
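A hedged sketch of the kind of workaround this comment describes: wrapping the retry loop in an overall deadline so a CLI process never blocks on exit longer than a configured budget. All names here (`export_with_deadline`, `try_export`, `export_timeout`) are hypothetical, not part of the exporter:

```python
import time


def export_with_deadline(try_export, export_timeout=5.0, max_value=900):
    """Retry try_export() with exponential backoff, but never run past
    export_timeout seconds in total. Returns True on success, False if
    the deadline was exhausted. Illustration only, not the exporter's API."""
    deadline = time.monotonic() + export_timeout
    delay = 1
    while True:
        if try_export():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False  # give up: overall deadline exhausted
        # Never sleep past the deadline, even if backoff says to wait longer.
        time.sleep(min(delay, remaining))
        delay = min(delay * 2, max_value)
```

The key difference from the code in this PR is the `min(delay, remaining)` clamp: the per-attempt backoff is subordinated to a caller-supplied total budget, which is roughly what Go's `ExportTimeout` provides.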
I will bring up the topic of supporting configurability in the next SIG meeting. The spec is not clear (at this point) in that regard, and anything we do might become a breaking change in the future. And speaking of `ExportTimeout` and `BatchTimeout`, Python also has them, as `schedule_delay_millis` and `export_timeout_millis`; I don't know why, but we only enforce the export timeout during force flush.
Talked about this in the SIG call. This is out of scope for this PR. The consensus was that this is not clear from the spec, so we might end up making a backwards-incompatible change by introducing support for configuring this. One thing we want to explore is making the retry async and non-blocking, and also looking at how the span processor timeout behaves. Please create another issue and we will track it there.