Component(s)
exporter/kafka
What happened?
Description
Currently, every error returned by p.client.ProduceSync from the franz-go library is treated as retryable; the exporter makes no distinction between retryable and non-retryable error types. Since franz-go exposes a way to check whether an error is retryable, we should use that information to prevent unnecessary retries in the OTel pipeline.
A practical example: when producing a Kafka message larger than the configured producer.max_message_bytes, franz-go immediately returns a kerr.MessageTooLarge error, which is defined as a non-retryable error. However, the retry_sender continues retrying the same message until all retry attempts are exhausted, even though the attempts will never succeed.
This is what the current implementation of FranzSyncProducer::ExportData looks like:
    func (p *FranzSyncProducer) ExportData(ctx context.Context, msgs Messages) error {
        // ...
        result := p.client.ProduceSync(ctx, messages...)
        var errs []error
        for _, r := range result {
            if r.Err != nil {
                errs = append(
                    errs,
                    fmt.Errorf("error exporting to topic %q: %w", r.Record.Topic, r.Err),
                )
            }
        }
        return errors.Join(errs...)
    }
I propose changing it to something like the following:
    func (p *FranzSyncProducer) ExportData(ctx context.Context, msgs Messages) error {
        // ...
        result := p.client.ProduceSync(ctx, messages...)
        var errs []error
        for _, r := range result {
            if r.Err != nil {
                err := fmt.Errorf("error exporting to topic %q: %w", r.Record.Topic, r.Err)
                // Mark errors that franz-go flags as non-retriable as permanent,
                // so the retry sender does not retry them.
                kgoErr := &kerr.Error{}
                if isKgoErr := errors.As(r.Err, &kgoErr); isKgoErr && !kgoErr.Retriable {
                    err = consumererror.NewPermanent(err)
                }
                errs = append(errs, err)
            }
        }
        return errors.Join(errs...)
    }
Steps to Reproduce
Set producer.max_message_bytes to 1024 in the OTel config, enable retry_on_failure for the exporter, and send a message larger than that limit.
Expected Result
Such messages should not be retried, since the request can never succeed.
Actual Result
Messages are retried until all retry attempts are exhausted.
Collector version
0.141.0
Additional context
No response