[kafka/exporter] All franz go client errors are being treated as retryable #44918

@anubhav21sharma

Component(s)

exporter/kafka

What happened?

Description

Currently, all errors returned by p.client.ProduceSync in the franz-go library are treated as retryable, with no distinction between retryable and non-retryable error types. Since franz-go exposes whether an error is retryable (the Retriable field on kerr.Error), we should use this information to prevent unnecessary retries in the OTel pipeline.

A practical example: when producing a Kafka message larger than the configured producer.max_message_bytes, franz-go immediately returns a kerr.MessageTooLarge error, which is defined as a non-retryable error. However, the retry_sender continues retrying the same message until all retry attempts are exhausted, even though the attempts will never succeed.

This is what the current implementation of FranzSyncProducer::ExportData looks like:

func (p *FranzSyncProducer) ExportData(ctx context.Context, msgs Messages) error {
	.
	.
	.
	result := p.client.ProduceSync(ctx, messages...)
	var errs []error
	for _, r := range result {
		if r.Err != nil {
			errs = append(
				errs,
				fmt.Errorf("error exporting to topic %q: %w", r.Record.Topic, r.Err),
			)
		}
	}
	return errors.Join(errs...)
}

I propose changing it to something like the following:

func (p *FranzSyncProducer) ExportData(ctx context.Context, msgs Messages) error {
	.
	.
	.
	result := p.client.ProduceSync(ctx, messages...)
	var errs []error
	for _, r := range result {
		if r.Err != nil {
			err := fmt.Errorf("error exporting to topic %q: %w", r.Record.Topic, r.Err)
			// Mark Kafka protocol errors that franz-go flags as non-retriable
			// as permanent, so the retry sender does not retry them.
			var kerrErr *kerr.Error
			if errors.As(r.Err, &kerrErr) && !kerrErr.Retriable {
				err = consumererror.NewPermanent(err)
			}
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...)
}

Steps to Reproduce

Set producer.max_message_bytes in the OTel config to 1024, enable retry_on_failure for the exporter, and send a message larger than 1024 bytes.

Expected Result

Such messages should not be retried, since the request can never succeed.

Actual Result

Messages are retried until the retry attempts are exhausted.

Collector version

0.141.0

Additional context

No response
