Add concurrent batching to the file sink #20394

@fpytloun

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

I see 100% utilization on the file sink, which then applies backpressure and slows down the whole pipeline. I am using tmpfs, so the disk is not the bottleneck, but high-cardinality partitioning could be. It seems that the file sink does not batch concurrently and therefore applies backpressure quickly (especially with gzip compression).
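To make the request concrete, here is a minimal Python sketch of what "concurrent batching" could look like in principle: events are grouped by their rendered path, and each partition's batch is gzip-compressed on a worker thread instead of serially. This is only an illustration of the idea, not how Vector's file sink is or would be implemented; the function names, the `path` and `line` event keys, and the worker count are all hypothetical.

```python
import gzip
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(events, key):
    """Group already-encoded events by partition key (e.g. rendered file path)."""
    batches = defaultdict(list)
    for event in events:
        batches[event[key]].append(event["line"])
    return batches


def compress_batch(lines):
    """Gzip one partition's batch as newline-delimited text."""
    return gzip.compress(("\n".join(lines) + "\n").encode("utf-8"))


def compress_concurrently(batches, workers=4):
    """Compress every partition's batch on a thread pool (worker count is arbitrary)."""
    paths = list(batches)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        blobs = pool.map(compress_batch, (batches[p] for p in paths))
        return dict(zip(paths, blobs))


# Hypothetical events: two partitions, one of them with two lines.
events = [
    {"path": "topics/a/0.json.gz", "line": '{"msg": 1}'},
    {"path": "topics/b/0.json.gz", "line": '{"msg": 2}'},
    {"path": "topics/a/0.json.gz", "line": '{"msg": 3}'},
]
compressed = compress_concurrently(partition(events, "path"))
```

With many distinct paths, a serial loop pays the compression cost for each partition one after another, which is the behavior this issue suspects; the pool lets partitions progress independently.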

Configuration

[sinks.out_kafka_access_file]
type = "file"
#inputs = ["throttle_kafka_access_tenant"]
inputs = ["remap_kafka_access"]
compression = "gzip"
encoding.except_fields = ["_index", "_topic", "_topic_template", "_partition", "_offset", "_throttle_key", "_hash", "_alert", "_keep", "_sd", "_source", "_syslog_severity", "_file_suffix", 'kubernetes.labels."pod-template-hash"', "@source_type", "@metadata"]
encoding.codec = "json"
framing.method = "newline_delimited"
# ._file_suffix = to_int(to_int(now()) / 300)
path = "/var/lib/vector/s3sync/out_kafka_access_file/topics/{{ _topic }}/year=%Y/month=%m/day=%d/hour=%H/${HOSTNAME}.pa2-par-gc-int-ves-io_{{ _file_suffix }}.json.gz"
idle_timeout_secs = 30
buffer.type = "memory"
buffer.max_events = 3000    # default 500 with memory buffer
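For a rough sense of the partition cardinality this path template produces, the sketch below mirrors the commented-out `_file_suffix` expression in Python, assuming `to_int(now())` yields Unix seconds and the division result is truncated. The topic count used in the usage line is a made-up figure, not taken from this deployment.

```python
WINDOW_SECS = 300  # from the commented-out expression: to_int(to_int(now()) / 300)


def file_suffix(unix_seconds: int) -> int:
    """One suffix value per 5-minute window (assumes truncating division)."""
    return unix_seconds // WINDOW_SECS


def paths_per_hour(topics: int) -> int:
    """Distinct paths a single host renders per hour for a given topic count."""
    return topics * (3600 // WINDOW_SECS)


# Hypothetical example: 50 topics on one host.
hourly_paths = paths_per_hour(50)
```

Each topic rolls over to a new file 12 times per hour, so the number of distinct gzip streams the sink has to handle grows linearly with the number of topics.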

Version

0.37.0

Labels

  • domain: performance (Anything related to Vector's performance)
  • sink: file (Anything `file` sink related)
  • type: enhancement (A value-adding code change that enhances its existing functionality.)