Suggest for performance fix: KAFKA-9693 Kafka latency spikes caused by log segment flush on roll #13768

Closed
novosibman wants to merge 1 commit into apache:3.4 from novosibman:3.4
Conversation

@novosibman

Related issue https://issues.apache.org/jira/browse/KAFKA-9693

The issue with repeating latency spikes during Kafka log segment rolling is still reproducible on the latest versions, including kafka_2.13-3.4.0.

It was found that flushing the Kafka producer state snapshot file during segment rolling blocks the producer request handling thread for some time:
https://github.com/apache/kafka/blob/3.4/core/src/main/scala/kafka/log/ProducerStateManager.scala#L452

```scala
  private def writeSnapshot(file: File, entries: mutable.Map[Long, ProducerStateEntry]): Unit = {
    ...
    val fileChannel = FileChannel.open(file.toPath, StandardOpenOption.CREATE, StandardOpenOption.WRITE)
    try {
      fileChannel.write(buffer)
      fileChannel.force(true)   // <- blocking fsync happens here
    } finally {
      fileChannel.close()
    }
    ...
  }
```
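To see why this call hurts tail latency, note that `force(true)` does not return until both the data and the file metadata have reached the storage device, so the calling request thread stalls for the full fsync duration. A minimal standalone illustration (not Kafka code; the class and method names are hypothetical) that separates the cost of the buffered write from the cost of the fsync:

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: time how long a synchronous force(true) blocks the
// calling thread compared to the buffered write itself. On a busy disk the
// force() call typically dominates, which is the stall the PR describes.
public class FsyncCost {
    /** Returns {writeMicros, forceMicros} for one write + fsync of `size` bytes. */
    static long[] timedWriteAndForce(Path file, int size) throws Exception {
        ByteBuffer buffer = ByteBuffer.wrap(new byte[size]);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            long t0 = System.nanoTime();
            while (buffer.hasRemaining()) ch.write(buffer); // lands in page cache
            long t1 = System.nanoTime();
            ch.force(true); // blocks until data and metadata reach the device
            long t2 = System.nanoTime();
            return new long[] {(t1 - t0) / 1000, (t2 - t1) / 1000};
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("snapshot", ".tmp");
        long[] micros = timedWriteAndForce(file, 64 * 1024);
        System.out.printf("write: %d us, force: %d us%n", micros[0], micros[1]);
        Files.deleteIfExists(file);
    }
}
```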

The more partitions there are, the greater the cumulative latency effect observed.

The suggested fix offloads the flush (fileChannel.force) operation to a background thread, similar to (but not exactly the same as) how it is done in UnifiedLog.scala:

  def roll(
   ...
    // Schedule an asynchronous flush of the old segment
    scheduler.schedule("flush-log", () => flushUptoOffsetExclusive(newSegment.baseOffset))
  }
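Since trunk would need the same change in ProducerStateManager.java, the offload pattern can be sketched in Java as follows. This is a hypothetical illustration, not the actual patch: the class name, the single-thread executor, and the error handling are assumptions standing in for Kafka's own scheduler and log-dir failure handling.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the suggested pattern: the snapshot bytes are still
// written on the request thread (cheap, hits the page cache), but the blocking
// force() is handed to a background thread, mirroring how UnifiedLog.roll
// schedules its "flush-log" task.
public class AsyncSnapshotWriter {
    private final ExecutorService flushScheduler =
        Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "snapshot-flush");
            t.setDaemon(true);
            return t;
        });

    public void writeSnapshot(Path file, ByteBuffer buffer) throws IOException {
        FileChannel channel = FileChannel.open(file,
            StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        boolean scheduled = false;
        try {
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            // Only the durability barrier leaves the request thread; the
            // background task closes the channel after force() completes.
            flushScheduler.submit(() -> {
                try (FileChannel c = channel) {
                    c.force(true);
                } catch (IOException e) {
                    // A real broker must treat this as a log dir failure.
                    throw new UncheckedIOException(e);
                }
            });
            scheduled = true;
        } finally {
            if (!scheduled) channel.close();
        }
    }

    /** Drains pending flushes; call on shutdown. */
    public void close() throws InterruptedException {
        flushScheduler.shutdown();
        flushScheduler.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

One design consequence worth noting: with this pattern the snapshot is no longer guaranteed durable at the moment writeSnapshot returns, so crash-recovery code must tolerate a missing or torn snapshot for the newest segment, which Kafka's recovery path already does by rebuilding producer state from the log.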

Benchmarking with this fix shows a significant reduction in the repeating latency spikes:

test config:
- AWS
- 3 node cluster (i3en.2xlarge)
- zulu11.62.17-ca-jdk11.0.18-linux_x64, heap 6G per broker
- 1 loadgen (m5n.8xlarge) - OpenMessaging benchmark (OMB)
- 1 zookeeper (t2.small)
- acks=all batchSize=1048510 consumers=4 insyncReplicas=2 lingerMs=1 mlen=1024 producers=4 rf=3 subscriptions=1 targetRate=200k time=12m topics=1 warmup=1m

variation 1:

partitions=10

| metric | kafka_2.13-3.4.0 | kafka_2.13-3.4.0 patched |
| --- | --- | --- |
| endToEnd service_time (ms) p50 max | 2.00 | 2.00 |
| endToEnd service_time (ms) p75 max | 3.00 | 2.00 |
| endToEnd service_time (ms) p95 max | 94.0 | 3.00 |
| endToEnd service_time (ms) p99 max | 290 | 6.00 |
| endToEnd service_time (ms) p99.9 max | 355 | 21.0 |
| endToEnd service_time (ms) p99.99 max | 372 | 34.0 |
| endToEnd service_time (ms) p100 max | 374 | 36.0 |
| publish service_time (ms) p50 max | 1.70 | 1.67 |
| publish service_time (ms) p75 max | 2.23 | 2.09 |
| publish service_time (ms) p95 max | 90.7 | 2.82 |
| publish service_time (ms) p99 max | 287 | 4.69 |
| publish service_time (ms) p99.9 max | 353 | 19.6 |
| publish service_time (ms) p99.99 max | 369 | 31.3 |
| publish service_time (ms) p100 max | 371 | 33.5 |
[charts: kafka endToEnd latency, kafka_2.13-3.4.0 vs kafka_2.13-3.4.0 patched]

Latency improved by up to 10x in the high percentiles ^^^; the spikes are almost invisible.

variation 2:

partitions=100

| metric | kafka_2.13-3.4.0 | kafka_2.13-3.4.0 patched |
| --- | --- | --- |
| endToEnd service_time (ms) p50 max | 91.0 | 2.00 |
| endToEnd service_time (ms) p75 max | 358 | 3.00 |
| endToEnd service_time (ms) p95 max | 1814 | 4.00 |
| endToEnd service_time (ms) p99 max | 2777 | 21.0 |
| endToEnd service_time (ms) p99.9 max | 3643 | 119 |
| endToEnd service_time (ms) p99.99 max | 3724 | 141 |
| endToEnd service_time (ms) p100 max | 3726 | 143 |
| publish service_time (ms) p50 max | 77.4 | 1.92 |
| publish service_time (ms) p75 max | 352 | 2.35 |
| publish service_time (ms) p95 max | 1748 | 3.80 |
| publish service_time (ms) p99 max | 2740 | 18.9 |
| publish service_time (ms) p99.9 max | 3619 | 116 |
| publish service_time (ms) p99.99 max | 3720 | 139 |
| publish service_time (ms) p100 max | 3722 | 141 |
[charts: endToEnd service_time, kafka_2.13-3.4.0 vs kafka_2.13-3.4.0 patched]

Latency improved by up to 25x in the high percentiles ^^^

The fix was done for the 3.4 branch (the Scala version of ProducerStateManager). Trunk needs a corresponding fix in ProducerStateManager.java.

@ijuma
Member

ijuma commented May 26, 2023

We typically make changes to master first. Would you be willing to submit a PR for that instead?

@jolshan
Member

jolshan commented May 27, 2023

Thanks for the PR! This looks promising. As Ismael said, let's share in trunk first.

@zhangsm6

@novosibman What is the file system used in your test? OMB default should be xfs but wondering if it was changed

@novosibman
Author

novosibman commented May 30, 2023

> @novosibman What is the file system used in your test? OMB default should be xfs but wondering if it was changed

Yes, xfs was used; this fs type was set in the OMB settings.
Older experiments showed results about 10x worse in the high percentiles when using ext4:

[chart: x-axis is the timeline in minutes; the test is quite short and does not include Kafka log segment rolling]

Reference test configuration used:
https://developer.confluent.io/learn/kafka-performance/

@novosibman
Author

> We typically make changes to master first. Would you be willing to submit a PR for that instead?

Prepared and tested the trunk version: #13782

@github-actions

This PR is being marked as stale since it has not had any activity in 90 days. If you would like to keep this PR alive, please ask a committer for review. If the PR has merge conflicts, please update it with the latest from trunk (or the appropriate release branch).

If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed.

@github-actions github-actions bot added the stale Stale PRs label Aug 29, 2023
@divijvaidya
Member

This PR is fixed in trunk (scheduled for release in 3.7.0). Currently there are no plans of backporting this to earlier versions since this is a performance optimization and not a critical bug fix. I am closing this PR, please feel free to reopen if you think we still need to backport this.

@divijvaidya divijvaidya closed this Feb 2, 2024