KAFKA-7632: Support Compression Levels (KIP-390)#15516

Merged
mimaison merged 6 commits into apache:trunk from mimaison:kip-390
May 21, 2024

Conversation

@mimaison
Member

@mimaison mimaison commented Mar 11, 2024

Based on #10826, with updates to match the recent amendments we made to KIP-390.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@mimaison mimaison force-pushed the kip-390 branch 3 times, most recently from df9ca6e to 2f54aac Compare March 25, 2024 08:47
@mimaison mimaison force-pushed the kip-390 branch 3 times, most recently from d284cd3 to 71d84bd Compare April 10, 2024 09:04
@mimaison mimaison marked this pull request as ready for review April 10, 2024 10:00
@mimaison
Member Author

@divijvaidya It seems you've done a bit of work around compression in the past. Can you take a look? Thanks

Member

@showuon showuon left a comment

Thanks for the PR. Left some comments.

Comment thread clients/src/main/java/org/apache/kafka/common/compress/Compression.java Outdated
Comment thread clients/src/test/java/org/apache/kafka/common/record/DefaultRecordBatchTest.java Outdated

// No in place assignment situation 1
-boolean inPlaceAssignment = sourceCompression == targetCompression;
+boolean inPlaceAssignment = sourceCompressionType == targetCompression.type();
Member

So we won't do re-compression if only the level is different? I didn't see this in the KIP. Maybe we should add it?

Member Author

The broker has no easy way of retrieving the level that the producer used when compressing the records. So if the compression codec matches, I decided to keep the compressed bytes instead of decompressing and compressing everything again as this would be wasteful, especially as the producer could have already used the same compression level.
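The decision described above can be sketched as follows. This is a simplified stand-in for the actual Kafka classes, not the merged code: the batch header records the codec but not the level, so the broker can only compare codecs.

```java
// Simplified sketch (not the actual Kafka code): the batch format stores the
// compression codec but not the level, so the broker compares codecs only when
// deciding whether to keep the producer's compressed bytes as-is.
enum CompressionType { NONE, GZIP, SNAPPY, LZ4, ZSTD }

class RecompressionCheck {
    // In-place assignment: skip decompress/recompress when the codec matches,
    // even if the topic configures a different level.
    static boolean inPlaceAssignment(CompressionType source, CompressionType target) {
        return source == target;
    }
}
```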

Member

I agree. But I think we should mention this in KIP-390 at least.

Member Author

Right, I updated the KIP.

Comment on lines +178 to +182
public static final String COMPRESSION_GZIP_LEVEL_DOC = "The compression level to use if " + COMPRESSION_TYPE_CONFIG + " is set to <code>gzip</code>.";
public static final String COMPRESSION_LZ4_LEVEL_CONFIG = "compression.lz4.level";
public static final String COMPRESSION_LZ4_LEVEL_DOC = "The compression level to use if " + COMPRESSION_TYPE_CONFIG + " is set to <code>lz4</code>.";
public static final String COMPRESSION_ZSTD_LEVEL_CONFIG = "compression.zstd.level";
public static final String COMPRESSION_ZSTD_LEVEL_DOC = "The compression level to use if " + COMPRESSION_TYPE_CONFIG + " is set to <code>zstd</code>.";
Member

nit: Should we provide the doc link for each compression type? It's hard to know which level means what.

Member Author

Do you mean a link to the compression library websites?

Member

I was thinking we could add it in the config description. Or maybe adding it in KIP-390 is good enough.
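One way to document the levels without external links would be to embed the valid range in each doc string. This is a hypothetical sketch of such a doc-string builder, not the wording that was merged; the lz4 range used in the example matches the constants discussed later in this thread.

```java
// Hypothetical doc-string builder: embeds the valid level range so users can
// interpret the setting without visiting the compression library's website.
class CompressionLevelDocs {
    static String levelDoc(String codec, int min, int max, int dflt) {
        return "The compression level to use if compression.type is set to <code>" + codec
                + "</code>. Valid values are " + min + " to " + max
                + " (default " + dflt + ").";
    }
}
```

For example, `levelDoc("lz4", 1, 17, 9)` would yield a description that states the 1-17 range inline.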

Contributor

@junrao junrao left a comment

@mimaison : Thanks for the PR. Made a pass of non-testing files. Left a few comments.

Comment thread clients/src/main/java/org/apache/kafka/common/compress/GzipCompression.java Outdated
Comment thread clients/src/main/java/org/apache/kafka/common/compress/Compression.java Outdated
Comment thread clients/src/main/java/org/apache/kafka/common/compress/NoCompression.java Outdated

public static final int MIN_LEVEL = 1;
public static final int MAX_LEVEL = 17;
public static final int DEFAULT_LEVEL = 9;
Contributor

So, every time we update the Lz4 library, we may need to update the above values? We probably want to add a note here.

Member Author

I hesitated to define these constants for this reason, but these levels have not changed in over 10 years [0], so hopefully this won't require a lot of maintenance.

0: https://github.com/lz4/lz4-java/blame/master/src/java/net/jpountz/lz4/LZ4Constants.java#L23-L24
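A note plus a bounds check would keep a future library bump from silently accepting out-of-range levels. This is an illustrative validator built around the constants from the snippet above, not the merged implementation:

```java
// Illustrative validator for the constants above. If a future lz4-java upgrade
// changes its supported level range, these constants (and this check) must be
// updated to match.
class Lz4Levels {
    static final int MIN_LEVEL = 1;
    static final int MAX_LEVEL = 17;
    static final int DEFAULT_LEVEL = 9;

    static int validate(int level) {
        if (level < MIN_LEVEL || level > MAX_LEVEL)
            throw new IllegalArgumentException(
                "lz4 compression level must be between " + MIN_LEVEL + " and " + MAX_LEVEL + ", got " + level);
        return level;
    }
}
```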

Comment thread clients/src/main/java/org/apache/kafka/common/compress/ZstdCompression.java Outdated
Comment thread storage/src/main/java/org/apache/kafka/storage/internals/log/LogConfig.java Outdated
Contributor

@junrao junrao left a comment

@mimaison : Thanks for the updated PR. Made a pass of all files. Added a few more comments.

Comment thread clients/src/test/java/org/apache/kafka/common/compress/GzipCompressionTest.java Outdated
Comment thread clients/src/test/java/org/apache/kafka/common/compress/GzipCompressionTest.java Outdated
Comment thread clients/src/test/java/org/apache/kafka/common/compress/NoCompressionTest.java Outdated
Comment thread core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala Outdated
Comment thread core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala
Comment thread core/src/test/scala/unit/kafka/log/LogConfigTest.scala
Comment thread core/src/test/scala/unit/kafka/log/LogValidatorTest.scala Outdated
Comment thread core/src/main/scala/kafka/log/UnifiedLog.scala Outdated
Contributor

@junrao junrao left a comment

@mimaison : Thanks for the updated PR. Just one more comment.

@mimaison
Member Author

I also added a couple of new tests in LogValidatorTest to check that recompression only happens when the compression codec differs between the records from the producer and the topic configuration, and does not happen when only the compression levels differ.

Contributor

@junrao junrao left a comment

@mimaison: Thanks for the updated PR. Left one more comment.

Comment thread core/src/test/scala/unit/kafka/log/LogValidatorTest.scala Outdated
Contributor

@junrao junrao left a comment

@mimaison : Thanks for the updated PR. LGTM assuming all the failed tests have been triaged.

Member

@showuon showuon left a comment

Had another look. LGTM! Just some comments to update the KIP. Thanks.

@mimaison
Member Author

Thanks for the reviews!
I had to rebase again so I'll wait for the CI to complete.

@mimaison
Member Author

None of the test failures seem related, merging to trunk.

@mimaison mimaison merged commit affe8da into apache:trunk May 21, 2024
@mimaison mimaison deleted the kip-390 branch May 21, 2024 15:58
rreddy-22 pushed a commit to rreddy-22/kafka-rreddy that referenced this pull request May 24, 2024
Reviewers: Jun Rao <jun@confluent.io>,  Luke Chen <showuon@gmail.com>
Co-authored-by: Lee Dongjin <dongjin@apache.org>
chiacyu pushed a commit to chiacyu/kafka that referenced this pull request Jun 1, 2024
Reviewers: Jun Rao <jun@confluent.io>,  Luke Chen <showuon@gmail.com>
Co-authored-by: Lee Dongjin <dongjin@apache.org>
TaiJuWu pushed a commit to TaiJuWu/kafka that referenced this pull request Jun 8, 2024
Reviewers: Jun Rao <jun@confluent.io>,  Luke Chen <showuon@gmail.com>
Co-authored-by: Lee Dongjin <dongjin@apache.org>
gongxuanzhang pushed a commit to gongxuanzhang/kafka that referenced this pull request Jun 12, 2024
Reviewers: Jun Rao <jun@confluent.io>,  Luke Chen <showuon@gmail.com>
Co-authored-by: Lee Dongjin <dongjin@apache.org>
@stanislavkozlovski
Contributor

Has anybody noticed that the Linear Write test in KIP-390 is inaccurate?

  1. It suggests that the write speed on a broker is 22GB/s. I wasn't able to find an SSD on the market in 2024 (6 years later) that supports this throughput.
  2. The reason for this, I think, is that the benchmark is set to write just 8192 bytes (--bytes 8192) with a write size of 8192 itself, so the test effectively times only the very first write in nanoseconds and extrapolates that to a full second.

[image: KIP-390 Linear Write benchmark results]

@mimaison
Member Author

Yeah, it looks like the numbers are not accurate. To be honest, it's a bit of a strange performance test. The Produce Test benchmark should be much more representative; however, I did not reproduce that benchmark. It might be worth asking @dongjinleekr if he has any data points to share.
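The extrapolation problem can be seen with a toy measurement: timing a single small write and scaling to one second captures fixed per-call overheads (and timer noise) once, so it can report a very different figure than a sustained run. This sketch is illustrative only and is unrelated to the KIP's actual benchmark tooling:

```java
import java.io.ByteArrayOutputStream;

// Toy illustration of the extrapolation issue: measure throughput over n
// writes of blockBytes each. With n = 1 the result is dominated by one-off
// timing noise; a sustained run gives a far more stable number.
class ThroughputSketch {
    static double mbPerSec(int blockBytes, int writes) {
        byte[] block = new byte[blockBytes];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long start = System.nanoTime();
        for (int i = 0; i < writes; i++) out.write(block, 0, block.length);
        long elapsedNs = Math.max(1, System.nanoTime() - start);
        return blockBytes * (double) writes / 1e6 / (elapsedNs / 1e9);
    }
}
```

Comparing `mbPerSec(8192, 1)` against `mbPerSec(8192, 100_000)` typically shows how unstable a single-write measurement is.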

ByteBufferOutputStream bufferOutputStream,
final long deleteHorizonMs) {
byte magic = originalBatch.magic();
Compression compression = Compression.of(originalBatch.compressionType()).build();
Member

  1. I'm not sure whether there has been any discussion about the compression level used during compaction. The current implementation just applies the default level, but perhaps it should respect the topic's configured compression level.

  2. Another idea is to introduce a flag that allows compaction to use a different compression. This would give users the option to choose a different compression algorithm for older data.

@mimaison @junrao @showuon WDYT?

Member Author

  1. Right, here we could use the level if specified. I expect most topics to use compression.type=producer, but in case a specific compression type and level are set, it would make sense to use them.

  2. It's not something I thought about before. Do you think there are scenarios where the gains of picking a different level for older data would be significant enough to motivate such a feature?
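The first point could look roughly like this. The enum and helper are hypothetical simplifications of Kafka's actual compression config handling, only meant to show the fallback rule:

```java
// Hypothetical sketch of compaction picking its compression: honor an explicit
// topic-level codec (and its configured level), and only fall back to the
// batch's original codec when the topic uses compression.type=producer.
enum TopicCompression { PRODUCER, NONE, GZIP, LZ4, ZSTD }

class CompactionCompression {
    static TopicCompression choose(TopicCompression topicConfig, TopicCompression batchCodec) {
        return topicConfig == TopicCompression.PRODUCER ? batchCodec : topicConfig;
    }
}
```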

Collaborator

That’s an interesting topic.
For 1., it would be intuitive and reasonable to use the compression level that is set. I'll create a minor patch for it.

For 2., I'm not entirely sure, but one possible case is that users may prefer tighter compression for cold data to save space, especially if storage is cost-sensitive.

Member

> Right, here we could use the level if specified. I expect most topics to use compression.type=producer but in case a specific compression type and level is set, that would make sense to use them.

@Yunyung Could you please file a minor patch for it?

> Do you think there are scenarios where the gains of picking a different level for older data would be significant enough to motivate such a feature?

The key point is the compression type rather than the level. I received a request to compress old data during compaction. The change should be straightforward, so it seems acceptable.

Contributor

@chia7712 : Previously, we considered using the topic-level compression type instead of the one in the original batch. There is one subtle issue with batch size. During compaction, we group a set of segments so that the total size doesn't exceed 2GB. If we use a different compression type, the compacted data could exceed the max segment limit and fail the index append.

Member

> If we use a different compression type, the compacted data could be exceeding the max segment limit and failing the index append.

It appears that using a high compression level during ingestion can indeed trigger an overflow issue during compaction. This is because the cleaner's size estimation becomes inaccurate when it rebuilds batches using the default compression level instead of the original one:

[2025-12-28 08:25:42,915] WARN [kafka-log-cleaner-thread-0]: Unexpected exception thrown when cleaning log Log(dir=/tmp/log-folder-0/chia-0, topicId=5vKt56jQQ_S3QRuqrNmrTw, topic=chia, partition=0, highWatermark=66028296, lastStableOffset=66028296, logStartOffset=0, logEndOffset=66028296). Marking its partition (chia-0) as uncleanable (org.apache.kafka.storage.internals.log.LogCleaner$CleanerThread)
org.apache.kafka.storage.internals.log.LogCleaningException: Append of size 258080 bytes is too large for segment with current file position at 2147262463
    at org.apache.kafka.storage.internals.log.LogCleaner$CleanerThread.cleanFilthiestLog(LogCleaner.java:570)
    at org.apache.kafka.storage.internals.log.LogCleaner$CleanerThread.tryCleanFilthiestLog(LogCleaner.java:544)
    at org.apache.kafka.storage.internals.log.LogCleaner$CleanerThread.doWork(LogCleaner.java:513)
    at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:136)
Caused by: java.lang.IllegalArgumentException: Append of size 258080 bytes is too large for segment with current file position at 2147262463
    at org.apache.kafka.common.record.FileRecords.append(FileRecords.java:196)
    at org.apache.kafka.storage.internals.log.LogSegment.append(LogSegment.java:260)
    at org.apache.kafka.storage.internals.log.Cleaner.cleanInto(Cleaner.java:405)
    at org.apache.kafka.storage.internals.log.Cleaner.cleanSegments(Cleaner.java:243)
    at org.apache.kafka.storage.internals.log.Cleaner.doClean(Cleaner.java:180)
    at org.apache.kafka.storage.internals.log.Cleaner.clean(Cleaner.java:127)
    at org.apache.kafka.storage.internals.log.LogCleaner$CleanerThread.cleanLog(LogCleaner.java:596)
    at org.apache.kafka.storage.internals.log.LogCleaner$CleanerThread.cleanFilthiestLog(LogCleaner.java:565)
    ... 3 more

To mitigate this, I propose introducing a safety margin specifically for partitions that fail to compact due to overflows.

@mimaison @junrao @Yunyung WDYT?
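The proposed safety margin might be shaped like this. The names and the margin value are illustrative, not a committed design:

```java
// Illustrative sketch of an adaptive safety margin: a partition that has hit
// an overflow gets a reduced effective group-size limit on the next cleaning
// round, leaving headroom for the cleaner's size-estimation error.
class SegmentGrouping {
    // Segment file positions are ints, so ~2GB is the hard ceiling.
    static final long MAX_GROUP_BYTES = Integer.MAX_VALUE;

    static long effectiveLimit(boolean previouslyOverflowed, double safetyMargin) {
        return previouslyOverflowed
                ? (long) (MAX_GROUP_BYTES * (1.0 - safetyMargin))
                : MAX_GROUP_BYTES;
    }
}
```

With a 5% margin, a previously overflowing partition would be regrouped against a ~2.04e9-byte limit instead of the full 2^31-1, absorbing estimation error like the 258080-byte overshoot in the log above.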

Contributor

Another option is to make the cleaning logic more generic so that it could produce more than 1 segment during each round of cleaning if the segment size limit is exceeded.

Member

Theoretically, having the cleaner produce multiple segments is the superior design. However, the current implementation strictly pre-groups the segments. Simply splitting an overflowing group would result in fragmented, undersized segments. To handle this gracefully would require major architectural surgery on the grouping logic.

Given that this overflow is a rare edge case, requiring a specific combination of large segments, significant compression drop, and low deletion rates, the adaptive safety margin is a much more pragmatic win.

@junrao WDYT?

Member

Good find!

Member Author

Thanks for following up
