GH-38042: [C++][Benchmark] Add non-stream Codec Compression/Decompression#38067

Merged
kou merged 5 commits into apache:main from mapleFU:compression/non-stream-benchmark on Oct 25, 2023

Conversation

@mapleFU

@mapleFU mapleFU commented Oct 6, 2023

Member

Rationale for this change

Currently, the compression benchmarks are enabled with ARROW_WITH_BENCHMARKS_REFERENCE.

Note that they only cover the compressor (made by Codec::MakeCompressor()) and the decompressor (made by Codec::MakeDecompressor()). However, Parquet uses Codec directly to compress and decompress. So, I'd like to add benchmarks that use Codec directly.
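To make the distinction concrete, here is a rough sketch of the two call patterns. It is a toy, not Arrow's Codec API: the run-length format and the names CompressOneShot, DecompressOneShot, StreamingCompressor and CompressInChunks are all hypothetical and exist only for illustration. The one-shot path sees the whole buffer in a single call, while the streaming path is fed chunk by chunk and has to carry state between calls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy run-length codec (hypothetical, NOT Arrow's API).
// Output format: repeated (count, byte) pairs, with count in [1, 255].

// One-shot ("non-stream") style: the whole input is available in one call.
std::vector<uint8_t> CompressOneShot(const std::vector<uint8_t>& input) {
  std::vector<uint8_t> out;
  size_t i = 0;
  while (i < input.size()) {
    const uint8_t value = input[i];
    size_t run = 1;
    while (i + run < input.size() && input[i + run] == value && run < 255) {
      ++run;
    }
    out.push_back(static_cast<uint8_t>(run));
    out.push_back(value);
    i += run;
  }
  return out;
}

// One-shot decompression: expand every (count, byte) pair.
std::vector<uint8_t> DecompressOneShot(const std::vector<uint8_t>& input) {
  std::vector<uint8_t> out;
  for (size_t i = 0; i + 1 < input.size(); i += 2) {
    out.insert(out.end(), static_cast<size_t>(input[i]), input[i + 1]);
  }
  return out;
}

// Streaming style: input arrives in chunks, so the pending run is kept as
// state between calls and only emitted once it is known to be complete.
class StreamingCompressor {
 public:
  void Compress(const uint8_t* data, size_t len, std::vector<uint8_t>* out) {
    for (size_t i = 0; i < len; ++i) {
      if (run_ > 0 && data[i] == value_ && run_ < 255) {
        ++run_;
      } else {
        Flush(out);
        value_ = data[i];
        run_ = 1;
      }
    }
  }

  // Must be called at end of stream to emit the last pending run.
  void End(std::vector<uint8_t>* out) { Flush(out); }

 private:
  void Flush(std::vector<uint8_t>* out) {
    if (run_ == 0) return;
    out->push_back(static_cast<uint8_t>(run_));
    out->push_back(value_);
    run_ = 0;
  }

  uint8_t value_ = 0;
  size_t run_ = 0;
};

// Drives the streaming compressor over fixed-size chunks of the input.
std::vector<uint8_t> CompressInChunks(const std::vector<uint8_t>& input,
                                      size_t chunk_size) {
  if (chunk_size == 0) chunk_size = 1;  // avoid an infinite loop
  StreamingCompressor compressor;
  std::vector<uint8_t> out;
  for (size_t pos = 0; pos < input.size(); pos += chunk_size) {
    const size_t len = std::min(chunk_size, input.size() - pos);
    compressor.Compress(input.data() + pos, len, &out);
  }
  compressor.End(&out);
  return out;
}
```

In this toy both paths produce the same bytes, so any timing difference between CompressOneShot(input) and CompressInChunks(input, 64) would come from the call pattern alone. That is the kind of gap the stream-only benchmarks cannot show.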

What changes are included in this PR?

Add benchmarks for direct (non-stream) compression and decompression.

Are these changes tested?

no need

Are there any user-facing changes?

no

@github-actions

github-actions Bot commented Oct 6, 2023

⚠️ GitHub issue #38042 has been automatically assigned in GitHub to PR creator.

@mapleFU mapleFU requested review from kou and pitrou October 20, 2023 06:14
@mapleFU

mapleFU commented Oct 20, 2023

Member Author

@pitrou @kou Since Parquet uses the Compression/Decompression in Codec, I've added a group of benchmarks here. Would you mind taking a look?

@kou kou changed the title GH-38042: [C++] Benchmark: Add benchmark for non-stream Codec Compression/Decompression GH-38042: [C++][Benchmark] Add non-stream Codec Compression/Decompression Oct 20, 2023

@kou kou left a comment

Member

+1

Comment thread cpp/src/arrow/util/compression_benchmark.cc Outdated (4 threads)
Comment on lines +254 to +256
BENCHMARK_TEMPLATE(ReferenceCompression, Compression::LZ4_FRAME);
BENCHMARK_TEMPLATE(ReferenceStreamingDecompression, Compression::LZ4_FRAME);
BENCHMARK_TEMPLATE(ReferenceDecompression, Compression::LZ4_FRAME);

Member

Is LZ4_FRAME OK?
It seems that Parquet doesn't use LZ4_FRAME.

Member

We can even benchmark both LZ4 variants.

@mapleFU mapleFU Oct 23, 2023

Member Author

> It seems that Parquet doesn't use LZ4_FRAME

Aha, I remember that parquet-mr implemented LZ4 first, and Arrow implemented a different version (LZ4_FRAME). LZ4 stores an extra length here.

Maybe apache/parquet-format#168 helps

Member Author

And I don't think they differ much...

I haven't added LZ4 for now, but feel free to add it if necessary.

@github-actions github-actions Bot added awaiting merge Awaiting merge and removed awaiting review Awaiting review labels Oct 20, 2023

@kou kou left a comment

Member

+1

@kou kou merged commit 3be5e60 into apache:main Oct 25, 2023
@kou kou removed the awaiting merge Awaiting merge label Oct 25, 2023
@github-actions github-actions Bot added the awaiting merge Awaiting merge label Oct 25, 2023
@pitrou

pitrou commented Oct 25, 2023

Member

So we could have added LZ4 and Snappy here. @mapleFU Would you like to do that as a followup PR?

@mapleFU

mapleFU commented Oct 25, 2023

Member Author

Let me rush it :-)

(Just curious, is it related to #38389?)

@pitrou

pitrou commented Oct 25, 2023

Member

It's just reasonable to benchmark all available codecs, not a subset of them.

@mapleFU

mapleFU commented Oct 25, 2023

Member Author

@pitrou added #38453

@conbench-apache-arrow

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 3be5e60.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them.

loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…ompression (apache#38067)

### Rationale for this change

Currently, the compression benchmarks are enabled with ARROW_WITH_BENCHMARKS_REFERENCE.

Note that they only cover the compressor (made by Codec::MakeCompressor()) and the decompressor (made by Codec::MakeDecompressor()). However, Parquet uses Codec directly to compress and decompress. So, I'd like to add benchmarks that use Codec directly.

### What changes are included in this PR?

Add benchmarks for direct (non-stream) compression and decompression.

### Are these changes tested?

no need

### Are there any user-facing changes?

no

* Closes: apache#38042

Authored-by: mwish <maplewish117@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…ompression (apache#38067)
