Skip to content

support compression for IPC with revamped feature flags#2369

Merged
tustvold merged 38 commits into
apache:masterfrom
alamb:alamb/help_feature_flags
Aug 14, 2022
Merged

support compression for IPC with revamped feature flags#2369
tustvold merged 38 commits into
apache:masterfrom
alamb:alamb/help_feature_flags

Conversation

@alamb

@alamb alamb commented Aug 8, 2022

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Closes #1855
Closes liukun4515#1
Closes #1709
Closes #70
Closes #2342

Rationale for this change

Not this is a new PR is based on @liukun4515 's work in #1855. Github is showing a massive diff in liukun4515#1 so I wanted to get this PR up so we could see what the actual change is proposed

The logic is unchanged, but I changed how the feature flagging works so that the code differences are isolated into the ipc_compression module

What changes are included in this PR?

  1. Add support for LZ4_STREAM and ZSTD under ipc_compression feature flag
  2. Tests for same

Are there any user-facing changes?

New feature flag, and support for compression

@alamb alamb changed the title Alamb/help feature flags support compression for IPC with revamped feature flags Aug 8, 2022

@liukun4515 liukun4515 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your effort and push this feature forward, and help me to refactor this model.

LGTM

@alamb

alamb commented Aug 11, 2022

Copy link
Copy Markdown
Contributor Author

Thanks for your effort and push this feature forward, and help me to refactor this model.

It was my pleasure -- thank you for figuring out the tests and the format that was required

@alamb alamb requested a review from tustvold August 11, 2022 10:19
@liukun4515

Copy link
Copy Markdown
Contributor

Thanks for your effort and push this feature forward, and help me to refactor this model.

It was my pleasure -- thank you for figuring out the tests and the format that was required

Also we need to enable the IT test in the arrow.
#1907
And I can try it.

I have not thought about it carefully, but how to enable the ipc_compression in the IT ci?

@liukun4515

Copy link
Copy Markdown
Contributor

This issue #2342 also can be closed.

Comment thread .github/workflows/arrow.yml Outdated

@tustvold tustvold left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly just some minor nits

Comment thread arrow/src/ipc/compression/codec.rs Outdated
} else if decompressed_length == LENGTH_NO_COMPRESSED_DATA {
// no compression
let data = &input[(LENGTH_OF_PREFIX_DATA as usize)..];
Buffer::from(data)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an observation, but this copy is kind of unfortunate (although existed before)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you thinking the alternate is to create a Buffer initially and write this code in terms of Buffer rather than &[u8]?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... Not important for this PR, but it seems unfortunate the amount of memory copying we are doing, especially when the major design goal of the IPC spec is to avoid this 😅

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #2437

}

impl CompressionCodec {
#[allow(clippy::ptr_arg)]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be misunderstanding this lint, but I don't think it should be firing for this method

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I remove it and run clippy like

cargo clippy -p arrow --features=prettyprint,csv,ipc,test_utils --all-targets -- -D warnings

Clippy tells me:

error: writing `&mut Vec` instead of `&mut [_]` involves a new object where a slice will do
  --> arrow/src/ipc/compression/stub.rs:50:18
   |
50 |         _output: &mut Vec<u8>,
   |                  ^^^^^^^^^^^^ help: change this to: `&mut [u8]`
   |
   = note: `-D clippy::ptr-arg` implied by `-D warnings`
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#ptr_arg

Which while true in this case, is not true for the actual codec implementation, and they both need to have the same signature.

@tustvold tustvold Aug 13, 2022

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a very daft lint, &mut Vec<u8> is not the same as &mut [u8], in particular the latter must initialize memory... I can understand &[u8] vs &Vec<u8> but the mutability makes a difference... I somewhat assumed clippy wasn't being silly, guess I was wrong 😅

Comment thread arrow/src/ipc/writer.rs Outdated
Comment thread arrow/src/ipc/writer.rs
);
)?;
}
// pad the tail of body data

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this previously wasn't needed because the buffers were already 8 byte aligned

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably -- I am not sure to be honest. This is code that was in @liukun4515 's original PR but it seemed reasonable to me

Comment thread arrow/src/ipc/writer.rs Outdated
let data_gen = IpcDataGenerator::default();
let mut writer = BufWriter::new(writer);
// write magic to header
let mut header_size: usize = 0;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let mut header_size: usize = 0;
let header_size: usize = ARROW_MAGIC.len() + 2;

?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call -- I have updated in 2ed7ce3 (and added an assert to verify the currently implicit assumption that ARROW_MAGIC is 6 bytes long.

Comment thread arrow/Cargo.toml Outdated
Comment thread arrow/src/ipc/compression/codec.rs Outdated
Comment thread arrow/src/ipc/compression/codec.rs Outdated
Comment thread arrow/src/ipc/compression/codec.rs Outdated
Comment thread arrow/src/ipc/compression/stub.rs Outdated
// under the License.

//! Stubs that implement the same interface as ipc_compression
//! but always error.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment thread arrow/src/ipc/reader.rs Outdated
Comment thread arrow/src/ipc/writer.rs Outdated
@tustvold tustvold merged commit 5e27d93 into apache:master Aug 14, 2022
@ursabot

ursabot commented Aug 14, 2022

Copy link
Copy Markdown

Benchmark runs are scheduled for baseline = 1c879ae and contender = 5e27d93. 5e27d93 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

5 participants