
chore: Move some utility methods to submodules of scalar_funcs #590

Merged
viirya merged 2 commits into apache:main from advancedxy:refine_scalar_funcs
Jun 25, 2024

Conversation

@advancedxy

Contributor

Which issue does this PR close?

Rationale for this change

This is a follow-up as discussed in #449 (comment)

What changes are included in this PR?

  1. hex_encode and wrap_digest_result_as_hex_string go to the hex submodule
  2. spark_murmur3_hash and spark_xxhash64 go to the hash_expressions submodule
  3. Update the benchmark and modify the test code
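
The reorganization listed above can be pictured with a minimal sketch. This is not the PR's code: the function body is a simplified stand-in (the real functions operate on DataFusion's ColumnarValue), and the `pub use` re-export is an assumption about how existing call sites might stay unchanged, not something the PR description confirms.

```rust
// Hypothetical, simplified layout of `scalar_funcs` after the move.
pub mod scalar_funcs {
    pub mod hex {
        /// Simplified stand-in for `hex_encode`: lowercase hex of a byte slice.
        /// The signature is an assumption made for this sketch.
        pub fn hex_encode(data: &[u8]) -> String {
            data.iter().map(|b| format!("{:02x}", b)).collect()
        }
    }

    pub mod hash_expressions {
        // `spark_murmur3_hash` and `spark_xxhash64` would live here.
        // Their bodies are omitted because this sketch only shows the layout.
    }

    // Assumed re-export so callers can still write `scalar_funcs::hex_encode`.
    pub use self::hex::hex_encode;
}
```

With a re-export like this, both `scalar_funcs::hex_encode(..)` and `scalar_funcs::hex::hex_encode(..)` resolve to the same function, so moving an item into a submodule does not have to break existing imports.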

How are these changes tested?

Existing tests, with one slight modification.

@advancedxy

Contributor Author

@andygrove, @tshauck and @comphead, would you mind taking a look at this once CI passes?

Comment thread core/src/execution/datafusion/expressions/scalar_funcs/hex.rs
}
}

pub fn spark_xxhash64(args: &[ColumnarValue]) -> Result<ColumnarValue, DataFusionError> {

Contributor

It's not a problem with this PR, but we need descriptions for pub methods.

Contributor

Additionally, might consider limiting the scope. This function probably isn't needed outside of scalar_funcs.

Contributor Author

but we need a description to pub methods

I can add a description here.

Additionally, might consider limiting the scope. This function probably isn't needed outside of scalar_funcs.

Yeah, I originally limited it to pub(super). However, we access it in the benchmark module, which needs a public interface. I think we can leave it as is and address the access scope later by rewriting the benchmark code.
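
As background for the visibility trade-off discussed here: Cargo compiles files under `benches/` as separate crates that link against the library, so they can only reach items exposed through a public path. The sketch below illustrates the difference between `pub(super)` and `pub`. All names in it are hypothetical and the arithmetic is a placeholder, not a real hash.

```rust
// Hypothetical names throughout; the bodies are placeholders, not real hashes.
mod scalar_funcs {
    pub mod hash_expressions {
        // Visible only inside the parent module `scalar_funcs`.
        pub(super) fn internal_hash(x: u32) -> u32 {
            x.wrapping_mul(31)
        }

        // Visible to any code that can name this path. A benchmark crate
        // would need this level of visibility, plus a public path from the
        // crate root down to this module.
        pub fn exposed_hash(x: u32) -> u32 {
            internal_hash(x)
        }
    }

    // Code in `scalar_funcs` itself may call the `pub(super)` function.
    pub fn call_internal(x: u32) -> u32 {
        hash_expressions::internal_hash(x)
    }
}
```

Calling `scalar_funcs::hash_expressions::internal_hash` from outside `scalar_funcs` would fail to compile, which is the restriction that keeps a `pub(super)` function out of reach of a benchmark.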

match seed {
ColumnarValue::Scalar(ScalarValue::Int32(Some(seed))) => {
// iterate over the arguments to find out the length of the array
let num_rows = args[0..args.len() - 1]

Contributor

Any chance of an index out of bounds here?

Contributor Author

I don't think so. The seed is always provided on the Spark/JVM side.
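
For context on the question above: an expression like `args[0..args.len() - 1]` only misbehaves when `args` is empty, because `args.len() - 1` then underflows a `usize` and the code panics. If the seed is always supplied, that case never arises. Should a defensive variant ever be wanted, one option is `split_last`, sketched below. The helper name `split_seed` and the `String` error type are assumptions for illustration and are not part of the PR.

```rust
/// Hypothetical helper: splits `args` into (value columns, seed), where the
/// seed is the last element. Returns an error instead of panicking when
/// `args` is empty.
fn split_seed<T>(args: &[T]) -> Result<(&[T], &T), String> {
    match args.split_last() {
        // `split_last` yields (last element, everything before it).
        Some((seed, values)) => Ok((values, seed)),
        None => Err("expected at least a seed argument".to_string()),
    }
}
```

With only the seed present, `values` is an empty slice, which matches what the original slice expression produces for a one-element `args`.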

@tshauck tshauck left a comment

Contributor

Others probably have more substantive feedback, but as a step toward better organization, this LGTM.

Comment thread core/src/execution/datafusion/expressions/scalar_funcs/hex.rs Outdated

@kazuyukitanimura kazuyukitanimura left a comment

Contributor

Looks good. One minor question:

Comment thread core/benches/hash.rs
@advancedxy

Contributor Author

Gentle ping @comphead, @andygrove and @kazuyukitanimura

@kazuyukitanimura kazuyukitanimura left a comment

Contributor

LGTM

@comphead comphead left a comment

Contributor

lgtm thanks @advancedxy

@viirya viirya merged commit 2992a5e into apache:main Jun 25, 2024
@viirya

viirya commented Jun 25, 2024

Member

Merged. Thanks @advancedxy @kazuyukitanimura @tshauck @comphead

@advancedxy

Contributor Author

Thanks everyone for reviewing.

himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request Sep 7, 2024
…e#590)

* chore: Move some utility methods to submodules of scalar_funcs

* Address comments
5 participants