Skip to content

test: add unhex dictionary coverage#4222

Merged
andygrove merged 1 commit into
apache:mainfrom
yuboxx:fix-unhex-dictionary-477
May 6, 2026
Merged

test: add unhex dictionary coverage#4222
andygrove merged 1 commit into
apache:mainfrom
yuboxx:fix-unhex-dictionary-477

Conversation

@yuboxx

@yuboxx yuboxx commented May 5, 2026

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

This PR adds coverage for the native Spark-compatible unhex expression with Parquet dictionary encoding enabled and disabled.

The original issue suggested that dictionary-backed inputs might need expression-level support. The added tests confirm that Parquet dictionary string values are already unpacked before expression evaluation, so this PR does not change the native unhex implementation.

The new coverage includes:

  • Spark expression coverage with parquet.enable.dictionary=false,true
  • Generated primitive Parquet coverage via makeParquetFileAllPrimitiveTypes, using the _8 UTF8 string column

Closes #477.

How was this patch tested?

  • cargo test -p datafusion-comet-spark-expr math_funcs::unhex::test
  • ./mvnw -pl spark -Dsuites=org.apache.comet.CometExpressionSuite -Dtest=none -Pspark-4.0 -Pscala-2.13 test

@andygrove

Copy link
Copy Markdown
Member

Thanks for the contribution @yuboxx. Could you tell me more about the motivation for this PR? The changes looks reasonable, but do not address any bugs as far as I can tell. Dictionary-encoded arrays are unpacked in parquet_convert_array before any expressions can be evaluated.

@yuboxx

yuboxx commented May 6, 2026

Copy link
Copy Markdown
Contributor Author

Could you tell me more about the motivation for this PR?

Thanks for the pointer! I was looking for some issues to work with as I'm onboarding to the codebase and stumbled on issue #477. It raised a point about dictionary type support in unhex, but apparently dictionary array is already being unpacked in advance. I'm going to remove my change in unhex.rs and only keep the unit test change to show dictionary is supported. Let me know if this makes sense.

@yuboxx yuboxx changed the title fix: support unhex on dictionary strings test: add unhex dictionary coverage May 6, 2026
@yuboxx yuboxx force-pushed the fix-unhex-dictionary-477 branch from cf6552a to f474012 Compare May 6, 2026 03:56
@andygrove

Copy link
Copy Markdown
Member

Could you tell me more about the motivation for this PR?

Thanks for the pointer! I was looking for some issues to work with as I'm onboarding to the codebase and stumbled on issue #477. It raised a point about dictionary type support in unhex, but apparently dictionary array is already being unpacked in advance. I'm going to remove my change in unhex.rs and only keep the unit test change to show dictionary is supported. Let me know if this makes sense.

Sounds good! Thanks

@andygrove andygrove left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending CI

@andygrove andygrove merged commit 5910560 into apache:main May 6, 2026
82 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add dictionary support to unhex and test dictionary and scalar cases

2 participants