feat: Parquet Modular Encryption with Spark KMS for native readers #2447

Merged: mbutrovich merged 25 commits into apache:main from mbutrovich:decryption on Oct 7, 2025

@mbutrovich mbutrovich commented Sep 23, 2025

Which issue does this PR close?

Closes #.

Rationale for this change

We want to add Parquet Modular Encryption (PME) support for the native readers when using a Spark KMS. We use the encryption factory features added in DataFusion 50 to register an encryption factory that uses JNI to get decryption keys from Spark.
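
As a rough illustration of the delegation described above, here is a minimal, self-contained Java sketch. Every name in it (`KeyRetrievalSketch`, `KeyUnwrapper`, `CachingKeyRetriever`) is hypothetical and not taken from this PR. The lambda stands in for the Spark-side KMS call that the native reader would reach over JNI, and the per-key cache is an assumption added for the example, not something the PR states.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: a reader-side key retriever that delegates to a KMS-backed unwrapper. */
public class KeyRetrievalSketch {

  /** Stand-in for the Spark-side call that turns Parquet key metadata into a raw key. */
  interface KeyUnwrapper {
    byte[] unwrap(String filePath, byte[] keyMetadata);
  }

  /** Caches unwrapped keys so the (assumed expensive) unwrap call runs once per distinct key. */
  static final class CachingKeyRetriever {
    private final KeyUnwrapper unwrapper;
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

    CachingKeyRetriever(KeyUnwrapper unwrapper) {
      this.unwrapper = unwrapper;
    }

    byte[] retrieveKey(String filePath, byte[] keyMetadata) {
      // Key the cache on the file path plus the metadata bytes (Base64-encoded for a stable string).
      String cacheKey = filePath + "|" + Base64.getEncoder().encodeToString(keyMetadata);
      byte[] key = cache.computeIfAbsent(cacheKey, k -> unwrapper.unwrap(filePath, keyMetadata));
      // Return a copy so callers cannot mutate the cached key.
      return key.clone();
    }
  }

  public static void main(String[] args) {
    int[] calls = {0};
    CachingKeyRetriever retriever = new CachingKeyRetriever((path, metadata) -> {
      calls[0]++;
      return new byte[16]; // placeholder bytes, not real key material
    });
    byte[] metadata = "footer-key-metadata".getBytes(StandardCharsets.UTF_8);
    retriever.retrieveKey("part-0.parquet", metadata);
    retriever.retrieveKey("part-0.parquet", metadata);
    System.out.println("unwrap calls: " + calls[0]); // prints "unwrap calls: 1"
  }
}
```

In the design the PR describes, the retriever role sits on the native (Rust) side and the unwrapper role on the JVM side; the sketch keeps both in Java only so that it runs on its own.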

What changes are included in this PR?

How are these changes tested?

  • Existing PME tests with new readers added.
  • New tests that exercise PME options like plaintext footer, etc.
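
For readers unfamiliar with the PME options mentioned above, the following sketch collects the Hadoop properties that parquet-mr's properties-driven crypto factory reads when writing an encrypted file, including the plaintext-footer switch. The property names, the `PropertiesDrivenCryptoFactory` class, and the mock `InMemoryKMS` client are standard parquet-mr names, and the sample keys are the ones used in the Spark documentation. The class name `PmeOptionsSketch`, the column name `id`, and the choice of which options to set are assumptions; the exact options the PR's tests use are not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that assembles PME write properties for a test; not code from the PR. */
public class PmeOptionsSketch {

  static Map<String, String> encryptionProperties(boolean plaintextFooter) {
    Map<String, String> props = new LinkedHashMap<>();
    // Activate Parquet Modular Encryption through the properties-driven factory.
    props.put("parquet.crypto.factory.class",
        "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory");
    // Mock KMS client intended for tests only; a real deployment would plug in its own client class.
    props.put("parquet.encryption.kms.client.class",
        "org.apache.parquet.crypto.keytools.mocks.InMemoryKMS");
    // Master keys as "id:base64" pairs (sample values, not secrets).
    props.put("parquet.encryption.key.list",
        "keyA:AAECAwQFBgcICQoLDA0ODw== , keyB:AAECAAECAAECAAECAAECAA==");
    // Encrypt the (assumed) column "id" with keyA and protect the footer with keyB.
    props.put("parquet.encryption.column.keys", "keyA:id");
    props.put("parquet.encryption.footer.key", "keyB");
    // With a plaintext footer the footer is signed but left readable.
    props.put("parquet.encryption.plaintext.footer", Boolean.toString(plaintextFooter));
    return props;
  }

  public static void main(String[] args) {
    encryptionProperties(true).forEach((k, v) -> System.out.println(k + " = " + v));
  }
}
```

In a Spark test these entries would typically be applied to the session's Hadoop configuration or passed as write options before writing the Parquet file.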

@mbutrovich mbutrovich changed the title feat: Parquet Modular Encryption support for native_datafusion and native_iceberg_compat readers feat: Parquet Modular Encryption with Spark KMS for native_datafusion and native_iceberg_compat readers Sep 23, 2025
@mbutrovich mbutrovich changed the title feat: Parquet Modular Encryption with Spark KMS for native_datafusion and native_iceberg_compat readers feat: Parquet Modular Encryption with Spark KMS for native readers Sep 23, 2025
codecov-commenter commented Sep 23, 2025

Codecov Report

❌ Patch coverage is 36.78161% with 55 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.92%. Comparing base (f09f8af) to head (257f163).
⚠️ Report is 689 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...rg/apache/comet/parquet/CometFileKeyUnwrapper.java | 0.00% | 18 Missing ⚠️ |
| ...a/org/apache/comet/parquet/CometParquetUtils.scala | 0.00% | 15 Missing ⚠️ |
| ...ain/scala/org/apache/comet/CometExecIterator.scala | 33.33% | 7 Missing and 1 partial ⚠️ |
| ...va/org/apache/comet/parquet/NativeBatchReader.java | 0.00% | 5 Missing ⚠️ |
| ...n/scala/org/apache/spark/sql/comet/operators.scala | 80.76% | 3 Missing and 2 partials ⚠️ |
| ...n/scala/org/apache/comet/rules/CometScanRule.scala | 42.85% | 3 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2447      +/-   ##
============================================
+ Coverage     56.12%   58.92%   +2.79%     
- Complexity      976     1457     +481     
============================================
  Files           119      147      +28     
  Lines         11743    13642    +1899     
  Branches       2251     2369     +118     
============================================
+ Hits           6591     8038    +1447     
- Misses         4012     4381     +369     
- Partials       1140     1223      +83     


Comment thread common/src/main/java/org/apache/comet/parquet/NativeBatchReader.java Outdated
Comment thread native/core/src/parquet/parquet_exec.rs Outdated
Comment thread native/core/src/parquet/parquet_exec.rs Outdated
@mbutrovich mbutrovich marked this pull request as ready for review September 26, 2025 20:31
# Conflicts:
#	spark/src/main/scala/org/apache/comet/CometExecIterator.scala
Comment thread common/src/main/java/org/apache/comet/parquet/CometFileKeyUnwrapper.java Outdated
Comment thread common/src/main/java/org/apache/comet/parquet/CometFileKeyUnwrapper.java Outdated
Comment thread common/src/main/java/org/apache/comet/parquet/NativeBatchReader.java Outdated
Comment thread common/src/main/scala/org/apache/comet/parquet/CometParquetUtils.scala Outdated
Comment thread spark/src/main/scala/org/apache/spark/sql/comet/operators.scala Outdated
@mbutrovich
Contributor Author

Attached are results from the benchmark I added to CometReadBenchmark, along with a small chart highlighting the encryption overhead for the various readers.

decryption

benchmark_decryption.txt

Comment thread native/core/src/parquet/encryption_support.rs
Comment thread common/src/main/java/org/apache/comet/parquet/CometFileKeyUnwrapper.java Outdated
Comment thread common/src/main/java/org/apache/comet/parquet/CometFileKeyUnwrapper.java Outdated
Comment thread native/core/src/parquet/encryption_support.rs
Comment thread native/core/src/parquet/encryption_support.rs Outdated

@parthchandra parthchandra left a comment

lgtm


// spotless:off
/*
* Architecture Overview:
This diagram is super helpful, thanks a lot.

Comment thread spark/src/test/scala/org/apache/spark/sql/comet/ParquetEncryptionITCase.scala Outdated
# Conflicts:
#	native/core/src/execution/jni_api.rs
#	spark/src/main/scala/org/apache/comet/CometExecIterator.scala
#	spark/src/main/scala/org/apache/comet/Native.scala

@hsiang-c hsiang-c left a comment

LGTM

@mbutrovich mbutrovich merged commit c23dc25 into apache:main Oct 7, 2025
102 checks passed
coderfender pushed a commit to coderfender/datafusion-comet that referenced this pull request Dec 13, 2025
@mbutrovich mbutrovich deleted the decryption branch March 13, 2026 18:58