Skip to content

Core: Only add required encrypted keys to metadata#16837

Closed
Hugo-WB wants to merge 1 commit into
apache:mainfrom
Hugo-WB:only-add-required-encryption-keys-to-metadata
Closed

Core: Only add required encrypted keys to metadata#16837
Hugo-WB wants to merge 1 commit into
apache:mainfrom
Hugo-WB:only-add-required-encryption-keys-to-metadata

Conversation

@Hugo-WB

@Hugo-WB Hugo-WB commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Addresses: #16352

After putting up: #16353 which cleaned encryption keys at expire_snapshot time by removing the encryption key associated with the removed snapshot. (In TableMetadata::RemoveSnapshot also RemoveEncryptionKey for the keyId of the removed snapshot).

When writing E2E tests for above, it wasn't working due to HiveTableOperations::doCommit always adding all keys from the EncryptionManager into the TableMetadata.

From my rough understanding, sorry if this is wrong 😅, EncryptionManager is independent of TableMetadata, they "branch off" at metadata load time, and at commit time they are "reconciled".
The current "reconciliation" logic is to add all keys in EncryptionManager to TableMetadata.

This leads to issues when trying to remove keys. Even if the EncryptedKey is removed from the base metadata and the new metadata does not have the key, removal of EncryptionKeys from metadata is never propagated to the EncryptionManager.

This PR proposes to change the "reconciliation" logic to only add keys that are still referenced/needed by the new metadata file.

This effectively does cleanup at runtime.

Ideas for alternative solutions (Open to ideas here, I think I have a rather limited view here of how things are correlated with each-other):

  • Rewrite EncryptionManager to be built on top of TableMetadata. Use TableMetadata as source of truth.
    • EncryptionManager does not store state, delegates to TableMetadata.
    • Alterations to EncryptionManager are immediately reflected within its version of TableMetadata
      • E.g. adding encryption keys or removing encryption keys.
  • Rewrite EncryptionManager to be more tightly coupled with TableMetadata.
    • Such that when RemoveSnapshot -> remove encryption key from EncryptionManager.
    • Keep the "adding of all encryption keys from EncryptionManager" at commit time.

This also related to the REST catalog implementation of encryption: #13225
The above PR has the same issue as the Hive implementation, where it will always add all keys from the EncryptionManager.

@Hugo-WB Hugo-WB force-pushed the only-add-required-encryption-keys-to-metadata branch from a850070 to 84227db Compare June 16, 2026 16:49
@Hugo-WB

Hugo-WB commented Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

Closing in favor of: #16353

I think makes sense to just include it there.

I was also misunderstanding that this would fix the expiration issue on this own, but since we only restricting the keys we are adding, this doesn't solve removing encryption keys on it's own.

@Hugo-WB Hugo-WB closed this Jun 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant