[core] support decoupling the lifecycle of delta files #3178
Perhaps it would be best to have an abstract mechanism that ensures we never keep both the changelog files and the delta files at the same time. That would make this whole set of logic easier to understand.
Branch updated from 078b05a to fa73cb7.
I extracted this logic to … In this PR, I also refactored the …
Branch updated from 9c06546 to 2ae1028.
// expire
checkAnswer(
-  spark.sql("CALL paimon.sys.expire_snapshots(table => 'test.T', retain_max => 2)"),
+  spark.sql("CALL paimon.sys.expire_snapshots(table => 'test.T', retain_max => 2, retain_min => 1)"),
Before, if the user did not specify retain_min, it defaulted to 1 in ExpireSnapshotsImpl. Now it falls back to CoreOptions.SNAPSHOT_RETAIN_MIN, whose default is 10, so with retain_max => 2 we have to specify retain_min => 1 explicitly. I think the new behavior is more consistent, but I'm not sure whether it breaks compatibility. cc @JingsongLi, could you please help check this?
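A minimal sketch of the fallback described in this comment, for illustration only. RetainMinSketch, resolve, Retention and isConsistent are hypothetical names, not Paimon's API; the two fallback values (1 before the change, the CoreOptions.SNAPSHOT_RETAIN_MIN default of 10 after it) are the ones quoted above. The sketch only shows why retain_max => 2 without retain_min now yields bounds where retain_min exceeds retain_max; how ExpireSnapshotsImpl actually reacts to such bounds is not modeled.

```java
/**
 * Hypothetical sketch (not Paimon's code): resolving retain_min for expire_snapshots.
 * The fallback values are the ones quoted in the review comment.
 */
final class RetainMinSketch {
    /** Default that ExpireSnapshotsImpl used before this change, per the comment. */
    static final int OLD_FALLBACK_RETAIN_MIN = 1;
    /** CoreOptions.SNAPSHOT_RETAIN_MIN default that the procedure now falls back to. */
    static final int NEW_FALLBACK_RETAIN_MIN = 10;

    private RetainMinSketch() {}

    /** Effective retention bounds for one procedure call. */
    record Retention(int retainMin, int retainMax) {
        /** The bounds only make sense if the minimum to keep does not exceed the maximum. */
        boolean isConsistent() {
            return retainMin <= retainMax;
        }
    }

    /** Uses the explicit retain_min argument when given, otherwise the supplied fallback. */
    static Retention resolve(Integer retainMinArg, int retainMaxArg, int fallbackRetainMin) {
        int retainMin = (retainMinArg != null) ? retainMinArg : fallbackRetainMin;
        return new Retention(retainMin, retainMaxArg);
    }
}
```

Passing retain_min explicitly, as the updated test does, keeps the bounds consistent no matter which fallback applies.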
Already merged as three separate PRs.
Purpose
This PR aims to support decoupling the lifecycle of delta files (#2899).
The basic idea is:
- Introduce FileSource in DataFileMeta to indicate whether a file was generated as an APPEND or a COMPACT file.
- For the none changelog producer, deletion of APPEND files recorded in the base and delta manifest files is also postponed.

About why we need FileSource in DataFileMeta

For the none changelog producer, only APPEND commits are required for streaming reads. In a COMPACT commit, some of the files marked as deleted may come from an earlier compaction and others from an append. We should delete the files that came from compaction and keep the files that came from appends for later streaming reads, so we need a flag to distinguish the file source (compact or append).

Linked issue: close #xxx
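To make the file-source rule above concrete, here is a minimal sketch under a simplified model. FileSource mirrors the name used in this PR (modeled here as a two-value enum); DataFile, ChangelogProducer, ExpirePlan and DeltaFileExpireSketch.plan are hypothetical names for illustration, not Paimon's actual classes. The sketch assumes that with any producer other than none, streaming reads are served by changelog files, so removed data files can be deleted right away.

```java
import java.util.ArrayList;
import java.util.List;

/** Whether a data file was written by an append or produced by compaction. */
enum FileSource { APPEND, COMPACT }

/** Hypothetical enum naming Paimon's changelog producer modes. */
enum ChangelogProducer { NONE, INPUT, FULL_COMPACTION, LOOKUP }

/** Hypothetical, trimmed-down stand-in for DataFileMeta: a file name plus its source. */
record DataFile(String fileName, FileSource fileSource) {}

/** Files to delete now, and files whose deletion is postponed. */
record ExpirePlan(List<DataFile> deleteNow, List<DataFile> postponed) {}

final class DeltaFileExpireSketch {
    private DeltaFileExpireSketch() {}

    /**
     * Splits the files that a COMPACT commit marked as deleted.
     * Assumption: with the none producer, APPEND files may still be needed by streaming
     * readers, so their deletion is postponed; COMPACT files are deleted immediately.
     * With any other producer, streaming reads are assumed to use changelog files instead.
     */
    static ExpirePlan plan(List<DataFile> markedDeleted, ChangelogProducer producer) {
        List<DataFile> deleteNow = new ArrayList<>();
        List<DataFile> postponed = new ArrayList<>();
        for (DataFile file : markedDeleted) {
            boolean neededForStreamRead =
                    producer == ChangelogProducer.NONE && file.fileSource() == FileSource.APPEND;
            if (neededForStreamRead) {
                postponed.add(file);
            } else {
                deleteNow.add(file);
            }
        }
        return new ExpirePlan(List.copyOf(deleteNow), List.copyOf(postponed));
    }
}
```

With the none producer, an APPEND file and a COMPACT file removed by the same compaction land on different sides of the split; without the fileSource field there would be nothing to tell them apart. With another producer, both are deleted immediately.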
Tests
API and Format
Introduce FileSource in DataFileMeta
Documentation