Skip to content

Spark: Add separate data and metadata file lists to RewriteTablePath#15382

Closed
mxm wants to merge 5 commits into
apache:mainfrom
mxm:rewrite-table-path-file-lists
Closed

Spark: Add separate data and metadata file lists to RewriteTablePath#15382
mxm wants to merge 5 commits into
apache:mainfrom
mxm:rewrite-table-path-file-lists

Conversation

@mxm

@mxm mxm commented Feb 20, 2026

Copy link
Copy Markdown
Contributor

Downstream tools often need different copy strategies for data vs metadata files (e.g. different location, additional checks on data vs metadata files). This splits the file list into separate data and metadata lists while keeping the combined list for backward compatibility.

Comment thread api/src/main/java/org/apache/iceberg/actions/RewriteTablePath.java
@mxm

mxm commented Feb 20, 2026

Copy link
Copy Markdown
Contributor Author

Comment thread core/src/main/java/org/apache/iceberg/actions/BaseRewriteTablePath.java Outdated
Comment on lines +334 to +335
String fileListLocation = saveFileList(copyPlan, RESULT_LOCATION);
String dataFileListLocation = saveFileList(dataCopyPlan, DATA_FILE_LIST_DIR);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we afraid of the cost of duplicated file list write?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We potentially write a lot of files (metadata, manifest lists, manifests), so writing two instead of one txt file looks ok to me.

Comment on lines -510 to +532
@Override
public RewriteContentFileResult append(RewriteResult<ContentFile<?>> r1) {
this.copyPlan().addAll(r1.copyPlan());
this.toRewrite().addAll(r1.toRewrite());

public RewriteContentFileResult append(RewriteContentFileResult other) {
this.copyPlan().addAll(other.copyPlan());
this.toRewrite().addAll(other.toRewrite());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this change?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual return type from where this is being called it RewriteContentFileResult, so I decided to narrow the type.

RewriteResult relies on generics which are subject to type erasure, see

public RewriteResult<T> append(RewriteResult<T> r1) {
It could be worthwhile to get rid of the types, since the different types aren't treated any different.

@mxm mxm force-pushed the rewrite-table-path-file-lists branch from 6f88d67 to 9577ecd Compare March 12, 2026 11:44
@mxm

mxm commented Mar 12, 2026

Copy link
Copy Markdown
Contributor Author

Rebased to resolve conflicts with #15381.

@mxm mxm force-pushed the rewrite-table-path-file-lists branch from 9577ecd to 356bebf Compare March 12, 2026 14:08
@github-actions

Copy link
Copy Markdown

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions Bot added the stale label Apr 12, 2026
@mxm

mxm commented Apr 14, 2026

Copy link
Copy Markdown
Contributor Author

I'm keeping this open for now.

@github-actions github-actions Bot removed the stale label Apr 15, 2026
@github-actions

Copy link
Copy Markdown

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions Bot added the stale label May 15, 2026
@github-actions

Copy link
Copy Markdown

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions Bot closed this May 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants