Skip to content

Core, Spark: Fix stale manifest_length in rewrite_table_path manifest lists#16910

Open
wombatu-kun wants to merge 4 commits into
apache:mainfrom
wombatu-kun:issue/16905-manifest-length-rewrite-table-path
Open

Core, Spark: Fix stale manifest_length in rewrite_table_path manifest lists#16910
wombatu-kun wants to merge 4 commits into
apache:mainfrom
wombatu-kun:issue/16905-manifest-length-rewrite-table-path

Conversation

@wombatu-kun

@wombatu-kun wombatu-kun commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

Closes #16905

Problem

RewriteTablePathUtil.rewriteManifestList rewrote each manifest-list entry updating only the manifest path and left manifest_length (spec field 501) at the source value. When the target prefix differs in length from the source, the rewritten manifest embeds different-length data-file paths and therefore differs in byte size, so manifest_length no longer matches the rewritten manifest on disk. Readers that validate the field (Trino, Impala, iceberg-rust) fail with Incorrect file size ... (end of stream not reached); Spark ignores the field, which is why the bug only surfaces on some engines.

Fix

The rewritten manifest's true length is only known after the manifests are rewritten, but the Spark action wrote the manifest list first. rebuildMetadata now discovers the manifests to rewrite (read-only), rewrites them while capturing each one's byte length, and writes the manifest lists last, stamping manifest_length from the captured lengths. The length is taken from the manifest writer (ManifestWriter.length()) rather than a getLength()/HEAD call, so it stays accurate even where the length of an in-progress write underreports. Manifests not rewritten in an incremental run keep their original length. Because the staged metadata is copied to the target byte-for-byte (Distcp), the recorded length equals the target file's actual size.

Tests

  • TestRewriteTablePathsAction#testManifestLengthAfterRewrite: rewrites a multi-snapshot table to a longer target prefix and asserts that manifest_length matches the on-disk manifest size for every manifest-list entry across all snapshots, current and historical (format versions 2-4, covering both Avro and Parquet manifests). Fails before this change, passes after.
  • TestRewriteTablePathUtil: unit coverage for stamping manifest_length from the length map and for falling back to the source length when a manifest is absent from the map.

Notes

Same class of bug as #15470 (delete-file file_size_in_bytes inside manifests), but for manifest_length in the manifest list; the two fixes are independent. They touch overlapping code in the rewrite path, so whichever lands second will need a small rebase.


// rebuild manifest-list files last, stamping manifest_length with the rewritten manifest sizes
Set<RewriteResult<ManifestFile>> manifestListResults = Sets.newConcurrentHashSet();
Tasks.foreach(validSnapshots)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note on performance: this loop currently defaults to single-threaded execution because PySpark/SQL cannot configure an Executor Service, directly causing slower runtimes for rewrite_table_path. I've raised a feature request to address this, and I believe it's an important bottleneck we should advocate to fix.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The threading here is unchanged by this PR - both the new discovery loop and the manifest-list loop reuse the action's existing executorService, the same as the original single loop did. Making that executor parallel by default is orthogonal to the manifest_length fix, so #16752 is the right place to address it.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course.
I assumed it was Out-Of-Scope, just wanted to point out a potential bottleneck.

@wombatu-kun wombatu-kun requested a review from leeyam24 June 22, 2026 13:54

@leeyam24 leeyam24 left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm done reviewing, looks pretty good to me.

@kevinjqliu

Copy link
Copy Markdown
Contributor

Thanks for the PR! I'll take a look. I reference this in #15470 (comment)

I think it would be good to include this in the upcoming patch release.

Could you rebase to resolve conflict? Might also be good to follow the pattern from #15470

@kevinjqliu kevinjqliu self-requested a review June 27, 2026 20:58
Vova Kolmakov and others added 3 commits June 29, 2026 08:22
… lists

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@wombatu-kun wombatu-kun force-pushed the issue/16905-manifest-length-rewrite-table-path branch from a81e706 to babe6a8 Compare June 29, 2026 01:31
…se line

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@wombatu-kun

Copy link
Copy Markdown
Contributor Author

Rebased onto main; the conflict with #15470 is resolved. The manifest_length fix follows the same pattern as #15470 - new overloads that record the measured length via writer.length(), threaded through a Map<String, Long> with a getOrDefault fallback, and @Deprecated on the prior signatures. Ready for another look.

@danielcweeks danielcweeks self-requested a review June 29, 2026 18:28
ManifestFile newFile = file.copy();
((StructLike) newFile).set(0, newPath(newFile.path(), sourcePrefix, targetPrefix));
((StructLike) newFile)
.set(1, rewrittenManifestLengths.getOrDefault(file.path(), file.length()));

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an intentional decision, based on

Manifests not rewritten in an incremental run keep their original length.

but it does still mean that this procedure will silently produce incorrect manifest lists in incremental mode, if rewritten manifest lists referenced manifests that were already rewritten in a previous run.

If the rewritten manifest size changed after the prefix rewrite, which it typically does, the new manifest list is still inconsistent with the file it references.

This is a known incorrect behaviour described in #13720 (comment) and acknowledged there.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right - this is the same bug as your #13719 (#16905 is a later duplicate), and your #13720 went further than this PR: by rewriting every referenced manifest and requiring all lengths to be known, it kept the incremental case correct too. This PR keeps the incremental optimization (only new manifests are rewritten), so carried-over manifests fall back to file.length() and their manifest_length stays stale - the exact A/B/C case you raised with @dramaticlly. So the non-incremental path is fixed here but the incremental path is not, and closing it means re-rewriting all referenced manifests, the efficiency tradeoff that stalled #13720 at the community sync. I'd rather settle that completeness-vs-efficiency question on #13719 / dev@ than decide it unilaterally here, and I'm glad to help revive your #13720. Even as-is this PR is a strict improvement over main, where every entry kept the source length.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

rewrite_table_path does not update manifest_length in rewritten manifest lists, causing Trino to fail with "Incorrect file size"

5 participants