Spark 3.3: Remove unnecessary metadata columns reading when merge using Iceberg table#7985

Closed
ConeyLiu wants to merge 1 commit into apache:master from ConeyLiu:remove-unnecessay-metadata-read
Conversation

ConeyLiu commented Jul 4, 2023

Contributor

Merging into an Iceberg table reads many metadata columns that the query does not need. The problem is most likely caused by Spark 3.3's AddMetadataColumns rule, which has been fixed in Spark 3.4.
This PR adds a rule that removes the unnecessary metadata column reads to fix the problem in Spark 3.3.
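
As a rough illustration of what such a rule does, here is a minimal sketch in plain Java. It is a toy model only, not the code from this PR and not Spark's Catalyst API: the class `MetadataPruneSketch` and the method `prune` are hypothetical names, and the scan is modeled as a simple list of column names. The idea is that a metadata column stays in the scan output only when something above the scan actually references it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model only: hypothetical names, no Spark or Iceberg classes involved.
public class MetadataPruneSketch {

  // Names of Iceberg's reserved metadata columns, listed here for illustration.
  static final Set<String> METADATA_COLUMNS =
      Set.of("_file", "_pos", "_spec_id", "_partition", "_deleted");

  // Keeps every data column, but keeps a metadata column only if the plan
  // above the scan references it. Input order is preserved.
  static List<String> prune(List<String> scanOutput, Set<String> referenced) {
    List<String> pruned = new ArrayList<>();
    for (String column : scanOutput) {
      boolean isMetadata = METADATA_COLUMNS.contains(column);
      if (!isMetadata || referenced.contains(column)) {
        pruned.add(column);
      }
    }
    return pruned;
  }

  public static void main(String[] args) {
    // A scan that exposes more metadata columns than the query needs.
    List<String> scanOutput =
        List.of("id", "data", "_file", "_pos", "_spec_id", "_partition");
    // Columns that the rest of the (hypothetical) merge plan references.
    Set<String> referenced = Set.of("id", "_file", "_pos");

    System.out.println(prune(scanOutput, referenced)); // prints [id, data, _file, _pos]
  }
}
```

In this sketch `_spec_id` and `_partition` are dropped because nothing references them, while data columns are left alone; ordinary column pruning would be a separate concern.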

Before this PR:
image

After this PR:
image

@github-actions github-actions Bot added the spark label Jul 4, 2023
ConeyLiu commented Jul 4, 2023

Contributor Author

Hi @rdblue @szehon-ho @aokolnychyi @RussellSpitzer @Fokko, could you please review this when you have time? Thanks a lot.

@RussellSpitzer

Member

@huaxingao I believe you did the Spark fix for this?

@huaxingao

Contributor

I think the problem has already been fixed in Spark 3.3 by this PR

ConeyLiu commented Jul 4, 2023

Contributor Author

Thanks @RussellSpitzer @huaxingao. I see, I still hit the problem because Spark 3.3.3 has not been released yet.

ConeyLiu commented Jul 5, 2023

Contributor Author

Closing this since the problem has already been fixed in Spark 3.3.

@ConeyLiu ConeyLiu closed this Jul 5, 2023

3 participants