[SPARK-36498][SQL] Reorder inner fields of the input query in byName V2 write by cloud-fan · Pull Request #33728 · apache/spark

cloud-fan · 2021-08-12T15:16:32Z

What changes were proposed in this pull request?

Today, when we write data to a v2 table in byName mode, we only reorder the top-level columns, not inner struct fields. This doesn't make sense, as Spark should treat inner struct fields as first-class citizens (e.g. nested column pruning, filter pushdown with nested columns).

This PR improves TableOutputResolver to reorder inner fields as well.
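
To illustrate the idea, here is a minimal, Spark-free sketch of by-name reordering that recurses into nested structs. It is not the code in TableOutputResolver: `DType`, `Value`, and `reorder` are hypothetical names made up for this example, names are matched exactly (Spark's name resolution may differ), and leaf types are not checked.

```scala
object ByNameReorderSketch {
  // Hypothetical, simplified stand-ins for a table schema (not Spark's DataType).
  sealed trait DType
  case object IntT extends DType
  case object StrT extends DType
  final case class StructT(fields: Seq[(String, DType)]) extends DType

  // An input value is either a leaf or a struct of named values in some order.
  sealed trait Value
  final case class Leaf(v: Any) extends Value
  final case class StructV(fields: Seq[(String, Value)]) extends Value

  /** Reorders `input` to follow the field order of `expected`, at every nesting level. */
  def reorder(input: Value, expected: DType): Either[String, Value] =
    (input, expected) match {
      case (StructV(in), StructT(exp)) if in.size != exp.size =>
        Left(s"expected ${exp.size} fields but found ${in.size}")
      case (StructV(in), StructT(exp)) =>
        val byName = in.toMap
        val start: Either[String, Vector[(String, Value)]] = Right(Vector.empty)
        // Walk the expected fields in table order and look each one up by name.
        exp.foldLeft(start) { case (acc, (name, tpe)) =>
          for {
            done  <- acc
            child <- byName.get(name).toRight(s"cannot find field '$name'")
            fixed <- reorder(child, tpe) // recurse into inner structs
          } yield done :+ (name -> fixed)
        }.map(fields => StructV(fields))
      case (_: Leaf, _: StructT) => Left("expected a struct but found a leaf value")
      case (_: StructV, _)       => Left("expected a leaf value but found a struct")
      case (leaf: Leaf, _)       => Right(leaf) // leaf types are not checked in this sketch
    }
}
```

With a target of `a, s<x, y>` and an input of `s<y, x>, a`, `reorder` returns the input rearranged as `a, s<x, y>`; before this PR only the top-level swap would have happened.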

Why are the changes needed?

A better user experience.

Does this PR introduce any user-facing change?

Yes, more queries are allowed to write to v2 tables.

How was this patch tested?

A new test.

cloud-fan · 2021-08-12T15:23:20Z

 }

-abstract class DataSourceV2AnalysisBaseSuite extends AnalysisTest {
+abstract class V2WriteAnalysisSuiteBase extends AnalysisTest {


A small update to make the test suite names consistent: V2XXXSuite.

cloud-fan · 2021-08-12T15:23:42Z

cc @sunchao @yaooqinn

SparkQA · 2021-08-12T16:08:18Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46892/

SparkQA · 2021-08-12T16:50:56Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46892/

sunchao

Thanks @cloud-fan for pinging. LGTM (non-binding).

SparkQA · 2021-08-12T23:42:05Z

Test build #142387 has finished for PR 33728 at commit 81101d0.

This patch fails from timeout after a configured wait of 500m.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2021-08-13T01:23:31Z

retest this please

SparkQA · 2021-08-13T02:12:08Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46911/

SparkQA · 2021-08-13T02:44:08Z

Test build #142405 has finished for PR 33728 at commit 81101d0.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

yaooqinn · 2021-08-13T02:45:38Z

retest this please

SparkQA · 2021-08-13T03:06:34Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46911/

SparkQA · 2021-08-13T04:04:09Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46917/

SparkQA · 2021-08-13T05:00:56Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46917/

SparkQA · 2021-08-13T08:14:15Z

Test build #142411 has finished for PR 33728 at commit 81101d0.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-08-13T09:44:00Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46929/

SparkQA · 2021-08-13T10:23:57Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46929/

SparkQA · 2021-08-13T12:59:50Z

Test build #142423 has finished for PR 33728 at commit 3693ce6.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-08-13T17:05:34Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46941/

SparkQA · 2021-08-13T17:43:28Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46941/

SparkQA · 2021-08-13T21:03:42Z

Test build #142434 has finished for PR 33728 at commit 452b535.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2021-08-15T02:09:49Z

+        val newKeys = ArrayTransform(MapKeys(input), keyFunc)
+        val newValues = ArrayTransform(MapValues(input), valueFunc)
+        Some(Alias(MapFromArrays(newKeys, newValues), expectedName)())


TransformValues(TransformKeys(input, keyFunc), valueFunc)?

This creates the map twice, which can be slower.
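
As a rough analogy on plain Scala collections (not Catalyst expressions), the two shapes discussed above might look like the sketch below. `viaArrays` and `viaTwoPasses` are hypothetical helper names; the first mirrors transforming the key and value arrays and building the map once, the second mirrors chaining a key transform and a value transform, which materializes an intermediate map.

```scala
object MapRewriteSketch {
  // One rebuild: transform keys and values as parallel sequences, then zip them once.
  def viaArrays[K, V, K2, V2](m: Map[K, V])(kf: K => K2, vf: V => V2): Map[K2, V2] = {
    val entries = m.toSeq
    val newKeys = entries.map { case (k, _) => kf(k) }
    val newValues = entries.map { case (_, v) => vf(v) }
    newKeys.zip(newValues).toMap
  }

  // Two rebuilds: an intermediate map after the key pass, then a second map.
  def viaTwoPasses[K, V, K2, V2](m: Map[K, V])(kf: K => K2, vf: V => V2): Map[K2, V2] = {
    val keyed = m.map { case (k, v) => kf(k) -> v }
    keyed.map { case (k, v) => k -> vf(v) }
  }
}
```

Both helpers produce the same result when the transformed keys stay distinct; the difference is only in how many maps are built along the way.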

cloud-fan · 2021-08-16T07:07:26Z

Jenkins passes and the GA failure is unrelated. I'm merging this to master. Thanks for the review!

cloud-fan added 2 commits August 12, 2021 21:35

tmp

c6ecbfb

reorder inner fields in byName V2 write

81101d0

github-actions Bot added the SQL label Aug 12, 2021

cloud-fan commented Aug 12, 2021

dilipbiswal reviewed Aug 12, 2021

Comment thread sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala

sunchao approved these changes Aug 12, 2021

Comment thread sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala

yaooqinn approved these changes Aug 13, 2021

support array/map

3693ce6

fix test

452b535

viirya reviewed Aug 15, 2021

viirya approved these changes Aug 15, 2021

cloud-fan closed this in f4b31c6 Aug 16, 2021

wForget mentioned this pull request Jul 18, 2024

[SPARK-48922][SQL] Optimize nested data type insertion performance #47381

Closed
