Skip to content

[SPARK-36498][SQL] Reorder inner fields of the input query in byName V2 write#33728

Closed
cloud-fan wants to merge 4 commits into
apache:masterfrom
cloud-fan:reorder
Closed

[SPARK-36498][SQL] Reorder inner fields of the input query in byName V2 write#33728
cloud-fan wants to merge 4 commits into
apache:masterfrom
cloud-fan:reorder

Conversation

@cloud-fan

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Today, when we write data to a v2 table with byName mode, we only reorder the top-level columns, not inner struct fields. This doesn't make sense as Spark should treat inner struct fields as the first-class citizen (e.g. nested column pruning, filter pushdown with nested columns).

This PR improves TableOutputResolver to reorder inner fields as well.

Why are the changes needed?

better user-experience

Does this PR introduce any user-facing change?

yes, more queries are allowed to write to v2 tables.

How was this patch tested?

new test

@github-actions github-actions Bot added the SQL label Aug 12, 2021
}

abstract class DataSourceV2AnalysisBaseSuite extends AnalysisTest {
abstract class V2WriteAnalysisSuiteBase extends AnalysisTest {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a small update to make the test suite names consistent: V2XXXSuite

@cloud-fan

Copy link
Copy Markdown
Contributor Author

cc @sunchao @yaooqinn

@SparkQA

SparkQA commented Aug 12, 2021

Copy link
Copy Markdown

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46892/

@SparkQA

SparkQA commented Aug 12, 2021

Copy link
Copy Markdown

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46892/

@sunchao sunchao left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cloud-fan for pinging. LGTM (non-binding).

@SparkQA

SparkQA commented Aug 12, 2021

Copy link
Copy Markdown

Test build #142387 has finished for PR 33728 at commit 81101d0.

  • This patch fails from timeout after a configured wait of 500m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon

Copy link
Copy Markdown
Member

retest this please

@SparkQA

SparkQA commented Aug 13, 2021

Copy link
Copy Markdown

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46911/

@SparkQA

SparkQA commented Aug 13, 2021

Copy link
Copy Markdown

Test build #142405 has finished for PR 33728 at commit 81101d0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn

Copy link
Copy Markdown
Member

retest this please

@SparkQA

SparkQA commented Aug 13, 2021

Copy link
Copy Markdown

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46911/

@SparkQA

SparkQA commented Aug 13, 2021

Copy link
Copy Markdown

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46917/

@SparkQA

SparkQA commented Aug 13, 2021

Copy link
Copy Markdown

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46917/

@SparkQA

SparkQA commented Aug 13, 2021

Copy link
Copy Markdown

Test build #142411 has finished for PR 33728 at commit 81101d0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 13, 2021

Copy link
Copy Markdown

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46929/

@SparkQA

SparkQA commented Aug 13, 2021

Copy link
Copy Markdown

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46929/

@SparkQA

SparkQA commented Aug 13, 2021

Copy link
Copy Markdown

Test build #142423 has finished for PR 33728 at commit 3693ce6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 13, 2021

Copy link
Copy Markdown

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46941/

@SparkQA

SparkQA commented Aug 13, 2021

Copy link
Copy Markdown

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46941/

@SparkQA

SparkQA commented Aug 13, 2021

Copy link
Copy Markdown

Test build #142434 has finished for PR 33728 at commit 452b535.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Comment on lines +214 to +216
val newKeys = ArrayTransform(MapKeys(input), keyFunc)
val newValues = ArrayTransform(MapValues(input), valueFunc)
Some(Alias(MapFromArrays(newKeys, newValues), expectedName)())

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TransformValues(TransformKeys(input, keyFunc), valueFunc)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This creates map twice, can be slower.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@cloud-fan

Copy link
Copy Markdown
Contributor Author

jenkins passes and GA failure is unrelated. I'm merging this to master, thanks for the review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants