[SPARK-36498][SQL] Reorder inner fields of the input query in byName V2 write #33728
cloud-fan wants to merge 4 commits into
}

- abstract class DataSourceV2AnalysisBaseSuite extends AnalysisTest {
+ abstract class V2WriteAnalysisSuiteBase extends AnalysisTest {
a small update to make the test suite names consistent: V2XXXSuite
sunchao left a comment:
Thanks @cloud-fan for pinging. LGTM (non-binding).
val newKeys = ArrayTransform(MapKeys(input), keyFunc)
val newValues = ArrayTransform(MapValues(input), valueFunc)
Some(Alias(MapFromArrays(newKeys, newValues), expectedName)())
TransformValues(TransformKeys(input, keyFunc), valueFunc)?
This creates the map twice, which can be slower.
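To make the trade-off above concrete, here is a rough plain-Scala analogy. It does not use Catalyst expressions; `MapRebuildSketch`, `viaArrays`, and `viaTwoTransforms` are hypothetical names invented for illustration. The first helper mirrors the "transform keys and values separately, then build the map once" shape, and the second mirrors the chained shape that materializes an intermediate map.

```scala
// Plain-Scala analogy only; these helpers are hypothetical and are not Spark APIs.
object MapRebuildSketch {

  // Transform keys and values as two sequences, then build the result map once.
  def viaArrays[K, V, K2, V2](m: Map[K, V])(keyFunc: K => K2, valueFunc: V => V2): Map[K2, V2] = {
    val (keys, values) = m.toSeq.unzip
    keys.map(keyFunc).zip(values.map(valueFunc)).toMap
  }

  // Transform keys first, then values: each step builds a complete map,
  // so an intermediate map is created and then discarded.
  def viaTwoTransforms[K, V, K2, V2](m: Map[K, V])(keyFunc: K => K2, valueFunc: V => V2): Map[K2, V2] = {
    val afterKeys = m.map { case (k, v) => keyFunc(k) -> v }
    afterKeys.map { case (k, v) => k -> valueFunc(v) }
  }
}
```

Both helpers return the same result when `keyFunc` does not produce duplicate keys; the difference is only in how many maps are built along the way.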
Jenkins passes and the GA failure is unrelated. I'm merging this to master, thanks for the review!
What changes were proposed in this pull request?
Today, when we write data to a v2 table in byName mode, we only reorder the top-level columns, not inner struct fields. This doesn't make sense, as Spark should treat inner struct fields as first-class citizens (e.g. nested column pruning, filter pushdown with nested columns).
This PR improves TableOutputResolver to reorder inner fields as well.
Why are the changes needed?
Better user experience.
Does this PR introduce any user-facing change?
Yes, more queries are allowed to write to v2 tables.
How was this patch tested?
new test
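As an illustration of the by-name reordering described under "What changes were proposed in this pull request?", below is a minimal sketch in plain Scala. It is not the code in TableOutputResolver and does not use Spark's type classes: `SType`, `IntT`, `StrT`, `FieldT`, `StructT`, and `ByNameReorder` are hypothetical names, and the sketch assumes exact, case-sensitive name matching with no casts, which is simpler than what Spark does.

```scala
// Hypothetical toy type model; it only imitates the shape of a nested schema.
sealed trait SType
case object IntT extends SType
case object StrT extends SType
final case class FieldT(name: String, dataType: SType)
final case class StructT(fields: List[FieldT]) extends SType

object ByNameReorder {

  // Reorders `input` to follow the field order of `expected`, matching by name
  // and recursing into nested structs. Returns None when the field counts
  // differ, a field is missing, or a leaf type does not match exactly.
  def reorder(input: StructT, expected: StructT): Option[StructT] = {
    if (input.fields.size != expected.fields.size) {
      None
    } else {
      val reordered: List[Option[FieldT]] = expected.fields.map { exp =>
        input.fields.find(_.name == exp.name).flatMap { in =>
          (in.dataType, exp.dataType) match {
            // Nested struct: reorder its inner fields as well.
            case (i: StructT, e: StructT) => reorder(i, e).map(t => FieldT(exp.name, t))
            // Leaf field: accept only an identical type in this sketch.
            case (i, e) if i == e => Some(in)
            case _ => None
          }
        }
      }
      if (reordered.forall(_.isDefined)) Some(StructT(reordered.flatten)) else None
    }
  }
}
```

For example, an input shaped like `(a: int, s: struct<y: string, x: int>)` reorders to `(s: struct<x: int, y: string>, a: int)` when that is the expected layout, while an input missing an expected field yields `None`.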