[SPARK-7970] Skip closure cleaning for SQL operations#9253
[SPARK-7970] Skip closure cleaning for SQL operations#9253nitin2goyal wants to merge 11 commits into
Conversation
…'group by' attribute set
…'group by' attribute set
…'group by' attribute set
Push conjunctive predicates though Aggregate operators when their references are a subset of the groupingExpressions.
Query plan before optimisation :-
Filter ((c#138L = 2) && (a#0 = 3))
Aggregate [a#0], [a#0,count(b#1) AS c#138L]
Project [a#0,b#1]
LocalRelation [a#0,b#1,c#2]
Query plan after optimisation :-
Filter (c#138L = 2)
Aggregate [a#0], [a#0,count(b#1) AS c#138L]
Filter (a#0 = 3)
Project [a#0,b#1]
LocalRelation [a#0,b#1,c#2]
Push conjunctive predicates though Aggregate operators when their references are a subset of the groupingExpressions. Query plan before optimisation :- Filter ((c#138L = 2) && (a#0 = 3)) Aggregate [a#0], [a#0,count(b#1) AS c#138L] Project [a#0,b#1] LocalRelation [a#0,b#1,c#2] Query plan after optimisation :- Filter (c#138L = 2) Aggregate [a#0], [a#0,count(b#1) AS c#138L] Filter (a#0 = 3) Project [a#0,b#1] LocalRelation [a#0,b#1,c#2]
Also introduces new spark private API in RDD.scala with name 'mapPartitionsInternal' which doesn't closure cleans the RDD elements.
|
cc @andrewor14 |
|
ok to test |
|
Test build #44238 has finished for PR 9253 at commit
|
|
Test build #44284 has finished for PR 9253 at commit
|
|
@andrewor14 - not sure which test has failed. can we retest this please? |
|
retest this please |
|
Looks great! I look forward to getting this merged. Once you address the comments I will do so. |
|
Test build #44592 has finished for PR 9253 at commit
|
|
Test build #44609 has finished for PR 9253 at commit
|
|
Thanks fore reviewing Andrew ( @andrewor14 ). Have addressed your comments. Let me know if it looks good. |
|
@nitin2goyal Sorry for the delay. This LGTM. I will merge it once you rebase to master again. |
Conflicts: sql/core/src/main/scala/org/apache/spark/sql/execution/Aggregate.scala sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala
|
Test build #45870 has finished for PR 9253 at commit
|
|
retest this please |
|
Test build #45895 has finished for PR 9253 at commit
|
Also introduces new spark private API in RDD.scala with name 'mapPartitionsInternal' which doesn't closure cleans the RDD elements. Author: nitin goyal <nitin.goyal@guavus.com> Author: nitin.goyal <nitin.goyal@guavus.com> Closes #9253 from nitin2goyal/master. (cherry picked from commit c939c70) Signed-off-by: Andrew Or <andrew@databricks.com>
|
Should mapPartitions() be replaced with mapPartitionsInternal() in the following classes ? If so, allow me to open a PR |
Also introduces new spark private API in RDD.scala with name 'mapPartitionsInternal' which doesn't closure cleans the RDD elements.