Skip to content

[SPARK-7970] Skip closure cleaning for SQL operations#9253

Closed
nitin2goyal wants to merge 11 commits into
apache:masterfrom
nitin2goyal:master
Closed

[SPARK-7970] Skip closure cleaning for SQL operations#9253
nitin2goyal wants to merge 11 commits into
apache:masterfrom
nitin2goyal:master

Conversation

@nitin2goyal

Copy link
Copy Markdown

Also introduces new spark private API in RDD.scala with name 'mapPartitionsInternal' which doesn't closure cleans the RDD elements.

Push conjunctive predicates though Aggregate operators when their references are a subset of the groupingExpressions.

Query plan before optimisation :-
Filter ((c#138L = 2) && (a#0 = 3))
 Aggregate [a#0], [a#0,count(b#1) AS c#138L]
  Project [a#0,b#1]
   LocalRelation [a#0,b#1,c#2]

Query plan after optimisation :-
Filter (c#138L = 2)
 Aggregate [a#0], [a#0,count(b#1) AS c#138L]
  Filter (a#0 = 3)
   Project [a#0,b#1]
    LocalRelation [a#0,b#1,c#2]
Push conjunctive predicates though Aggregate operators when their references are a subset of the groupingExpressions.

Query plan before optimisation :-
Filter ((c#138L = 2) && (a#0 = 3))
Aggregate [a#0], [a#0,count(b#1) AS c#138L]
Project [a#0,b#1]
LocalRelation [a#0,b#1,c#2]

Query plan after optimisation :-
Filter (c#138L = 2)
Aggregate [a#0], [a#0,count(b#1) AS c#138L]
Filter (a#0 = 3)
Project [a#0,b#1]
LocalRelation [a#0,b#1,c#2]
Also introduces new spark private API in RDD.scala with name 'mapPartitionsInternal' which doesn't closure cleans the RDD elements.
@nitin2goyal

Copy link
Copy Markdown
Author

cc @andrewor14

@andrewor14

Copy link
Copy Markdown
Contributor

ok to test

@SparkQA

SparkQA commented Oct 23, 2015

Copy link
Copy Markdown

Test build #44238 has finished for PR 9253 at commit ca487cb.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 24, 2015

Copy link
Copy Markdown

Test build #44284 has finished for PR 9253 at commit 6a9f738.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@nitin2goyal

Copy link
Copy Markdown
Author

@andrewor14 - not sure which test has failed. can we retest this please?

@andrewor14

Copy link
Copy Markdown
Contributor

retest this please

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just use @param here

@andrewor14

Copy link
Copy Markdown
Contributor

Looks great! I look forward to getting this merged. Once you address the comments I will do so.

@SparkQA

SparkQA commented Oct 29, 2015

Copy link
Copy Markdown

Test build #44592 has finished for PR 9253 at commit 6a9f738.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 29, 2015

Copy link
Copy Markdown

Test build #44609 has finished for PR 9253 at commit 36db8a1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@nitin2goyal

Copy link
Copy Markdown
Author

Thanks fore reviewing Andrew ( @andrewor14 ). Have addressed your comments. Let me know if it looks good.

@andrewor14

Copy link
Copy Markdown
Contributor

@nitin2goyal Sorry for the delay. This LGTM. I will merge it once you rebase to master again.

Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/execution/Aggregate.scala
	sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala
@SparkQA

SparkQA commented Nov 13, 2015

Copy link
Copy Markdown

Test build #45870 has finished for PR 9253 at commit aa4a7ce.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * public class JavaGradientBoostingClassificationExample\n * public class JavaGradientBoostingRegressionExample\n * public class JavaRandomForestClassificationExample\n * public class JavaRandomForestRegressionExample\n

@andrewor14

Copy link
Copy Markdown
Contributor

retest this please

@SparkQA

SparkQA commented Nov 14, 2015

Copy link
Copy Markdown

Test build #45895 has finished for PR 9253 at commit aa4a7ce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in c939c70 Nov 14, 2015
asfgit pushed a commit that referenced this pull request Nov 14, 2015
Also introduces new spark private API in RDD.scala with name 'mapPartitionsInternal' which doesn't closure cleans the RDD elements.

Author: nitin goyal <nitin.goyal@guavus.com>
Author: nitin.goyal <nitin.goyal@guavus.com>

Closes #9253 from nitin2goyal/master.

(cherry picked from commit c939c70)
Signed-off-by: Andrew Or <andrew@databricks.com>
@tedyu

tedyu commented Nov 15, 2015

Copy link
Copy Markdown
Contributor

Should mapPartitions() be replaced with mapPartitionsInternal() in the following classes ?

    child.execute().mapPartitions { iter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala
    val rootType = schemaData.mapPartitions { iter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/InferSchema.scala
    json.mapPartitions { iter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JacksonParser.scala
    rows.mapPartitions { iterator =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JSONRelation.scala
        .mapPartitions { iterator =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala
      .mapPartitions { iter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/DefaultSource.scala
      child.execute().mapPartitions { iter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala
    data.mapPartitions { iterator =>
    data.mapPartitions { iterator =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala
    child.execute().mapPartitions { iter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/Expand.scala
    streamedPlan.execute().mapPartitions { streamedIter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoin.scala
    streamedPlan.execute().mapPartitions { streamedIter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashOuterJoin.scala
    val matchesOrStreamedRowsWithNulls = streamed.execute().mapPartitions { streamedIter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastNestedLoopJoin.scala
    streamed.execute().mapPartitions { streamedIter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/LeftSemiJoinBNL.scala
    rdd.mapPartitions { iter =>
    inputRDD.mapPartitions { iter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/python.scala
    child.execute().mapPartitions { iter =>
    child.execute().mapPartitions { iter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/rowFormatConverters.scala
    child.execute().mapPartitions { iter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/sort.scala
    child.execute().mapPartitions { stream =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala

If so, allow me to open a PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants