[SPARK-21603][SQL]The wholestage codegen will be much slower then that is closed when the function is too long#18810

Closed
eatoncys wants to merge 20 commits into apache:master from eatoncys:codegen

eatoncys commented Aug 2, 2017

Contributor

What changes were proposed in this pull request?

Disable whole-stage codegen when a generated function has more lines than the maximum set by the
spark.sql.codegen.MaxFunctionLength parameter, because a function that is too long is not JIT-optimized.
In the benchmark below, the query is several times slower with whole-stage codegen enabled (3279 ms vs. 443 ms best time) when the generated function is too long:

ignore("max function length of wholestagecodegen") {
  val N = 20 << 15

  val benchmark = new Benchmark("max function length of wholestagecodegen", N)
  // Query with many CASE WHEN columns, so the generated aggregation code becomes long.
  def f(): Unit = sparkSession.range(N)
    .selectExpr(
      "id",
      "(id & 1023) as k1",
      "cast(id & 1023 as double) as k2",
      "cast(id & 1023 as int) as k3",
      "case when id > 100 and id <= 200 then 1 else 0 end as v1",
      "case when id > 200 and id <= 300 then 1 else 0 end as v2",
      "case when id > 300 and id <= 400 then 1 else 0 end as v3",
      "case when id > 400 and id <= 500 then 1 else 0 end as v4",
      "case when id > 500 and id <= 600 then 1 else 0 end as v5",
      "case when id > 600 and id <= 700 then 1 else 0 end as v6",
      "case when id > 700 and id <= 800 then 1 else 0 end as v7",
      "case when id > 800 and id <= 900 then 1 else 0 end as v8",
      "case when id > 900 and id <= 1000 then 1 else 0 end as v9",
      "case when id > 1000 and id <= 1100 then 1 else 0 end as v10",
      "case when id > 1100 and id <= 1200 then 1 else 0 end as v11",
      "case when id > 1200 and id <= 1300 then 1 else 0 end as v12",
      "case when id > 1300 and id <= 1400 then 1 else 0 end as v13",
      "case when id > 1400 and id <= 1500 then 1 else 0 end as v14",
      "case when id > 1500 and id <= 1600 then 1 else 0 end as v15",
      "case when id > 1600 and id <= 1700 then 1 else 0 end as v16",
      "case when id > 1700 and id <= 1800 then 1 else 0 end as v17",
      "case when id > 1800 and id <= 1900 then 1 else 0 end as v18")
    .groupBy("k1", "k2", "k3")
    .sum()
    .collect()

  // Baseline: whole-stage codegen disabled.
  benchmark.addCase(s"codegen = F") { iter =>
    sparkSession.conf.set("spark.sql.codegen.wholeStage", "false")
    f()
  }

  // Whole-stage codegen enabled, with the line limit raised to 10000.
  benchmark.addCase(s"codegen = T") { iter =>
    sparkSession.conf.set("spark.sql.codegen.wholeStage", "true")
    sparkSession.conf.set("spark.sql.codegen.MaxFunctionLength", "10000")
    f()
  }

  benchmark.run()

  /*
  Java HotSpot(TM) 64-Bit Server VM 1.8.0_111-b14 on Windows 7 6.1
  Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
  max function length of wholestagecodegen: Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
  ------------------------------------------------------------------------------------------------
  codegen = F                                    443 /  507          1.5         676.0       1.0X
  codegen = T                                   3279 / 3283          0.2        5002.6       0.1X
   */
}

How was this patch tested?

Run the unit test

eatoncys changed the title from "[SPARK-21603][sql]The wholestage codegen will be much slower then wholestage codegen is closed when the function is too long" to "[SPARK-21603][sql]The wholestage codegen will be much slower then that is closed when the function is too long" Aug 2, 2017
*/
def existTooLongFunction(): Boolean = {
classFunctions.exists { case (className, functions) =>
functions.exists{ case (name, code) =>

Member

Could you elaborate on why the number of characters in a function is the only factor in deciding whether to enable or disable whole-stage codegen?

@eatoncys eatoncys Aug 4, 2017

Contributor Author

@kiszk Because when the JVM flag -XX:+DontCompileHugeMethods is enabled (the HotSpot default), a method whose bytecode is longer than 8000 bytes is not JIT-compiled. I estimated the line limit from that 8000-byte bytecode threshold; there may be better ways to do this.
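
To check whether the flag mentioned above is active in a given JVM, something like the following sketch could be used. It assumes a HotSpot-based JVM, where `com.sun.management.HotSpotDiagnosticMXBean` is available; the object name `HugeMethodFlagCheck` and the helper `vmOption` are hypothetical and not part of this PR.

```scala
import java.lang.management.ManagementFactory
import com.sun.management.HotSpotDiagnosticMXBean

// Hypothetical helper, not part of the PR: reads a HotSpot VM option at runtime.
object HugeMethodFlagCheck {
  /** Returns the current value of a VM option, or None if it cannot be read. */
  def vmOption(name: String): Option[String] =
    try {
      val bean = ManagementFactory.getPlatformMXBean(classOf[HotSpotDiagnosticMXBean])
      // getPlatformMXBean may return null when no such bean is registered.
      Option(bean).map(_.getVMOption(name).getValue)
    } catch {
      // Thrown when the option does not exist or the MXBean interface is unsupported.
      case _: IllegalArgumentException => None
    }

  def main(args: Array[String]): Unit = {
    val value = vmOption("DontCompileHugeMethods").getOrElse("<unavailable>")
    println(s"DontCompileHugeMethods = $value")
  }
}
```

If the reported value is `true`, very large generated methods stay interpreted, which is the situation the benchmark in the PR description exercises.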

@kiszk kiszk Aug 4, 2017

Member

Got it. Thank you for your explanation. It would be good to add a comment explaining this reasoning.

We have seen a similar control here. Can we unify these control mechanisms into one?

@viirya viirya Aug 7, 2017

Member

Does this line count include code comments in the functions? Do we need to strip comments?

.doc("The maximum number of function length that will be supported before" +
" deactivating whole-stage codegen.")
.intConf
.createWithDefault(1500)

Member

Would it be possible to explain why 1500 is a good default value?


benchmark.addCase(s"codegen = T") { iter =>
sparkSession.conf.set("spark.sql.codegen.wholeStage", "true")
sparkSession.conf.set("spark.sql.codegen.MaxFunctionLength", "10000")

Member

Why doesn't this benchmark use the default number 1500?

Member

The same question.

Contributor Author

OK, I have added a test that uses the default value of 1500, thanks.

@gatorsmile

Member

ok to test

SparkQA commented Aug 7, 2017

Test build #80318 has finished for PR 18810 at commit 1b0ac5e.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val existLongFunction = ctx.existTooLongFunction
if (existLongFunction) {
logWarning(s"Function is too long, Whole-stage codegen disabled for this plan:\n "
+ s"$treeString")

Member

This could be very big. Please follow what was done in #18658.

Contributor Author

@gatorsmile, thank you for the review. The treeString does not contain the generated code; it only contains the tree string of the physical plan, like below:
*HashAggregate(keys=[k1#2395L, k2#2396, k3#2397], functions=[partial_sum(id#2392L)...
+- *Project [id#2392L, (id#2392L & 1023) AS k1#2395L, cast((id#2392L & 1023) as double) AS k2#2396...
+- *Range (0, 655360, step=1, splits=1)
So I think it will not be very big.


/**
* Returns the length of codegen function is too long or not
*/

Member

How about adding the checking logic here, instead of returning a Boolean?

Contributor Author

Ok, I have modified it, thanks.

SparkQA commented Aug 7, 2017

Test build #80325 has finished for PR 18810 at commit 7e84753.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

*/
private val placeHolderToComments = new mutable.HashMap[String, String]

/**

Member

nit: Returns if the length ....

nit: Please remove extra space before is too long.

Contributor Author

Ok, I have modified it, thanks.

val WHOLESTAGE_MAX_FUNCTION_LEN = buildConf("spark.sql.codegen.MaxFunctionLength")
.internal()
.doc("The maximum number of function length that will be supported before" +
" deactivating whole-stage codegen.")

Member

nit: add indent here.

val (ctx, cleanedSource) = doCodeGen()
val existLongFunction = ctx.existTooLongFunction
if (existLongFunction) {
logWarning(s"Function is too long, Whole-stage codegen disabled for this plan:\n "

Member

Simply explain why whole-stage codegen is disabled, e.g. "Found too long generated codes and JIT optimization might not work. Whole-stage codegen disabled for this plan..."

Member

Btw, we can also add a log message like "You can change the config spark.sql.codegen.MaxFunctionLength to adjust the function length limit."

This is for cases where users want to run whole-stage codegen intentionally.

SparkQA commented Aug 7, 2017

Test build #80326 has finished for PR 18810 at commit c4235dc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 7, 2017

Test build #80329 has finished for PR 18810 at commit 52da6b2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

eatoncys commented Aug 8, 2017
Contributor Author

cc @gatorsmile

"disable logging or -1 to apply no limit.")
.createWithDefault(1000)

val WHOLESTAGE_MAX_FUNCTION_LEN = buildConf("spark.sql.codegen.MaxFunctionLength")

@gatorsmile gatorsmile Aug 9, 2017

Member

MaxFunctionLength -> maxLinesPerFunction

Contributor Author

Ok, I have modified it, thanks

*/
private val placeHolderToComments = new mutable.HashMap[String, String]

/**

Member

length is misleading. Here, it is just the number of lines.

Contributor Author

Ok, I have modified it, thanks

*/
def existTooLongFunction(): Boolean = {
classFunctions.exists { case (className, functions) =>
functions.exists{ case (name, code) =>

Member

As @viirya said, could you check whether it excludes the comments?

Contributor Author

OK, I have modified it to count lines excluding comments and extra blank lines.
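
As an illustration of what counting lines without comments and blank lines might look like, here is a minimal sketch. It is not the code merged in this PR: `GeneratedCodeLineCount`, `countCodeLines`, and `existsTooLongFunction` are hypothetical names, the nested map only mirrors the shape of `classFunctions` in the diff hunk above, and the comment stripping is deliberately naive (it does not account for comment markers inside string literals).

```scala
// Hypothetical sketch, not the PR's implementation.
object GeneratedCodeLineCount {
  /** Counts lines that are neither blank nor comment-only. */
  def countCodeLines(code: String): Int = {
    // Remove /* ... */ block comments first (non-greedy, may span lines).
    val withoutBlockComments = code.replaceAll("(?s)/\\*.*?\\*/", "")
    withoutBlockComments
      .split("\n")
      .map(_.trim)
      .count(line => line.nonEmpty && !line.startsWith("//"))
  }

  /** True if any function body (class name -> function name -> code) exceeds maxLines. */
  def existsTooLongFunction(
      classFunctions: Map[String, Map[String, String]],
      maxLines: Int): Boolean =
    classFunctions.values.exists(_.values.exists(code => countCodeLines(code) > maxLines))

  def main(args: Array[String]): Unit = {
    val code = "int a = 1;\n\n// comment\n/* block\n comment */\nint b = 2; // trailing\n"
    println(countCodeLines(code)) // two lines contain code
    println(existsTooLongFunction(Map("GeneratedClass" -> Map("f" -> code)), maxLines = 1))
  }
}
```

Counting this way keeps the threshold from being triggered by generated comments alone, which is the concern raised in the review.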

SparkQA commented Aug 10, 2017

Test build #80470 has finished for PR 18810 at commit d0c753a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

logWarning(s"Found too long generated codes and JIT optimization might not work, " +
s"Whole-stage codegen disabled for this plan, " +
s"You can change the config spark.sql.codegen.MaxFunctionLength " +
s"to adjust the function length limit:\n "

Member

Please remove the unnecessary s string interpolators.

@gatorsmile

Member

Thanks! Merging to master.

asfgit closed this in 1cce1a3 Aug 16, 2017

maropu commented Aug 20, 2017

Member

(Copied from JIRA, just in case.) For your information, I checked the TPC-DS performance changes before and after PR #18810; the PR affected only Q17 and Q66 (that is, the queries with overly long generated functions). The changes are as follows (from running TPCDSQueryBenchmark):
Q17: https://github.com/apache/spark/blob/master/sql/core/src/test/resources/tpcds/q17.sql
Q66: https://github.com/apache/spark/blob/master/sql/core/src/test/resources/tpcds/q66.sql

Q17: 3224.0 w/o this PR --> 2627.0 w/ this PR (perf. improvement)
Q66: 1712.0 w/o this PR --> 3032.0 w/ this PR (perf. regression)

It seems these queries have generated functions with 2800~2900 lines, so if we set spark.sql.codegen.maxLinesPerFunction to 2900, we could keep the performance from before PR #18810.
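
A sketch of how the limit could be raised for a session, assuming a Spark build that contains this PR. The config key is taken from the comment above and may not exist under this name in other Spark versions; the object name, app name, and local master are placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example: raises the per-function line limit for one session.
object MaxLinesPerFunctionExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")                       // placeholder master
      .appName("maxLinesPerFunction-example")   // placeholder app name
      .getOrCreate()

    // Key as named in this thread; assumed to be present only in builds with this change.
    spark.conf.set("spark.sql.codegen.maxLinesPerFunction", "2900")
    println(spark.conf.get("spark.sql.codegen.maxLinesPerFunction"))

    spark.stop()
  }
}
```

With the limit at 2900, functions in the reported 2800~2900 range would stay under whole-stage codegen, so Q17 would also lose the improvement measured above.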

viirya commented Aug 21, 2017
Member

@maropu Interesting. Would you like to benchmark with #18931 too? It is my attempt to handle long codegen functions without disabling whole-stage codegen.

maropu commented Aug 21, 2017
Member

Yeah, I'll do that.

maropu commented Aug 21, 2017
Member

Btw, for merged PRs, I'm monitoring TPC-DS performance here. I also wrote a script a while ago to run TPC-DS on pending PRs: https://github.com/maropu/spark-tpcds-datagen#helper-scripts-for-benchmarks.

gatorsmile commented Aug 21, 2017
Member

Thank you for tracking it! Could you adjust the conf to a higher number (e.g. 4097) and rerun the perf?

maropu commented Aug 22, 2017
Member

ok, I'll make a pr as follow-up.

7 participants