Skip to content

[SPARK-28560][SQL][followup] change the local shuffle reader from leaf node to unary node#26250

Closed
JkSelf wants to merge 2 commits into
apache:masterfrom
JkSelf:updateLeafNodeofLocalReaderToUnaryExecNode
Closed

[SPARK-28560][SQL][followup] change the local shuffle reader from leaf node to unary node#26250
JkSelf wants to merge 2 commits into
apache:masterfrom
JkSelf:updateLeafNodeofLocalReaderToUnaryExecNode

Conversation

@JkSelf

@JkSelf JkSelf commented Oct 25, 2019

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Why are the changes needed?

When make the LocalShuffleReaderExec to leaf node, there exists a potential issue: the leaf node will hide the running query stage and make the unfinished query stage as finished query stage when creating its parent query stage.
This PR make the leaf node to unary node.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests

@JkSelf

JkSelf commented Oct 25, 2019

Copy link
Copy Markdown
Contributor Author

@cloud-fan @maryannxue Please help me review. Thanks.

@SparkQA

SparkQA commented Oct 25, 2019

Copy link
Copy Markdown

Test build #112645 has finished for PR 26250 at commit 06847fe.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class LocalShuffleReaderExec(child: QueryStageExec) extends UnaryExecNode

@JkSelf

JkSelf commented Oct 25, 2019

Copy link
Copy Markdown
Contributor Author

The failed test may be not related. Please help to re-test. Thanks.

@cloud-fan

Copy link
Copy Markdown
Contributor

retest it please

override val id: Int,
override val plan: ShuffleExchangeExec) extends QueryStageExec {
override val plan: ShuffleExchangeExec,
var isLocalShuffle: Boolean = false) extends QueryStageExec {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's avoid using mutable states when not necessary.

}
// ShuffleExchanges introduced by repartition do not support changing the number of partitions.
// We change the number of partitions in the stage only if all the ShuffleExchanges support it.
if (!shuffleStages.forall(_.plan.canChangeNumPartitions)) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can change the logic of collecting shuffle stages:

def collectShuffleStages(plan: SparkPlan): Seq[ShuffleQueryStageExec] = plan match {
  case _: LocalShuffleReaderExec = Nil
  case stage: ShuffleQueryStageExec => Seq(stage)
  case ReusedQueryStageExec(_, stage: ShuffleQueryStageExec, _) => Seq(stage) 
  case _ => plan.children.flatMap(collectShuffleStages)
}

val shuffleStages = collectShuffleStages(plan)

@maryannxue

Copy link
Copy Markdown
Contributor

We should also remove the changes to AdaptiveSparkPlanHelper: https://github.com/apache/spark/pull/25295/files#diff-ece2c98b3d95a3adcb9e08f8b9d45f11R128

@JkSelf

JkSelf commented Oct 28, 2019

Copy link
Copy Markdown
Contributor Author

@cloud-fan @maryannxue update the coments. Please help me review. Thanks.

@SparkQA

SparkQA commented Oct 28, 2019

Copy link
Copy Markdown

Test build #112748 has finished for PR 26250 at commit b5eb977.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan

Copy link
Copy Markdown
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 50cf484 Oct 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants