
[SPARK-39911][SQL][3.3] Optimize global Sort to RepartitionByExpression#37373

Closed
ulysses-you wants to merge 1 commit into apache:branch-3.3 from ulysses-you:SPARK-39911-3.3

ulysses-you (Contributor) commented Aug 2, 2022

This is a backport of #37330 into branch-3.3.

What changes were proposed in this pull request?

Optimize a global Sort into a RepartitionByExpression when it sits below a local Sort, for example:

Sort local             Sort local
  Sort global    =>      RepartitionByExpression
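The rewrite in the diagram above can be sketched as a tree transformation. This is a minimal Python sketch under stated assumptions, not Spark's actual optimizer rule: the node classes `Scan`, `Sort` and `RepartitionByExpression` and the function `rewrite` are hypothetical, simplified stand-ins, and the sketch only matches a global `Sort` that is the direct child of a local `Sort`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical, simplified stand-ins for logical plan nodes (not Spark classes).
@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Sort:
    order: Tuple[str, ...]  # sort keys
    is_global: bool         # True: total order across partitions; False: per partition
    child: Plan


@dataclass(frozen=True)
class RepartitionByExpression:
    exprs: Tuple[str, ...]  # keys used to range-partition rows; no sorting inside partitions
    child: Plan


Plan = Union[Scan, Sort, RepartitionByExpression]


def rewrite(plan: Plan) -> Plan:
    """Hypothetical rule: replace a global Sort directly below a local Sort with a repartition."""
    if isinstance(plan, Sort):
        child = rewrite(plan.child)
        if not plan.is_global and isinstance(child, Sort) and child.is_global:
            # The local sort above reorders every partition, so only the
            # distribution produced by the global sort is kept.
            child = RepartitionByExpression(child.order, child.child)
        return Sort(plan.order, plan.is_global, child)
    if isinstance(plan, RepartitionByExpression):
        return RepartitionByExpression(plan.exprs, rewrite(plan.child))
    return plan


# Sort local (b) over Sort global (a) over a scan of table "t".
plan = Sort(("b",), False, Sort(("a",), True, Scan("t")))
print(rewrite(plan))
```

A global sort with no local sort above it is returned unchanged, since its ordering is then part of the query result.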

Why are the changes needed?

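If a global sort sits below a local sort, the only meaningful property of the global sort is its output distribution, because the local sort above it reorders each partition anyway. So this PR optimizes that global sort into a RepartitionByExpression, which saves a local sort.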
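Why the rewrite is safe can be checked with a small simulation. This sketch assumes a global sort behaves like range partitioning followed by a per-partition sort, and it uses fixed, hypothetical range bounds instead of bounds derived by sampling. It compares the upper local sort applied on top of the global sort with the same local sort applied on top of range partitioning alone. The `b` values are distinct so that ties cannot make the comparison depend on input order.

```python
import bisect
import random
from typing import List, Tuple

Row = Tuple[int, int]  # (a, b)
Partitions = List[List[Row]]


def range_partition(rows: List[Row], bounds: List[int]) -> Partitions:
    """Send each row to a partition by comparing column a with fixed range bounds."""
    parts: Partitions = [[] for _ in range(len(bounds) + 1)]
    for row in rows:
        parts[bisect.bisect_right(bounds, row[0])].append(row)
    return parts


def global_sort_by_a(rows: List[Row], bounds: List[int]) -> Partitions:
    # Modeled as range partitioning by a, then sorting each partition by a.
    return [sorted(p, key=lambda r: r[0]) for p in range_partition(rows, bounds)]


def repartition_by_a(rows: List[Row], bounds: List[int]) -> Partitions:
    # Same range partitioning by a, but no sort inside each partition.
    return range_partition(rows, bounds)


def local_sort_by_b(parts: Partitions) -> Partitions:
    # The upper local sort: reorder each partition by b independently.
    return [sorted(p, key=lambda r: r[1]) for p in parts]


rng = random.Random(0)
b_values = rng.sample(range(1000), 50)  # distinct b values, so sorting by b has no ties
rows = [(rng.randrange(100), b) for b in b_values]
bounds = [25, 50, 75]  # hypothetical fixed bounds for illustration

before = local_sort_by_b(global_sort_by_a(rows, bounds))
after = local_sort_by_b(repartition_by_a(rows, bounds))
print(before == after)
```

Both plans put exactly the same rows into each partition, so the per-partition sort by `a` done inside the global sort is discarded by the local sort by `b` above it.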

Does this PR introduce any user-facing change?

We fixed a bug in #37250, and that PR was backported into branch-3.3. However, that fix may introduce a performance regression. This PR by itself only improves performance, but we also backport it to avoid that regression. See the details in #37330 (comment).

How was this patch tested?

Added tests.

Closes #37330 from ulysses-you/optimize-sort.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

### What changes were proposed in this pull request?

Optimize a global Sort into a RepartitionByExpression when it sits below a local Sort, for example:
```
Sort local             Sort local
  Sort global    =>      RepartitionByExpression
```

### Why are the changes needed?

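If a global sort sits below a local sort, the only meaningful property of the global sort is its output distribution, because the local sort above it reorders each partition anyway. So this PR optimizes that global sort into a RepartitionByExpression, which saves a local sort.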

### Does this PR introduce _any_ user-facing change?

No, this only improves performance.

### How was this patch tested?

Added tests.

Closes apache#37330 from ulysses-you/optimize-sort.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
github-actions (bot) added the SQL label Aug 2, 2022

ulysses-you (Contributor, Author)

cc @cloud-fan

dongjoon-hyun (Member) left a comment

Could you describe some context, please? The current PR description only describes an improvement, which does not qualify for a legitimate backport. I believe you have a reason, such as fixing a bug or a regression, @ulysses-you and @cloud-fan.

> no, only improve performance

ulysses-you (Contributor, Author)

@dongjoon-hyun Yes. The story is that we fixed a bug in #37250, and that PR was backported into branch-3.3. However, that fix may introduce a performance regression. This PR by itself only improves performance, but we also backport it to avoid that regression. See the details in #37330 (comment).

dongjoon-hyun (Member)

Thank you, @ulysses-you. Please put the explanation into the PR description.

dongjoon-hyun (Member) left a comment

cc @huaxingao, too.

ulysses-you (Contributor, Author)

@dongjoon-hyun Thank you, I have updated the description.

cloud-fan (Contributor)

Merging to 3.3!

cloud-fan pushed a commit that referenced this pull request Aug 3, 2022
This is a backport of #37330 into branch-3.3.
### What changes were proposed in this pull request?

Optimize a global Sort into a RepartitionByExpression when it sits below a local Sort, for example:
```
Sort local             Sort local
  Sort global    =>      RepartitionByExpression
```

### Why are the changes needed?

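If a global sort sits below a local sort, the only meaningful property of the global sort is its output distribution, because the local sort above it reorders each partition anyway. So this PR optimizes that global sort into a RepartitionByExpression, which saves a local sort.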

### Does this PR introduce _any_ user-facing change?

We fixed a bug in #37250, and that PR was backported into branch-3.3. However, that fix may introduce a performance regression. This PR by itself only improves performance, but we also backport it to avoid that regression. See the details in #37330 (comment).

### How was this patch tested?

Added tests.

Closes #37330 from ulysses-you/optimize-sort.

Authored-by: ulysses-you <ulyssesyou18gmail.com>
Signed-off-by: Wenchen Fan <wenchendatabricks.com>

Closes #37373 from ulysses-you/SPARK-39911-3.3.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan closed this Aug 3, 2022
ulysses-you deleted the SPARK-39911-3.3 branch Aug 3, 2022 03:14

dongjoon-hyun (Member)
Thank you all.
