[SPARK-48817][SQL] Eagerly execute union multi commands together#47224

Closed
wForget wants to merge 2 commits into apache:master from wForget:SPARK-48817

Conversation

@wForget wForget commented Jul 5, 2024

Member

What changes were proposed in this pull request?

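Eagerly execute the commands of a multi-command Union (for example, a multi-insert) together in one SQL execution, instead of one execution per command.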
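
To illustrate the idea, here is a toy model in Python. It is not Spark's code: `Command`, `UnionNode`, and `plan_executions` are hypothetical names, and the sketch assumes the change amounts to treating a union whose children are all commands as a single unit of eager execution.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Command:
    """Hypothetical stand-in for a command plan node, e.g. one INSERT."""
    name: str


@dataclass
class UnionNode:
    """Hypothetical stand-in for a union over child plans."""
    children: list


def plan_executions(plan, group_union_of_commands: bool) -> List[List[str]]:
    """Return the groups of commands that would run as separate executions."""
    if isinstance(plan, Command):
        return [[plan.name]]
    if isinstance(plan, UnionNode):
        all_commands = all(isinstance(c, Command) for c in plan.children)
        if group_union_of_commands and all_commands:
            # One execution covers every command under the union.
            return [[c.name for c in plan.children]]
        # Otherwise each child is handled on its own.
        groups: List[List[str]] = []
        for child in plan.children:
            groups.extend(plan_executions(child, group_union_of_commands))
        return groups
    raise TypeError(f"unsupported plan node: {plan!r}")


multi_insert = UnionNode([Command("insert_t2"), Command("insert_t3")])
split = plan_executions(multi_insert, group_union_of_commands=False)
together = plan_executions(multi_insert, group_union_of_commands=True)
```

In this model, `split` contains two single-command groups (one execution per insert), while `together` contains one group holding both inserts.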

Why are the changes needed?

A multi-insert statement is split into multiple SQL executions, so exchanges cannot be reused across the individual inserts.

SQL to reproduce:

create table wangzhen_t1(c1 int);
create table wangzhen_t2(c1 int);
create table wangzhen_t3(c1 int);
insert into wangzhen_t1 values (1), (2), (3);

from (select /*+ REPARTITION(3) */ c1 from wangzhen_t1)
insert overwrite table wangzhen_t2 select c1
insert overwrite table wangzhen_t3 select c1; 

In Spark 3.1, there is only one SQL execution and the exchange is reused.

![image](https://github.com/apache/spark/assets/17894939/5ff68392-aaa8-4e6b-8cac-1687880796b9)

However, in Spark 3.5, the statement is split into multiple executions and no exchange is reused.

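![image](https://github.com/apache/spark/assets/17894939/afdb14b6-5007-4923-802d-535149974ecf)
![image](https://github.com/apache/spark/assets/17894939/0d60e8db-9da7-4906-8d07-2b622b55e6ab)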
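
A rough way to see why the split matters is the toy model below. It assumes, as the description above implies, that an identical exchange can only be reused within a single execution. `count_shuffles` and the exchange key string are hypothetical and only stand in for the repartition in the reproduce SQL.

```python
def count_shuffles(executions):
    """Count shuffles actually computed when reuse is scoped to one execution.

    `executions` is a list of executions; each execution is a list of
    exchange keys, where equal keys denote identical shuffles.
    """
    computed = 0
    for exchange_keys in executions:
        seen = set()  # the reuse cache lives only for this execution
        for key in exchange_keys:
            if key not in seen:
                seen.add(key)
                computed += 1
    return computed


# Both inserts read from the same repartitioned subquery.
repartition = "REPARTITION(3) over wangzhen_t1"

split_executions = [[repartition], [repartition]]  # one execution per insert
single_execution = [[repartition, repartition]]    # both inserts together

shuffles_when_split = count_shuffles(split_executions)
shuffles_when_together = count_shuffles(single_execution)
```

Under these assumptions the split case computes the shuffle once per insert, while the single execution computes it once and reuses it for the second insert.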

Does this PR introduce any user-facing change?

Yes, multi-inserts will be executed in one SQL execution.

How was this patch tested?

Added a unit test.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions Bot added the SQL label Jul 5, 2024
wForget commented Jul 5, 2024

Member Author

It seems to be caused by #32513

wForget commented Jul 5, 2024

Member Author

@cloud-fan @beliefer Could you please take a look?

@ulysses-you ulysses-you left a comment

Contributor

LGTM except for some minor comments

Comment thread sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala Outdated
Comment thread sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala Outdated
Comment thread sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala Outdated
@ulysses-you

Contributor

thanks, merged to master

@cloud-fan

Contributor

late LGTM

jingz-db pushed a commit to jingz-db/spark that referenced this pull request Jul 22, 2024

Closes apache#47224 from wForget/SPARK-48817.

Authored-by: wforget <643348094@qq.com>
Signed-off-by: youxiduo <youxiduo@corp.netease.com>