Skip to content

[SPARK-33100][SQL][3.0] Ignore a semicolon inside a bracketed comment in spark-sql#31033

Closed
turboFei wants to merge 2 commits into
apache:branch-3.0from
turboFei:SPARK-33100
Closed

[SPARK-33100][SQL][3.0] Ignore a semicolon inside a bracketed comment in spark-sql#31033
turboFei wants to merge 2 commits into
apache:branch-3.0from
turboFei:SPARK-33100

Conversation

@turboFei

@turboFei turboFei commented Jan 5, 2021

Copy link
Copy Markdown
Member

What changes were proposed in this pull request?

Now the spark-sql does not support parse the sql statements with bracketed comments.
For the sql statements:

/* SELECT 'test'; */
SELECT 'test';

Would be split to two statements:
The first one: /* SELECT 'test'
The second one: */ SELECT 'test'

Then it would throw an exception because the first one is illegal.
In this PR, we ignore the content in bracketed comments while splitting the sql statements.
Besides, we ignore the comment without any content.

NOTE: This backport comes from #29982

Why are the changes needed?

Spark-sql might split the statements inside bracketed comments and it is not correct.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added UT.

…park-sql

Now the spark-sql does not support parse the sql statements with bracketed comments.
For the sql statements:
```
/* SELECT 'test'; */
SELECT 'test';
```
Would be split to two statements:
The first one: `/* SELECT 'test'`
The second one: `*/ SELECT 'test'`

Then it would throw an exception because the first one is illegal.
In this PR, we ignore the content in bracketed comments while splitting the sql statements.
Besides, we ignore the comment without any content.

Spark-sql might split the statements inside bracketed comments and it is not correct.

No.

Added UT.

Closes #29982 from turboFei/SPARK-33110.

Lead-authored-by: fwang12 <fwang12@ebay.com>
Co-authored-by: turbofei <fwang12@ebay.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
@turboFei

turboFei commented Jan 5, 2021

Copy link
Copy Markdown
Member Author

For branch-2.4, this pr depends on below patches:
#25018

#27321

#28393

#27920

@maropu
Should I back port all these prs for branch-2.4?

FYI:
I have create one for branch-2.4

#31040

@maropu

maropu commented Jan 5, 2021

Copy link
Copy Markdown
Member

For branch-2.4, this pr depends on below patches:

Hmmm... I've checked it and the backport for branch-2.4 looks complicated, so I will backport the fix only for branch-3.0 for now. Anyway, thanks for the check, @turboFei

@maropu

maropu commented Jan 5, 2021

Copy link
Copy Markdown
Member

ok to test

@maropu maropu changed the title [SPARK-33100][SQL] Ignore a semicolon inside a bracketed comment in spark-sql [SPARK-33100][SQL][3.0] Ignore a semicolon inside a bracketed comment in spark-sql Jan 5, 2021
@SparkQA

SparkQA commented Jan 5, 2021

Copy link
Copy Markdown

Test build #133683 has finished for PR 31033 at commit 3ee7483.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@turboFei

turboFei commented Jan 5, 2021

Copy link
Copy Markdown
Member Author

no space left on jenkins node:

[error] /home/jenkins/workspace/SparkPullRequestBuilder@2/sql/core/target/java/org/apache/spark/sql/LHMapClass$.java (No space left on device)

@SparkQA

SparkQA commented Jan 5, 2021

Copy link
Copy Markdown

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38272/

@SparkQA

SparkQA commented Jan 5, 2021

Copy link
Copy Markdown

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38272/

@dongjoon-hyun dongjoon-hyun left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update this PR by including #31054 .

@dongjoon-hyun dongjoon-hyun marked this pull request as draft January 6, 2021 07:38
@dongjoon-hyun

Copy link
Copy Markdown
Member

Also, I converted this PR as Draft to prevent accidental merging.

@turboFei

turboFei commented Jan 6, 2021

Copy link
Copy Markdown
Member Author

got it,thanks

@maropu

maropu commented Jan 7, 2021

Copy link
Copy Markdown
Member

@turboFei Could you merge the flakiness fix into this PR? Thanks!

…in spark-sql

### What changes were proposed in this pull request?

This PR help find correct bound of bracketed comment in spark-sql.

Here is the log for UT of SPARK-33100 in CliSuite before:
```
2021-01-05 13:22:34.768 - stdout> spark-sql> /* SELECT 'test';*/ SELECT 'test';
2021-01-05 13:22:41.523 - stderr> Time taken: 6.716 seconds, Fetched 1 row(s)
2021-01-05 13:22:41.599 - stdout> test
2021-01-05 13:22:41.6 - stdout> spark-sql> ;;/* SELECT 'test';*/ SELECT 'test';
2021-01-05 13:22:41.709 - stdout> test
2021-01-05 13:22:41.709 - stdout> spark-sql> /* SELECT 'test';*/;; SELECT 'test';
2021-01-05 13:22:41.902 - stdout> spark-sql> SELECT 'test'; -- SELECT 'test';
2021-01-05 13:22:41.902 - stderr> Time taken: 0.129 seconds, Fetched 1 row(s)
2021-01-05 13:22:41.902 - stderr> Error in query:
2021-01-05 13:22:41.902 - stderr> mismatched input '<EOF>' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 19)
2021-01-05 13:22:42.006 - stderr>
2021-01-05 13:22:42.006 - stderr> == SQL ==
2021-01-05 13:22:42.006 - stderr> /* SELECT 'test';*/
2021-01-05 13:22:42.006 - stderr> -------------------^^^
2021-01-05 13:22:42.006 - stderr>
2021-01-05 13:22:42.006 - stderr> Time taken: 0.226 seconds, Fetched 1 row(s)
2021-01-05 13:22:42.006 - stdout> test
```
The root cause is that the insideBracketedComment is not accurate.

For `/* comment */`, the last character `/` is not insideBracketedComment and it would be treat as beginning of statements.

In this PR, this issue is fixed.

### Why are the changes needed?
To fix the issue described above.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing UT

Closes #31054 from turboFei/SPARK-33100-followup.

Authored-by: fwang12 <fwang12@ebay.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
@turboFei turboFei marked this pull request as ready for review January 7, 2021 12:20
@turboFei turboFei requested a review from dongjoon-hyun January 7, 2021 12:20
@turboFei

turboFei commented Jan 7, 2021

Copy link
Copy Markdown
Member Author

thanks, have merged the followup PR

@SparkQA

SparkQA commented Jan 7, 2021

Copy link
Copy Markdown

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38385/

@SparkQA

SparkQA commented Jan 7, 2021

Copy link
Copy Markdown

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38385/

@SparkQA

SparkQA commented Jan 7, 2021

Copy link
Copy Markdown

Test build #133797 has finished for PR 31033 at commit a15f125.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

maropu pushed a commit that referenced this pull request Jan 8, 2021
… in spark-sql

### What changes were proposed in this pull request?
Now the spark-sql does not support parse the sql statements with bracketed comments.
For the sql statements:
```
/* SELECT 'test'; */
SELECT 'test';
```
Would be split to two statements:
The first one: `/* SELECT 'test'`
The second one: `*/ SELECT 'test'`

Then it would throw an exception because the first one is illegal.
In this PR, we ignore the content in bracketed comments while splitting the sql statements.
Besides, we ignore the comment without any content.

NOTE: This backport comes from #29982

### Why are the changes needed?
Spark-sql might split the statements inside bracketed comments and it is not correct.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added UT.

Closes #31033 from turboFei/SPARK-33100.

Authored-by: fwang12 <fwang12@ebay.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
@maropu

maropu commented Jan 8, 2021

Copy link
Copy Markdown
Member

Thanks! Merged to branch-3.0.

@maropu maropu closed this Jan 8, 2021
@turboFei turboFei deleted the SPARK-33100 branch January 8, 2021 14:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants