Skip to content

[SPARK-33100][SQL] Ignore a semicolon inside a bracketed comment in spark-sql#29982

Closed
turboFei wants to merge 31 commits into
apache:masterfrom
turboFei:SPARK-33110
Closed

[SPARK-33100][SQL] Ignore a semicolon inside a bracketed comment in spark-sql#29982
turboFei wants to merge 31 commits into
apache:masterfrom
turboFei:SPARK-33110

Conversation

@turboFei

@turboFei turboFei commented Oct 9, 2020

Copy link
Copy Markdown
Member

What changes were proposed in this pull request?

Now the spark-sql does not support parse the sql statements with bracketed comments.
For the sql statements:

/* SELECT 'test'; */
SELECT 'test';

Would be split to two statements:
The first one: /* SELECT 'test'
The second one: */ SELECT 'test'

Then it would throw an exception because the first one is illegal.
In this PR, we ignore the content in bracketed comments while splitting the sql statements.
Besides, we ignore the comment without any content.

Why are the changes needed?

Spark-sql might split the statements inside bracketed comments and it is not correct.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added UT.

@turboFei turboFei changed the title [SPARK-33110][SQL] Support parse the sql statements with c-style comments [SPARK-33100][SQL] Support parse the sql statements with c-style comments Oct 9, 2020
@turboFei turboFei changed the title [SPARK-33100][SQL] Support parse the sql statements with c-style comments [SPARK-33100][SQL] Support parse the sql statements with C-style comments Oct 9, 2020
@maropu

maropu commented Oct 9, 2020

Copy link
Copy Markdown
Member

If a comment /* ... */ includes ;. it will throw an exception? The comment style is already supported, so the title and the description look confusing. I think this is just a bug.

@maropu

maropu commented Oct 9, 2020

Copy link
Copy Markdown
Member

ok to test

@SparkQA

SparkQA commented Oct 9, 2020

Copy link
Copy Markdown

Test build #129585 has finished for PR 29982 at commit 084fc37.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 9, 2020

Copy link
Copy Markdown

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34190/

@SparkQA

SparkQA commented Oct 9, 2020

Copy link
Copy Markdown

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34190/

@turboFei turboFei changed the title [SPARK-33100][SQL] Support parse the sql statements with C-style comments [SPARK-33100][SQL] Fix issue when parsing the sql statements with C-style comments Oct 9, 2020
@turboFei turboFei changed the title [SPARK-33100][SQL] Fix issue when parsing the sql statements with C-style comments [SPARK-33100][SQL] Fix the issue when parsing the sql statements with C-style comments Oct 9, 2020
@turboFei

turboFei commented Oct 9, 2020

Copy link
Copy Markdown
Member Author

@maropu Thanks for your comments, I have modified the title.

@turboFei turboFei changed the title [SPARK-33100][SQL] Fix the issue when parsing the sql statements with C-style comments [SPARK-33100][SQL] Fix the issue when parsing sql statements with C-style comments Oct 9, 2020
@SparkQA

SparkQA commented Oct 10, 2020

Copy link
Copy Markdown

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34219/

@SparkQA

SparkQA commented Oct 10, 2020

Copy link
Copy Markdown

Test build #129618 has finished for PR 29982 at commit 23437dd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 10, 2020

Copy link
Copy Markdown

Test build #129616 has finished for PR 29982 at commit 6b225ec.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 10, 2020

Copy link
Copy Markdown

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34222/

@SparkQA

SparkQA commented Oct 10, 2020

Copy link
Copy Markdown

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34219/

@SparkQA

SparkQA commented Oct 10, 2020

Copy link
Copy Markdown

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34222/

@SparkQA

SparkQA commented Oct 10, 2020

Copy link
Copy Markdown

Test build #129621 has finished for PR 29982 at commit bc1bb37.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 10, 2020

Copy link
Copy Markdown

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34225/

@SparkQA

SparkQA commented Oct 10, 2020

Copy link
Copy Markdown

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34225/

@maropu

maropu commented Oct 10, 2020

Copy link
Copy Markdown
Member

cc: @wangyum @yaooqinn

@turboFei turboFei changed the title [SPARK-33100][SQL] Fix the issue when parsing sql statements with C-style comments [SPARK-33100][SQL] Fix the issue when parsing sql statements with bracketed comments Oct 10, 2020
@turboFei

turboFei commented Jan 4, 2021

Copy link
Copy Markdown
Member Author

seems jenkins job not trigged, make a new commit to remove extended

@turboFei

turboFei commented Jan 4, 2021

Copy link
Copy Markdown
Member Author

seems no jenkins job triggered, I have passed the UT locally.
gentle ping @maropu

@maropu

maropu commented Jan 4, 2021

Copy link
Copy Markdown
Member

Thanks for fixing this, @turboFei. Looks fine cc: @HyukjinKwon @yaooqinn @wangyum

@SparkQA

SparkQA commented Jan 5, 2021

Copy link
Copy Markdown

Test build #133631 has finished for PR 29982 at commit f7f8030.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 5, 2021

Copy link
Copy Markdown

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38220/

@SparkQA

SparkQA commented Jan 5, 2021

Copy link
Copy Markdown

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38220/

@maropu maropu closed this in a071826 Jan 5, 2021
maropu pushed a commit that referenced this pull request Jan 5, 2021
…park-sql

### What changes were proposed in this pull request?
Now the spark-sql does not support parse the sql statements with bracketed comments.
For the sql statements:
```
/* SELECT 'test'; */
SELECT 'test';
```
Would be split to two statements:
The first one: `/* SELECT 'test'`
The second one: `*/ SELECT 'test'`

Then it would throw an exception because the first one is illegal.
In this PR, we ignore the content in bracketed comments while splitting the sql statements.
Besides, we ignore the comment without any content.

### Why are the changes needed?
Spark-sql might split the statements inside bracketed comments and it is not correct.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added UT.

Closes #29982 from turboFei/SPARK-33110.

Lead-authored-by: fwang12 <fwang12@ebay.com>
Co-authored-by: turbofei <fwang12@ebay.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(cherry picked from commit a071826)
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
@maropu

maropu commented Jan 5, 2021

Copy link
Copy Markdown
Member

Many thanks, @turboFei and @yaooqinn ! Merged to master/3.1. FYI: @dongjoon-hyun @HyukjinKwon

@maropu

maropu commented Jan 5, 2021

Copy link
Copy Markdown
Member

@turboFei Could you open a PR to fix it for branch-3.0/2.4?

@turboFei

turboFei commented Jan 5, 2021

Copy link
Copy Markdown
Member Author

@turboFei Could you open a PR to fix it for branch-3.0/2.4?

sure

@maropu

maropu commented Jan 5, 2021

Copy link
Copy Markdown
Member

@turboFei I found some GA flakiness caused by this commit, e.g.,
https://github.com/apache/spark/pull/31045/checks?check_run_id=1652972350
https://github.com/apache/spark/runs/1652975825?check_suite_focus=true

Could you check/fix it? FYI: @dongjoon-hyun @HyukjinKwon

@turboFei

turboFei commented Jan 6, 2021

Copy link
Copy Markdown
Member Author

oh, i will fix it today.

Should we ignore the comments during two ;?

for example,

/* comment*/; select * from test;

should be transfered asselect * from test;

Or I just remove the test case like that?

@maropu

maropu commented Jan 6, 2021

Copy link
Copy Markdown
Member

What's a root cause of the flakiness? It depends on the cause, I think.

@turboFei

turboFei commented Jan 6, 2021

Copy link
Copy Markdown
Member Author

These two statements only return one result.

Might the first statement contains an invalid statement /* SELECT 'test';*/ and does not return result.

image

@gatorsmile

Copy link
Copy Markdown
Member

CC @bogdanghit

@turboFei

turboFei commented Jan 6, 2021

Copy link
Copy Markdown
Member Author

there is a bug for statementBegin method.
For /* SELECT 'test';*/, the last character / would be treated as beginning of statements

@turboFei

turboFei commented Jan 6, 2021

Copy link
Copy Markdown
Member Author

create #31054 to fix this issue

maropu pushed a commit that referenced this pull request Jan 8, 2021
… in spark-sql

### What changes were proposed in this pull request?
Now the spark-sql does not support parse the sql statements with bracketed comments.
For the sql statements:
```
/* SELECT 'test'; */
SELECT 'test';
```
Would be split to two statements:
The first one: `/* SELECT 'test'`
The second one: `*/ SELECT 'test'`

Then it would throw an exception because the first one is illegal.
In this PR, we ignore the content in bracketed comments while splitting the sql statements.
Besides, we ignore the comment without any content.

NOTE: This backport comes from #29982

### Why are the changes needed?
Spark-sql might split the statements inside bracketed comments and it is not correct.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added UT.

Closes #31033 from turboFei/SPARK-33100.

Authored-by: fwang12 <fwang12@ebay.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
cloud-fan pushed a commit that referenced this pull request Dec 7, 2021
…kend

### What changes were proposed in this pull request?
In current spark-sql cli interface, if the end SQL is  not a close comment, the SQL won't be passed to backend engine and just ignored. This caused a problem that if user write a SQL with wrong comment. It's just ignored and won't throw exception.
For example:
```
spark-sql> /* This is a comment without end symbol SELECT 1;
spark-sql>
```

After this pr:
```
spark-sql> /* This is a comment without end symbol SELECT 1;
Error in query:
Unclosed bracketed comment(line 1, pos 0)

 == SQL ==
 /* This is a comment without end symbol SELECT 1;
 ^^^
```

In SPARK-33100 add this change #29982

Hive related code
https://github.com/apache/hive/blob/1090c93b1a02d480bdee2af2cecf503f8a54efc6/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java#L488-L490

### Why are the changes needed?
Exact exceptions are thrown for wrong statements, which is convenient for users to troubleshoot.

### Does this PR introduce _any_ user-facing change?
Yes, if user write a wrong comment in sql/sql file or query in the end.
Before it's just ignored since it's not a statement. Now it will be passed to backend engine and if the statement is not correct, it will throw SQL exception.

### How was this patch tested?
added UT and  test by handle.

```
spark-sql> /* SELECT /*+ HINT() 4; */;
Error in query:
mismatched input ';' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 26)

== SQL ==
/* SELECT /*+ HINT() 4; */;
--------------------------^^^

spark-sql> /* SELECT /*+ HINT() 4; */
         >         SELECT 1;
1
Time taken: 3.16 seconds, Fetched 1 row(s)
spark-sql> /* SELECT /*+ HINT() */ 4; */;
spark-sql>
         > ;
spark-sql>
         > /* SELECT /*+ HINT() 4\\;
         >         SELECT 1;
Error in query:
Unclosed bracketed comment(line 1, pos 0)

== SQL ==
/* SELECT /*+ HINT() 4\\;
^^^
        SELECT 1;

spark-sql>
```

Closes #34815 from AngersZhuuuu/SPARK-37555.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants