Skip to content

[SPARK-35069][SQL] TRANSFORM forbidden DISTICNT and ALL, also make the error clear#32149

Closed
AngersZhuuuu wants to merge 3 commits into
apache:masterfrom
AngersZhuuuu:SPARK-28227-new-followup
Closed

[SPARK-35069][SQL] TRANSFORM forbidden DISTICNT and ALL, also make the error clear#32149
AngersZhuuuu wants to merge 3 commits into
apache:masterfrom
AngersZhuuuu:SPARK-28227-new-followup

Conversation

@AngersZhuuuu

@AngersZhuuuu AngersZhuuuu commented Apr 13, 2021

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

According to #29087 (comment), add UT in transform.sql

It seems that distinct is not recognized as a reserved word here

-- !query
explain extended SELECT TRANSFORM(distinct b, a, c)
                   USING 'cat' AS (a, b, c)
                 FROM script_trans
                 WHERE a <= 4
-- !query schema
struct<plan:string>
-- !query output
== Parsed Logical Plan ==
'ScriptTransformation [*], cat, [a#x, b#x, c#x], ScriptInputOutputSchema(List(),List(),None,None,List(),List(),None,None,false)
+- 'Project ['distinct AS b#x, 'a, 'c]
   +- 'Filter ('a <= 4)
      +- 'UnresolvedRelation [script_trans], [], false

== Analyzed Logical Plan ==
org.apache.spark.sql.AnalysisException: cannot resolve 'distinct' given input columns: [script_trans.a, script_trans.b, script_trans.c]; line 1 pos 34;
'ScriptTransformation [*], cat, [a#x, b#x, c#x], ScriptInputOutputSchema(List(),List(),None,None,List(),List(),None,None,false)
+- 'Project ['distinct AS b#x, a#x, c#x]
   +- Filter (a#x <= 4)
      +- SubqueryAlias script_trans
         +- View (`script_trans`, [a#x,b#x,c#x])
            +- Project [cast(a#x as int) AS a#x, cast(b#x as int) AS b#x, cast(c#x as int) AS c#x]
               +- Project [a#x, b#x, c#x]
                  +- SubqueryAlias script_trans
                     +- LocalRelation [a#x, b#x, c#x]

Hive's error
image

Why are the changes needed?

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added Ut

@AngersZhuuuu

Copy link
Copy Markdown
Contributor Author

FYI @cloud-fan

@SparkQA

SparkQA commented Apr 13, 2021

Copy link
Copy Markdown

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41869/

@SparkQA

SparkQA commented Apr 13, 2021

Copy link
Copy Markdown

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41869/



-- !query
SELECT TRANSFORM(distinct b, a, c)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah the parser treats it as distinct as b.

Can we improve the parser to recognize distinct correctly and give a clear error message?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW why do we support TRANSFORM(a as b)? is it from Hive? Normal function parameters can't have alias.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW why do we support TRANSFORM(a as b)? is it from Hive? Normal function parameters can't have alias.

Hive not support this.

image

may be we should not use

namedExpressionSeq
    : namedExpression (',' namedExpression)*
    ;

(SELECT kind=TRANSFORM '(' namedExpressionSeq ')'
            | kind=MAP namedExpressionSeq
            | kind=REDUCE namedExpressionSeq)

maybe

expressionSeq 
    : expression(',' expression) 
    ;

(SELECT kind=TRANSFORM '(' expressionSeq ')'
            | kind=MAP expressionSeq
            | kind=REDUCE eExpressionSeq)

@cloud-fan cloud-fan Apr 14, 2021

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, let's fix it as well.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, let's fix it as well.

Should I fix both issue in this pr or in separated PR?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's have 2 PRs. forbid distinct and forbid alias are two things.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's have 2 PRs. forbid distinct and forbid alias are two things.

I have make a new ticket for this one. and changed the title.

How about current

@SparkQA

SparkQA commented Apr 13, 2021

Copy link
Copy Markdown

Test build #137289 has finished for PR 32149 at commit 118f242.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@github-actions github-actions Bot added the SQL label Apr 13, 2021
@AngersZhuuuu AngersZhuuuu changed the title [SPARK-28227][SQL][FOLLOWUP] Add test case about transform expression with distinct [SPARK-31936][SQL][FOLLOWUP] TRANSFORM forbidden DISTICNT and ALL, also make the error clear Apr 14, 2021
Comment thread sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala Outdated
Comment thread sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala Outdated
@AngersZhuuuu AngersZhuuuu changed the title [SPARK-31936][SQL][FOLLOWUP] TRANSFORM forbidden DISTICNT and ALL, also make the error clear [SPARK-35069][SQL] TRANSFORM forbidden DISTICNT and ALL, also make the error clear Apr 14, 2021
@SparkQA

SparkQA commented Apr 14, 2021

Copy link
Copy Markdown

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41919/

@SparkQA

SparkQA commented Apr 14, 2021

Copy link
Copy Markdown

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41919/

@SparkQA

SparkQA commented Apr 14, 2021

Copy link
Copy Markdown

Test build #137340 has finished for PR 32149 at commit 542aa46.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan

Copy link
Copy Markdown
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 4ca9958 Apr 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants