Skip to content

SPARK-2407: Added internal implementation of SQL SUBSTR()#1359

Closed
willb wants to merge 4 commits into
apache:masterfrom
willb:internalSqlSubstring
Closed

SPARK-2407: Added internal implementation of SQL SUBSTR()#1359
willb wants to merge 4 commits into
apache:masterfrom
willb:internalSqlSubstring

Conversation

@willb

@willb willb commented Jul 10, 2014

Copy link
Copy Markdown
Contributor

This replaces the Hive UDF for SUBSTR(ING) with an implementation in Catalyst
and adds tests to verify correct operation.

@AmplabJenkins

Copy link
Copy Markdown

Merged build started.

@AmplabJenkins

Copy link
Copy Markdown

Merged build triggered.

@SparkQA

SparkQA commented Jul 10, 2014

Copy link
Copy Markdown

QA tests have started for PR 1359. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16506/consoleFull

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Will -- I think we need to check if resolved is true here to not break Catalyst's contract. Similar to here.

@concretevitamin

Copy link
Copy Markdown
Contributor

Hey @willb - thanks for working on this, which is going to be very useful for Spark SQL! I left a couple minor comments. Another general concern is the performance of eval(). If there are ways to reduce branchings, or reduce the function call (not sure if @inline will help), we should probably do it.

@willb

willb commented Jul 10, 2014

Copy link
Copy Markdown
Contributor Author

Thanks for the comments, @concretevitamin. I've made the changes from your comments and will think about reducing branch overhead before pushing an update.

@willb

willb commented Jul 10, 2014

Copy link
Copy Markdown
Contributor Author

@concretevitamin Inlining Substring.slice seems to make a nontrivial difference (~11.5% speedup) on a very simple Substring.eval() microbenchmark.

@concretevitamin

Copy link
Copy Markdown
Contributor

This sounds pretty cool!

@SparkQA

SparkQA commented Jul 10, 2014

Copy link
Copy Markdown

QA results for PR 1359:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
case class Substring(str: Expression, pos: Expression, len: Expression) extends Expression {

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16506/consoleFull

@AmplabJenkins

Copy link
Copy Markdown

Merged build finished. All automated tests passed.

@AmplabJenkins

Copy link
Copy Markdown

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16506/

@AmplabJenkins

Copy link
Copy Markdown

Merged build triggered.

@AmplabJenkins

Copy link
Copy Markdown

Merged build started.

@SparkQA

SparkQA commented Jul 10, 2014

Copy link
Copy Markdown

QA tests have started for PR 1359. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16513/consoleFull

@SparkQA

SparkQA commented Jul 10, 2014

Copy link
Copy Markdown

QA results for PR 1359:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
case class Substring(str: Expression, pos: Expression, len: Expression) extends Expression {

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16513/consoleFull

@AmplabJenkins

Copy link
Copy Markdown

Merged build finished. All automated tests passed.

@AmplabJenkins

Copy link
Copy Markdown

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16513/

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to do this without view bounds? I think those are going to disappear soon: https://issues.scala-lang.org/browse/SI-7629

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think I can do it with implicit parameters. I'll push an update once I've run the Catalyst suite against my change.

@SparkQA

SparkQA commented Jul 11, 2014

Copy link
Copy Markdown

QA tests have started for PR 1359. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16543/consoleFull

@SparkQA

SparkQA commented Jul 11, 2014

Copy link
Copy Markdown

QA results for PR 1359:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
case class Substring(str: Expression, pos: Expression, len: Expression) extends Expression {

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16543/consoleFull

@SparkQA

SparkQA commented Jul 11, 2014

Copy link
Copy Markdown

QA tests have started for PR 1359. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16544/consoleFull

@SparkQA

SparkQA commented Jul 11, 2014

Copy link
Copy Markdown

QA results for PR 1359:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
case class Substring(str: Expression, pos: Expression, len: Expression) extends Expression {

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16544/consoleFull

@marmbrus

Copy link
Copy Markdown
Contributor

Hey Will, this looks great. Thanks for implementing! Mind fixing the conflict with master so I can merge?

willb added 2 commits July 14, 2014 17:55
This replaces the Hive UDF for SUBSTR(ING) with an implementation in Catalyst
and adds tests to verify correct operation.

Squashes 8969d038 and e6419b4e.
* orders imports in stringOperations.scala
* Substring.dataType throws exception if children are unresolved
* inlines Substring.slice (~11.5% performance improvement on microbenchmark runs)
* adds a special `toString` case for two-argument SUBSTR expressions
* removes spurious I_ prefix to SUBSTR(ING) in HiveQL.scala

Thanks to @concretevitamin for prompt and useful feedback!
@marmbrus

Copy link
Copy Markdown
Contributor

add to whitelist

@SparkQA

SparkQA commented Jul 15, 2014

Copy link
Copy Markdown

QA tests have started for PR 1359. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16652/consoleFull

@SparkQA

SparkQA commented Jul 15, 2014

Copy link
Copy Markdown

QA results for PR 1359:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
case class Substring(str: Expression, pos: Expression, len: Expression) extends Expression {

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16652/consoleFull

@marmbrus

Copy link
Copy Markdown
Contributor

I'm going to go ahead and merge this. Someone can make a follow-up with nullability narrowing.

@marmbrus

Copy link
Copy Markdown
Contributor

Thanks! Merged into master and 1.0.

@asfgit asfgit closed this in 61de65b Jul 15, 2014
asfgit pushed a commit that referenced this pull request Jul 15, 2014
This replaces the Hive UDF for SUBSTR(ING) with an implementation in Catalyst
and adds tests to verify correct operation.

Author: William Benton <willb@redhat.com>

Closes #1359 from willb/internalSqlSubstring and squashes the following commits:

ccedc47 [William Benton] Fixed too-long line.
a30a037 [William Benton] replace view bounds with implicit parameters
ec35c80 [William Benton] Adds fixes from review:
4f3bfdb [William Benton] Added internal implementation of SQL SUBSTR()

(cherry picked from commit 61de65b)
Signed-off-by: Michael Armbrust <michael@databricks.com>
asfgit pushed a commit that referenced this pull request Jul 16, 2014
This is a follow-up of #1359 with nullability narrowing.

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #1426 from ueshin/issues/SPARK-2504 and squashes the following commits:

5157832 [Takuya UESHIN] Remove unnecessary white spaces.
80958ac [Takuya UESHIN] Fix nullability of Substring expression.

(cherry picked from commit 632fb3d)
Signed-off-by: Reynold Xin <rxin@apache.org>
asfgit pushed a commit that referenced this pull request Jul 16, 2014
This is a follow-up of #1359 with nullability narrowing.

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #1426 from ueshin/issues/SPARK-2504 and squashes the following commits:

5157832 [Takuya UESHIN] Remove unnecessary white spaces.
80958ac [Takuya UESHIN] Fix nullability of Substring expression.
@chutium

chutium commented Jul 16, 2014

Copy link
Copy Markdown
Contributor

hi, it is really very useful for us, i tried this implementation from @willb , in spark-shell, i still got java.lang.UnsupportedOperationException by Query Plan, i made some change in SqlParser: chutium@1de83a7

@marmbrus

Copy link
Copy Markdown
Contributor

Awesome! Please submit a pull request with that addition.
On Jul 16, 2014 7:53 AM, "Teng Qiu" notifications@github.com wrote:

hi, it is really very useful for us, i tried this implementation from
@willb https://github.com/willb , in spark-shell, i still got
java.lang.UnsupportedOperationException by Query Plan, i made some change
in SqlParser: chutium/spark@1de83a7
chutium@1de83a7


Reply to this email directly or view it on GitHub
#1359 (comment).

@chutium

chutium commented Jul 16, 2014

Copy link
Copy Markdown
Contributor

PR submitted #1442

asfgit pushed a commit that referenced this pull request Jul 19, 2014
follow-up of #1359

Author: chutium <teng.qiu@gmail.com>

Closes #1442 from chutium/master and squashes the following commits:

b49cc8a [chutium] SPARK-2407: Added Parser of SQL SUBSTRING() #1442
9a60ccf [chutium] SPARK-2407: Added Parser of SQL SUBSTR() #1442
06e933b [chutium] Merge https://github.com/apache/spark
c870172 [chutium] Merge https://github.com/apache/spark
094f773 [chutium] Merge https://github.com/apache/spark
88cb37d [chutium] Merge https://github.com/apache/spark
1de83a7 [chutium] SPARK-2407: Added Parse of SQL SUBSTR()
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
This replaces the Hive UDF for SUBSTR(ING) with an implementation in Catalyst
and adds tests to verify correct operation.

Author: William Benton <willb@redhat.com>

Closes apache#1359 from willb/internalSqlSubstring and squashes the following commits:

ccedc47 [William Benton] Fixed too-long line.
a30a037 [William Benton] replace view bounds with implicit parameters
ec35c80 [William Benton] Adds fixes from review:
4f3bfdb [William Benton] Added internal implementation of SQL SUBSTR()
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
This is a follow-up of apache#1359 with nullability narrowing.

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes apache#1426 from ueshin/issues/SPARK-2504 and squashes the following commits:

5157832 [Takuya UESHIN] Remove unnecessary white spaces.
80958ac [Takuya UESHIN] Fix nullability of Substring expression.
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
follow-up of apache#1359

Author: chutium <teng.qiu@gmail.com>

Closes apache#1442 from chutium/master and squashes the following commits:

b49cc8a [chutium] SPARK-2407: Added Parser of SQL SUBSTRING() apache#1442
9a60ccf [chutium] SPARK-2407: Added Parser of SQL SUBSTR() apache#1442
06e933b [chutium] Merge https://github.com/apache/spark
c870172 [chutium] Merge https://github.com/apache/spark
094f773 [chutium] Merge https://github.com/apache/spark
88cb37d [chutium] Merge https://github.com/apache/spark
1de83a7 [chutium] SPARK-2407: Added Parse of SQL SUBSTR()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants