[SPARK-9526][SQL] Utilize randomized tests to reveal potential bugs in sql expressions #7855

Closed
yjshen wants to merge 7 commits into apache:master from yjshen:property_check

yjshen commented Aug 1, 2015
Member

JIRA: https://issues.apache.org/jira/browse/SPARK-9526

This PR is a follow-up to #7830. It uses randomized tests to reveal more potential bugs in SQL expressions.

yjshen commented Aug 1, 2015
Member Author

Opening this early to get high-level feedback ASAP.

Note: The current merge build should fail due to two bugs:

  1. UnaryMinus's codegen version fails to compile when the input is Long.MinValue.
  2. Remainder fails because codegen and interpreted mode return different results for the same input.
  3. (Withdrawn, not a problem.) MaxOf/MinOf would fail with a ClassCastException: BinaryType's ordering needs Array[Byte] as input, but GenericArrayData is given.

These bugs are not fixed yet since I just finished prototyping.
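
Boundary inputs such as Long.MinValue are very unlikely to come out of uniform sampling, so a randomized test usually mixes them in explicitly. The following is a minimal sketch of that idea only; the object `EdgeBiasedLongs` and its members are hypothetical names for illustration and are not taken from this PR or from Spark.

```scala
import scala.util.Random

object EdgeBiasedLongs {
  // Boundary values that uniform sampling is very unlikely to produce.
  val edgeCases: Vector[Long] = Vector(Long.MinValue, -1L, 0L, 1L, Long.MaxValue)

  // Returns the edge cases first, followed by n pseudo-random longs
  // drawn from a seeded generator so that failures are reproducible.
  def samples(n: Int, seed: Long): Vector[Long] = {
    val rng = new Random(seed)
    edgeCases ++ Vector.fill(n)(rng.nextLong())
  }
}
```

Long.MinValue is a useful member of such a list because in two's complement arithmetic `-Long.MinValue` overflows back to `Long.MinValue`, so negation is one of the operations where it behaves unlike every other long.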

yjshen commented Aug 1, 2015
Member Author

cc @rxin @davies

@JoshRosen
Contributor

For remainder, my hunch is that it's probably failing for extreme floating point values (e.g. take the remainder of a giant float by another giant float). I found a similar failure in #7625, an experimental branch of mine which contains some code for using reflection to write tests against all Expression subclasses.

The code in my branch lags a bit behind what I have locally (e.g. it may be missing some of the interpreted vs. codegen comparison code), so I can see about pushing the rest of my changes later. The approach in my branch definitely isn't the right one for unit testing; it was more intended to be an experiment to see whether it would be possible to do this all via reflection.
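
The idea of comparing two evaluation paths on extreme floating-point inputs can be sketched without Spark. In the sketch below, `remainderA` and `remainderB` are hypothetical stand-ins for two implementations that are supposed to agree; they are not Spark's interpreted or generated code, and the divergence shown is only an example of how giant values can separate two otherwise equivalent formulas.

```scala
object RemainderConsistencySketch {
  // Path A: the JVM's built-in floating-point remainder.
  def remainderA(a: Double, b: Double): Double = a % b

  // Path B: a hand-rolled variant (hypothetical). toLong saturates at
  // Long.MaxValue, so this goes wrong once a / b no longer fits in a Long.
  def remainderB(a: Double, b: Double): Double = a - b * (a / b).toLong

  // NaN-aware equality, since NaN == NaN is false.
  def same(x: Double, y: Double): Boolean = (x.isNaN && y.isNaN) || x == y

  // Returns the first input pair on which the two paths disagree, if any.
  def findMismatch(inputs: Seq[(Double, Double)]): Option[(Double, Double)] =
    inputs.find { case (a, b) => !same(remainderA(a, b), remainderB(a, b)) }
}
```

For small operands such as `(7.5, 2.0)` both paths return 1.5, while a pair such as `(1e300, 3.0)` makes them disagree, which is the kind of input a randomized test needs to reach.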

SparkQA commented Aug 1, 2015

Test build #39363 has finished for PR 7855 at commit daffd80.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class StopWordsRemover(override val uid: String)

yjshen commented Aug 2, 2015
Member Author

@JoshRosen, thanks for the information about #7625; it's great!
I'll read it in detail and see how I can refine my implementation accordingly.

SparkQA commented Aug 2, 2015

Test build #39409 has finished for PR 7855 at commit e3bbe4c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class RequestExecutors(appId: String, requestedTotal: Int)
    • case class KillExecutors(appId: String, executorIds: Seq[String])
    • class SpecificSafeProjection extends $
    • case class FromUTCTimestamp(left: Expression, right: Expression)
    • case class ToUTCTimestamp(left: Expression, right: Expression)
    • case class DateDiff(endDate: Expression, startDate: Expression)
    • case class InitCap(child: Expression) extends UnaryExpression with ImplicitCastInputTypes

yjshen changed the title from "[SPARK-9526][SQL][WIP] Utilize randomized tests to reveal potential bugs in sql expressions" to "[SPARK-9526][SQL] Utilize randomized tests to reveal potential bugs in sql expressions" on Aug 2, 2015

SparkQA commented Aug 2, 2015

Test build #39417 has finished for PR 7855 at commit 42769b0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class RequestExecutors(appId: String, requestedTotal: Int)
    • case class KillExecutors(appId: String, executorIds: Seq[String])
    • class SpecificSafeProjection extends $
    • case class FromUTCTimestamp(left: Expression, right: Expression)
    • case class ToUTCTimestamp(left: Expression, right: Expression)
    • case class DateDiff(endDate: Expression, startDate: Expression)
    • case class InitCap(child: Expression) extends UnaryExpression with ImplicitCastInputTypes

@JoshRosen
Contributor

Did this end up finding any new bugs?

yjshen commented Aug 3, 2015
Member Author

All bugs revealed so far:

  1. UnaryMinus's codegen version fails to compile when the input is Long.MinValue.
  2. Remainder fails because codegen and interpreted mode return different results for the same input (yes, when taking the remainder of giant values).
  3. BinaryComparison fails to compile in codegen mode when comparing Boolean types.
  4. AddMonth fails if passed a huge negative month, which leads to accessing a negative index of the monthDays array.

I also fixed Nanvl by upcasting its operands if they are of different types.
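
One common way a negative array index arises from a negative input is the sign behavior of `%` on the JVM, which keeps the sign of the dividend. The sketch below illustrates that general mechanism only; it is an assumption about the kind of arithmetic involved, not Spark's actual date code, and `NegativeIndexSketch`, `naiveIndex` and `safeIndex` are hypothetical names.

```scala
object NegativeIndexSketch {
  // Days per month in a non-leap year; a stand-in for a monthDays-style table.
  val monthDays: Array[Int] = Array(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)

  // Naive index: % keeps the sign of the dividend, so a negative
  // month count yields a negative index.
  def naiveIndex(totalMonths: Int): Int = totalMonths % 12

  // Safer index: floorMod returns a value in [0, 12) for a positive modulus.
  def safeIndex(totalMonths: Int): Int = java.lang.Math.floorMod(totalMonths, 12)
}
```

With `totalMonths = -5`, `naiveIndex` returns -5, which would throw an ArrayIndexOutOfBoundsException if used to index `monthDays`, whereas `safeIndex` returns 7.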

Contributor

Put [[ ]] around CalendarIntervalType so IntelliJ can find it during refactoring.

rxin commented Aug 3, 2015
Contributor

@yjshen

To help with reviewing, and to separate important fixes from nice-to-have tests, can you submit a separate pull request that includes all the bug fixes, along with deterministic unit tests that would trigger those cases?

Then this pull request can be just about the randomized tests.

@JoshRosen
Contributor

Bugfixes were done in #7882, so this should be ready for rebasing.

yjshen commented Aug 4, 2015
Member Author

Ah, forgot the scaladoc on property check, will do now.

@JoshRosen
Contributor

This is on my review queue for tomorrow.

SparkQA commented Aug 4, 2015

Test build #39650 has finished for PR 7855 at commit b2c6543.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public static final class SortedIterator extends UnsafeSorterIterator
    • public class KVSorterIterator extends KVIterator<UnsafeRow, UnsafeRow>

SparkQA commented Aug 4, 2015

Test build #39676 has finished for PR 7855 at commit 5301891.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

yjshen commented Aug 4, 2015
Member Author

Jenkins, retest this please.

SparkQA commented Aug 4, 2015

Test build #199 has finished for PR 7855 at commit 5301891.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

yjshen commented Aug 4, 2015
Member Author

Unrelated failures, again and again:
org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.(It is not a test)
org.apache.spark.sql.hive.thriftserver.HiveThriftHttpServerSuite.(It is not a test)

yjshen commented Aug 4, 2015
Member Author

Jenkins, retest this please.

SparkQA commented Aug 4, 2015

Test build #39696 has finished for PR 7855 at commit 5301891.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 4, 2015

Test build #203 has finished for PR 7855 at commit 5301891.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

yjshen commented Aug 4, 2015
Member Author

Jenkins, retest this please.

SparkQA commented Aug 4, 2015

Test build #204 has finished for PR 7855 at commit 5301891.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 4, 2015

Test build #39702 has finished for PR 7855 at commit 5301891.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

What do you think about giving this a more specific name, such as checkConsistencyBetweenInterpretedAndCodegen? It would also be good to add Scaladoc to these methods to explain what they're doing, since the use of reflection might be non-obvious.

Contributor

For instance, this method's Scaladoc could explain that it tests the expression's one-argument constructor with randomized literals of the given data type.

Contributor

Also, I think that we might be able to clean up the code slightly by adding a type parameter to this method:

def checkConsistency[E <: Expression: ClassTag](dt: DataType)

to let callers write something like

checkConsistencyBetweenInterpretedAndCodegen[Sinh](DoubleType)
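
As a rough illustration of how a `ClassTag` context bound lets a helper instantiate an expression through its one-argument constructor, here is a minimal sketch. `Expression`, `Literal` and `Sinh` below are toy stand-ins defined for this example, not Catalyst's classes, and `construct` and `evalOnRandomLiterals` are hypothetical helper names. The sketch assumes the object is compiled as a top-level definition, so the nested case classes take no hidden outer-instance parameter.

```scala
import scala.reflect.ClassTag
import scala.util.Random

object ClassTagSketch {
  // Toy stand-ins for an expression hierarchy (hypothetical, not Spark's classes).
  trait Expression { def eval(): Double }
  case class Literal(value: Double) extends Expression { def eval(): Double = value }
  case class Sinh(child: Expression) extends Expression {
    def eval(): Double = math.sinh(child.eval())
  }

  // Build an E through its one-argument (Expression) constructor,
  // using the runtime class carried by the ClassTag.
  def construct[E <: Expression: ClassTag](child: Expression): E = {
    val cls = implicitly[ClassTag[E]].runtimeClass
    cls.getConstructor(classOf[Expression]).newInstance(child).asInstanceOf[E]
  }

  // Apply E's one-argument constructor to n seeded random literals and evaluate each.
  def evalOnRandomLiterals[E <: Expression: ClassTag](n: Int, seed: Long): Seq[Double] = {
    val rng = new Random(seed)
    Seq.fill(n)(construct[E](Literal(rng.nextDouble())).eval())
  }
}
```

A caller would then write `ClassTagSketch.evalOnRandomLiterals[ClassTagSketch.Sinh](5, 1L)`, naming the expression by type rather than passing a constructor function.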

@JoshRosen
Contributor

The basic approach here seems reasonable to me but I left a couple of comments regarding whether we need to use reflection and RE: some documentation / naming issues.

SparkQA commented Aug 15, 2015

Test build #40944 has finished for PR 7855 at commit 0a5bdc9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class FilterNode(condition: Expression, child: LocalNode) extends UnaryLocalNode
    • abstract class LocalNode extends TreeNode[LocalNode]
    • abstract class LeafLocalNode extends LocalNode
    • abstract class UnaryLocalNode extends LocalNode
    • case class ProjectNode(projectList: Seq[NamedExpression], child: LocalNode) extends UnaryLocalNode
    • case class SeqScanNode(output: Seq[Attribute], data: Seq[InternalRow]) extends LeafLocalNode

yjshen commented Aug 15, 2015
Member Author

@JoshRosen, I've changed my implementation; do you mind reviewing this again?

@JoshRosen
Contributor

LGTM pending Jenkins; thanks!

SparkQA commented Aug 16, 2015

Test build #1627 has finished for PR 7855 at commit 0a5bdc9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class FilterNode(condition: Expression, child: LocalNode) extends UnaryLocalNode
    • abstract class LocalNode extends TreeNode[LocalNode]
    • abstract class LeafLocalNode extends LocalNode
    • abstract class UnaryLocalNode extends LocalNode
    • case class ProjectNode(projectList: Seq[NamedExpression], child: LocalNode) extends UnaryLocalNode
    • case class SeqScanNode(output: Seq[Attribute], data: Seq[InternalRow]) extends LeafLocalNode

yjshen commented Aug 17, 2015
Member Author

jenkins, retest this please.

yjshen commented Aug 17, 2015
Member Author

Unrelated failure: org.apache.spark.sql.hive.HiveSparkSubmitSuite.SPARK-8368: includes jars passed in through --jars

yjshen commented Aug 17, 2015
Member Author

jenkins, retest this please.

SparkQA commented Aug 17, 2015

Test build #41004 has finished for PR 7855 at commit 0a5bdc9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks

rxin commented Aug 17, 2015
Contributor

@JoshRosen I will let you merge this one.

@JoshRosen
Contributor

Will merge provided that this still compiles.

@JoshRosen
Contributor

Jenkins, retest this please.

SparkQA commented Aug 17, 2015

Test build #41043 has finished for PR 7855 at commit 0a5bdc9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks

@JoshRosen
Contributor

Alright, merging this to master and branch-1.5. Thanks!

asfgit pushed a commit that referenced this pull request Aug 17, 2015
…in sql expressions

JIRA: https://issues.apache.org/jira/browse/SPARK-9526

This PR is a follow up of #7830, aiming at utilizing randomized tests to reveal more potential bugs in sql expression.

Author: Yijie Shen <henry.yijieshen@gmail.com>

Closes #7855 from yjshen/property_check.

(cherry picked from commit b265e28)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
@asfgit asfgit closed this in b265e28 Aug 17, 2015

4 participants