[SPARK-9526][SQL]Utilize randomized tests to reveal potential bugs in sql expressions#7855
[SPARK-9526][SQL]Utilize randomized tests to reveal potential bugs in sql expressions#7855yjshen wants to merge 7 commits into
Conversation
|
Opening this early to get high level feed back ASAP. Note: The current merge build should fail due to
These bugs are not fixed yet since I just finished prototyping. |
|
For remainder, my hunch is that it's probably failing for extreme floating point values (e.g. take the remainder of a giant float by another giant float). I found a similar failure in #7625, an experimental branch of mine which contains some code for using reflection to write tests against all Expression subclasses. The code in my branch lags a bit behind what I have locally (e.g. it may be missing some of the interpreted vs. codegen comparison code) so I can see about pushing the rest of my changes later. The approach in my branch probably definitely isn't the right one for unit testing; it was more intended to be an experiment to see whether it would be possible to do this all via reflection. |
|
Test build #39363 has finished for PR 7855 at commit
|
|
@JoshRosen , thanks for the information about #7625, it's great! |
|
Test build #39409 has finished for PR 7855 at commit
|
|
Test build #39417 has finished for PR 7855 at commit
|
|
Did this end up finding any new bugs? |
|
All bugs revealed until now:
And I also fixed |
There was a problem hiding this comment.
put [[ ]] around CalendarIntervalType so IntelliJ can find it during refactoring
|
to help reviewing, and separate important fixes from nice to have tests, can you submit a separate pull request that includes all the bug fixes, along with deterministic unit tests that would trigger those cases? Then this pull request can be just about the randomized tests. |
|
Bugfixes were done in #7882, so this should be ready for rebasing. |
|
Ah, forgot the scaladoc on property check, will do now. |
|
This is on my review queue for tomorrow. |
|
Test build #39650 has finished for PR 7855 at commit
|
|
Test build #39676 has finished for PR 7855 at commit
|
|
Jenkins, retest this please. |
|
Test build #199 has finished for PR 7855 at commit
|
|
Unrelated failure again and again. |
|
Jenkins, retest this please. |
|
Test build #39696 has finished for PR 7855 at commit
|
|
Test build #203 has finished for PR 7855 at commit
|
|
Jenkins, retest this please. |
|
Test build #204 has finished for PR 7855 at commit
|
|
Test build #39702 has finished for PR 7855 at commit
|
There was a problem hiding this comment.
What do you think about giving this a more specific name, such as checkConsistencyBetweenInterpretedAndCodegen? It would also be good to add Scaladoc to these methods to explain what they're doing, since the use of reflection might be non-obvious.
There was a problem hiding this comment.
For instance, this method's Scaladoc could explain that it tests the expression's one-argument constructor with randomized literals of the given data type.
There was a problem hiding this comment.
Also, I think that we might be able to clean up the code slightly by adding a type to this method:
def checkConsistency[E <: Expression: ClassTag](dt: DataType)to let callers write something like
checkConsistencyBetweenInterpretedAndCodegen[Sinh](DoubleType)There was a problem hiding this comment.
Actually, do we even need reflection for this? Can we do something like https://github.com/yjshen/spark/blob/property_check/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathFunctionsSuite.scala#L59 instead?
|
The basic approach here seems reasonable to me but I left a couple of comments regarding whether we need to use reflection and RE: some documentation / naming issues. |
|
Test build #40944 has finished for PR 7855 at commit
|
|
@JoshRosen , I've changed my implementation, do you mind review this again? |
|
LGTM pending Jenkins; thanks! |
|
Test build #1627 has finished for PR 7855 at commit
|
|
jenkins, retest this please. |
|
unrelated failure, org.apache.spark.sql.hive.HiveSparkSubmitSuite.SPARK-8368: includes jars passed in through --jars |
|
jenkins, retest this please. |
|
Test build #41004 has finished for PR 7855 at commit
|
|
@JoshRosen I will let you merge this one. |
|
Will merge provided that this still compiles. |
|
Jenkins, retest this please. |
|
Test build #41043 has finished for PR 7855 at commit
|
|
Alright, merging this to master and branch-1.5. Thanks! |
…in sql expressions JIRA: https://issues.apache.org/jira/browse/SPARK-9526 This PR is a follow up of #7830, aiming at utilizing randomized tests to reveal more potential bugs in sql expression. Author: Yijie Shen <henry.yijieshen@gmail.com> Closes #7855 from yjshen/property_check. (cherry picked from commit b265e28) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
JIRA: https://issues.apache.org/jira/browse/SPARK-9526
This PR is a follow up of #7830, aiming at utilizing randomized tests to reveal more potential bugs in sql expression.