Skip to content

[SPARK-32688][SQL][TEST] Add special values to LiteralGenerator for float and double#29515

Closed
tanelk wants to merge 8 commits into
apache:masterfrom
tanelk:SPARK-32688
Closed

[SPARK-32688][SQL][TEST] Add special values to LiteralGenerator for float and double#29515
tanelk wants to merge 8 commits into
apache:masterfrom
tanelk:SPARK-32688

Conversation

@tanelk

@tanelk tanelk commented Aug 22, 2020

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

The LiteralGenerator for float and double datatypes was supposed to yield special values (NaN, +-inf) among others, but the Gen.chooseNum method does not yield values that are outside the defined range. The Gen.chooseNum for a wide range of floats and doubles does not yield values in the "everyday" range as stated in typelevel/scalacheck#113 .

There is an similar class RandomDataGenerator that is used in some other tests. Added -0.0 and -0.0f as special values to there too.

These changes revealed an inconsistency with the equality check between -0.0 and 0.0.

Why are the changes needed?

The LiteralGenerator is mostly used in the checkConsistencyBetweenInterpretedAndCodegen method in MathExpressionsSuite. This change would have caught the bug fixed in #29495 .

Does this PR introduce any user-facing change?

No

How was this patch tested?

Locally reverted #29495 and verified that the existing test cases caught the bug.

@tanelk

tanelk commented Aug 22, 2020

Copy link
Copy Markdown
Contributor Author

cc @cloud-fan , this should be relevant to you

@maropu maropu changed the title [SPARK-32688][TESTS] Add special values to LiteralGenerator for float and double [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double Aug 22, 2020
@maropu

maropu commented Aug 22, 2020

Copy link
Copy Markdown
Member

ok to test

f <- Gen.oneOf(
Gen.oneOf(
Float.NaN, Float.PositiveInfinity, Float.NegativeInfinity, Float.MinPositiveValue,
0.0f, -0.0f, 1.0f, -1.0f),

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are 1.0f and -1.0f also special values?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They aren't in the sense, that Arbitrary.arbFloat.arbitrary can generate them, but they are in the sense, that it is more likely, that a function could act weirdly at these values. For example log1p.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you leave some comments in the code?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

@maropu

maropu commented Aug 22, 2020

Copy link
Copy Markdown
Member

also cc: @srowen

@SparkQA

SparkQA commented Aug 23, 2020

Copy link
Copy Markdown

Test build #127791 has finished for PR 29515 at commit 8c8313c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tanelk

tanelk commented Aug 23, 2020

Copy link
Copy Markdown
Contributor Author

Test build #127791 has finished for PR 29515 at commit 8c8313c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

This failure is discovered by this change, not caused. Should that be fixed by a separate pull request? Not sure which of the two is the correct behavior.
There could be more like this, but due to the random nature of generated values it might not show every build.

@srowen

srowen commented Aug 23, 2020

Copy link
Copy Markdown
Member

I think we can't commit a change that causes tests to fail of course. The fix of the tests would have to go with the fix in underlying code as needed.

@tanelk

tanelk commented Aug 23, 2020

Copy link
Copy Markdown
Contributor Author

retest this please

@tanelk

tanelk commented Aug 23, 2020

Copy link
Copy Markdown
Contributor Author

I think we can't commit a change that causes tests to fail of course. The fix of the tests would have to go with the fix in underlying code as needed.

I meant, that it could be fixed in another PR, before this PR is merged

@SparkQA

SparkQA commented Aug 23, 2020

Copy link
Copy Markdown

Test build #127794 has finished for PR 29515 at commit 77d63ac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu

maropu commented Aug 23, 2020

Copy link
Copy Markdown
Member

I meant, that it could be fixed in another PR, before this PR is merged

Could you file jira and fix it?

@tanelk

tanelk commented Aug 23, 2020

Copy link
Copy Markdown
Contributor Author

There is a related issue with -0.0: https://issues.apache.org/jira/browse/SPARK-32110

// +---------+---------+---------+---------+
protected override def nullSafeEval(left: Any, right: Any): Any = ordering.equiv(left, right)
protected override def nullSafeEval(left: Any, right: Any): Any = {
left == right || ordering.equiv(left, right)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least, we shoud fix the existing test failures that we found in this PR. But, this fix looks improper, so could we use NormalizeNaNAndZero instead? cc: @cloud-fan @viirya

@tanelk tanelk Aug 23, 2020

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NormalizeNaNAndZero can't help here, because checkConsistencyBetweenInterpretedAndCodegen is done without optimizers.

Also it could introduce new correctness issues with atan2(-0.0, x) and 1.0 / -0.0.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed that 0.0 == -0.0 is the expected behavior, but if it is not, then we could leave this as it was change the code gen path.

@maropu maropu Aug 23, 2020

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, how about the case array(-0.0) == array(0.0)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are 100% correct.

It is an interesting problem, where the same comparator is used for both sorting and equality check.
For sorting -0.0 should be smaller than 0.0, but in equality check they should be equal.

Just for reference, it seems that both hive and mysql consider them equal in the equality check:
https://issues.apache.org/jira/browse/HIVE-11174

@tanelk tanelk changed the title [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double [WIP][SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double Aug 23, 2020
@SparkQA

SparkQA commented Aug 23, 2020

Copy link
Copy Markdown

Test build #127808 has finished for PR 29515 at commit 1116d3d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 23, 2020

Copy link
Copy Markdown

Test build #127807 has finished for PR 29515 at commit 818abe2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 24, 2020

Copy link
Copy Markdown

Test build #127816 has finished for PR 29515 at commit 2dce36e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tanelk

tanelk commented Aug 24, 2020

Copy link
Copy Markdown
Contributor Author

There is a org.apache.spark.sql.RandomDataGenerator, that does pretty much the same thing as the LiteralGenerator. Perhaps they should be unified?

@cloud-fan

Copy link
Copy Markdown
Contributor

This is a good catch! Let's fix the found bugs one by one and merge this PR at the end.

@tanelk is this the only bug we found so far? https://issues.apache.org/jira/browse/SPARK-32110

@tanelk

tanelk commented Aug 24, 2020

Copy link
Copy Markdown
Contributor Author

This is a good catch! Let's fix the found bugs one by one and merge this PR at the end.

@tanelk is this the only bug we found so far? https://issues.apache.org/jira/browse/SPARK-32110

Currently yes, the -0.0 inconsistency is the only bug.
I would add to the jira that there is also inconsistency in the code gen and the interpreted code paths: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127791/testReport/org.apache.spark.sql.catalyst.expressions/PredicateSuite/AND__OR__EqualTo__EqualNullSafe_consistency_check/

@tanelk tanelk changed the title [WIP][SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double Aug 24, 2020
@SparkQA

SparkQA commented Aug 24, 2020

Copy link
Copy Markdown

Test build #127840 has finished for PR 29515 at commit 9a3b983.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tanelk

tanelk commented Aug 24, 2020

Copy link
Copy Markdown
Contributor Author

Test build #127840 has finished for PR 29515 at commit 9a3b983.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

org.scalatest.exceptions.TestFailedException: Incorrect evaluation (codegen off): [false,0.0] IN ([true,Infinity],[true,7.328233268583497E27],[false,-0.0],[true,2.28620591956394208E17],[true,-1.7155423374193937E-160],[null,7.853276721068246E-75],[null,Infinity],[true,null],[false,null]), actual: false, expected: true

This is a new finding. But also related to the 0.0 == -0.0.

@SparkQA

SparkQA commented Sep 4, 2020

Copy link
Copy Markdown

Test build #128299 has finished for PR 29515 at commit 529aba8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tanelk

tanelk commented Sep 11, 2020

Copy link
Copy Markdown
Contributor Author

@cloud-fan, now that #29647 is merged, can this be merged also?

@maropu

maropu commented Sep 15, 2020

Copy link
Copy Markdown
Member

@cloud-fan, now that #29647 is merged, can this be merged also?

Are all the bugs that this PR found already fixed now?

@maropu

maropu commented Sep 15, 2020

Copy link
Copy Markdown
Member

retest this please

@tanelk

tanelk commented Sep 15, 2020

Copy link
Copy Markdown
Contributor Author

@cloud-fan, now that #29647 is merged, can this be merged also?

Are all the bugs that this PR found already fixed now?

I believe, that they were the manifestation of the 0.0 != -0.0

@SparkQA

SparkQA commented Sep 15, 2020

Copy link
Copy Markdown

Test build #128688 has finished for PR 29515 at commit 529aba8.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu

maropu commented Sep 15, 2020

Copy link
Copy Markdown
Member

retest this please

@maropu maropu left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maropu maropu changed the title [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double [SPARK-32688][SQL][TEST] Add special values to LiteralGenerator for float and double Sep 15, 2020
@SparkQA

SparkQA commented Sep 15, 2020

Copy link
Copy Markdown

Test build #128707 has finished for PR 29515 at commit 529aba8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan

Copy link
Copy Markdown
Contributor

hmm, does this PR catch https://issues.apache.org/jira/browse/SPARK-32110 ?

@tanelk

tanelk commented Sep 15, 2020

Copy link
Copy Markdown
Contributor Author

hmm, does this PR catch https://issues.apache.org/jira/browse/SPARK-32110 ?

That's more of an umbrella jira, that mainly covers inconsistency between operators.
Here we uncovered an inconsistency within the equality operators codegen and non-codegen paths.

Just to be safe, we could trigger some retests on this.

@maropu

maropu commented Sep 15, 2020

Copy link
Copy Markdown
Member

retest this please

@SparkQA

SparkQA commented Sep 15, 2020

Copy link
Copy Markdown

Test build #128717 has finished for PR 29515 at commit 529aba8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu maropu closed this in 6051755 Sep 16, 2020
maropu pushed a commit that referenced this pull request Sep 16, 2020
…loat and double

### What changes were proposed in this pull request?

The `LiteralGenerator` for float and double datatypes was supposed to yield special values (NaN, +-inf) among others, but the `Gen.chooseNum` method does not yield values that are outside the defined range. The `Gen.chooseNum` for a wide range of floats and doubles does not yield values in the "everyday" range as stated in typelevel/scalacheck#113 .

There is an similar class `RandomDataGenerator` that is used in some other tests. Added `-0.0` and `-0.0f` as special values to there too.

These changes revealed an inconsistency with the equality check between `-0.0` and `0.0`.

### Why are the changes needed?

The `LiteralGenerator` is mostly used in the `checkConsistencyBetweenInterpretedAndCodegen` method in `MathExpressionsSuite`. This change would have caught the bug fixed in #29495 .

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Locally reverted #29495 and verified that the existing test cases caught the bug.

Closes #29515 from tanelk/SPARK-32688.

Authored-by: Tanel Kiis <tanel.kiis@gmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(cherry picked from commit 6051755)
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
@maropu

maropu commented Sep 16, 2020

Copy link
Copy Markdown
Member

Merged to master/3.0. To check if no flaky tests happens, I will keep watching Jenkins jobs. Anyway, thanks, @tanelk !

holdenk pushed a commit to holdenk/spark that referenced this pull request Oct 27, 2020
…loat and double

### What changes were proposed in this pull request?

The `LiteralGenerator` for float and double datatypes was supposed to yield special values (NaN, +-inf) among others, but the `Gen.chooseNum` method does not yield values that are outside the defined range. The `Gen.chooseNum` for a wide range of floats and doubles does not yield values in the "everyday" range as stated in typelevel/scalacheck#113 .

There is an similar class `RandomDataGenerator` that is used in some other tests. Added `-0.0` and `-0.0f` as special values to there too.

These changes revealed an inconsistency with the equality check between `-0.0` and `0.0`.

### Why are the changes needed?

The `LiteralGenerator` is mostly used in the `checkConsistencyBetweenInterpretedAndCodegen` method in `MathExpressionsSuite`. This change would have caught the bug fixed in apache#29495 .

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Locally reverted apache#29495 and verified that the existing test cases caught the bug.

Closes apache#29515 from tanelk/SPARK-32688.

Authored-by: Tanel Kiis <tanel.kiis@gmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(cherry picked from commit 6051755)
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants