bug fix: Fix fuzz testcase for cast string to integer #450
Conversation
kazuyukitanimura
left a comment
There was a problem hiding this comment.
LGTM pending CI
The title should start with fix: or test:, I think
| castTest( | ||
| gen | ||
| .generateStrings(dataSize, numericPattern, 8) | ||
| .toDF("a") | ||
| .withColumn("a", functions.trim($"a")), | ||
| DataTypes.IntegerType) |
There was a problem hiding this comment.
I'd prefer to see us fix the compatibility issue rather than skip testing strings that have leading and/or trailing whitespace. I have been looking into this and was going to make some comments here but it turned out to be a bit more involved that I thought it would, so will create a PR soon with my proposed fix.
There was a problem hiding this comment.
The main issue is that we are trimming the input before processing, so error messages use the trimmed input instead of the original input. Once I resolved that I saw that we had extra processing for leading and trailing whitespace that is never used (ported from Spark) so I ended up removing that for some performance wins.
Which issue does this PR close?
Closes #431 .
Rationale for this change
Removing leading whitespaces: In some inputs, Spark's error messages contain leading whitespaces while Comet's do not, causing the tests to fail. By removing the leading whitespaces from the input, this issue is resolved.
What changes are included in this PR?
removing the leading whitespaces from input df.
How are these changes tested?
Fuzz tests pass.