Describe the bug
Our hash implementation does not produce the same results as Spark for some inputs.
I added this test to CometCastSuite because that's where we have random data generators (we should move them into a common class that more test suites can use).
test("hash") {
  val input = generateStrings(timestampPattern, 8).toDF("a")
  withTempPath { dir =>
    val data = roundtripParquet(input, dir).coalesce(1)
    data.createOrReplaceTempView("t")
    val df = spark.sql(s"select a, hash(a) from t order by a")
    checkSparkAnswerAndOperator(df)
  }
}
Example output:
!== Correct Answer - 1000 ==     == Spark Answer - 1000 ==
 struct<a:string,hash(a):int>     struct<a:string,hash(a):int>
![,142593372]                    [,0]
![ 099,-1611881412]              [ 099,-881749019]
![ 1 474,240523873]              [ 1 474,-1111423867]
![ 12852,-1057581169]            [ 12852,-404859411]
![ 18,-492750382]                [ 18,1333608017]
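
To sanity-check individual rows without running the full suite, here is a minimal pure-Scala sketch of Murmur3 x86_32 with seed 42, the algorithm and default seed that Spark's hash function uses. Murmur3Sketch, hashBytes and hashString are hypothetical names used only for illustration; they are not part of Comet or Spark. The sketch assumes strings are hashed as UTF-8 bytes, that 4-byte words are read little-endian, and that each trailing byte is mixed as its own signed block (which, as far as I can tell, is how Spark handles the tail and differs from standard Murmur3). It also assumes the "Correct Answer" column above holds Spark's own result.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical helper for illustration only; not part of Comet or Spark.
object Murmur3Sketch {
  private def mixK1(k: Int): Int = {
    val k1 = Integer.rotateLeft(k * 0xcc9e2d51, 15)
    k1 * 0x1b873593
  }

  private def mixH1(h: Int, k1: Int): Int = {
    val h1 = Integer.rotateLeft(h ^ k1, 13)
    h1 * 5 + 0xe6546b64
  }

  // Final avalanche step; the input length is folded in first.
  private def fmix(h: Int, length: Int): Int = {
    var h1 = h ^ length
    h1 ^= h1 >>> 16
    h1 *= 0x85ebca6b
    h1 ^= h1 >>> 13
    h1 *= 0xc2b2ae35
    h1 ^ (h1 >>> 16)
  }

  def hashBytes(bytes: Array[Byte], seed: Int): Int = {
    val aligned = bytes.length - bytes.length % 4
    var h1 = seed
    var i = 0
    // Full 4-byte words, assembled little-endian (assumption).
    while (i < aligned) {
      val word = (bytes(i) & 0xff) |
        ((bytes(i + 1) & 0xff) << 8) |
        ((bytes(i + 2) & 0xff) << 16) |
        ((bytes(i + 3) & 0xff) << 24)
      h1 = mixH1(h1, mixK1(word))
      i += 4
    }
    // Trailing bytes: each one is sign-extended and mixed as its own block
    // (assumption about Spark's tail handling).
    while (i < bytes.length) {
      h1 = mixH1(h1, mixK1(bytes(i).toInt))
      i += 1
    }
    fmix(h1, bytes.length)
  }

  // Seed 42 is the default seed of Spark's hash function.
  def hashString(s: String, seed: Int = 42): Int =
    hashBytes(s.getBytes(StandardCharsets.UTF_8), seed)
}
```

For the empty string, Murmur3Sketch.hashString("") evaluates to 142593372, which matches the "Correct Answer" column in the first row. With a seed of 0 the same call returns 0, which happens to match the other column for that row, although that alone does not establish the cause of the mismatch.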
Steps to reproduce
No response
Expected behavior
No response
Additional context
No response