[SPARK-31171][SQL] size(null) should return null under ansi mode#27936
cloud-fan wants to merge 2 commits
```diff
-  def legacySizeOfNull: Boolean = getConf(SQLConf.LEGACY_SIZE_OF_NULL)
+  def legacySizeOfNull: Boolean = {
+    // size(null) should return null under ansi mode.
+    getConf(SQLConf.LEGACY_SIZE_OF_NULL) && !getConf(ANSI_ENABLED)
+  }
```
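To make the precedence in this diff concrete, here is a minimal sketch in plain Scala (not Spark code) that models the two configs as booleans. `SizeOfNullModel`, its parameter names, and the use of `Option` to stand for SQL NULL are hypothetical modeling choices for illustration only.

```scala
// Hypothetical, Spark-free model of the precedence introduced by the diff:
// the legacy flag only has an effect while ANSI mode is off.
object SizeOfNullModel {
  // Mirrors `getConf(LEGACY_SIZE_OF_NULL) && !getConf(ANSI_ENABLED)`.
  def legacySizeOfNull(legacySizeOfNullConf: Boolean, ansiEnabled: Boolean): Boolean =
    legacySizeOfNullConf && !ansiEnabled

  // Models `size` on a possibly-NULL array: `None` stands for SQL NULL,
  // and the result is `None` (NULL) or `Some(number of elements)`.
  def size(
      input: Option[Seq[Any]],
      legacySizeOfNullConf: Boolean,
      ansiEnabled: Boolean): Option[Int] =
    input match {
      case Some(values) => Some(values.length)
      case None if legacySizeOfNull(legacySizeOfNullConf, ansiEnabled) => Some(-1)
      case None => None
    }
}
```

For example, `SizeOfNullModel.size(None, legacySizeOfNullConf = true, ansiEnabled = false)` gives `Some(-1)`, the result inherited from Hive, while passing `ansiEnabled = true` gives `None` no matter how the legacy flag is set. Non-NULL arrays are unaffected by either flag.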
Thanks. So the ANSI conf overrides the legacy conf? Do we have another legacy conf like this? Is it consistent with the other legacy configs? Could you update the migration guide and conf document to be consistent, too?
cc @gatorsmile
This is the first one, and I don't think we need a migration guide as the behavior doesn't change by default.
I'll update the config doc.
Test build #119932 has finished for PR 27936 at commit
dongjoon-hyun left a comment
+1, LGTM. Thank you so much, @cloud-fan .
Merged to master/3.0.
This already passed the Jenkins tests, and the last commit only changes the config doc, which I verified as follows:
```scala
scala> org.apache.spark.sql.internal.SQLConf.LEGACY_SIZE_OF_NULL.doc
res16: String = If it is set to false, or spark.sql.ansi.enabled is true, then size of null returns null. Otherwise, it returns -1, which was inherited from Hive.
```

Make `size(null)` return null under ANSI mode, regardless of the `spark.sql.legacy.sizeOfNull` config.

In #27834, we change the result of `size(null)` to be -1 to match the 2.4 behavior and avoid breaking changes. However, it's true that the "return -1" behavior is error-prone when being used with aggregate functions. The current ANSI mode controls a bunch of "better behaviors" like failing on overflow. We don't enable these "better behaviors" by default because they are too breaking. The "return null" behavior of `size(null)` is a good fit of the ANSI mode.

No as ANSI mode is off by default.

new tests

Closes #27936 from cloud-fan/null.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit dc5ebc2)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Late LGTM. Btw, would it be better to update the function's javadoc as well?
Test build #119941 has finished for PR 27936 at commit
+1 for @HeartSaVioR's comment.
Maybe worth mentioning here as well: https://github.com/apache/spark/blob/master/docs/sql-ref-ansi-compliance.md
Yeah, as @HeartSaVioR suggested above, I also think we need to update the doc.
Will open a follow-up soon. Thanks for taking a look!
### What changes were proposed in this pull request?
A followup of #27936 to update document.
### Why are the changes needed?
correct document
### Does this PR introduce any user-facing change?
no
### How was this patch tested?
N/A

Closes #27950 from cloud-fan/null.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(cherry picked from commit 8643e5d)
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
What changes were proposed in this pull request?
Make `size(null)` return null under ANSI mode, regardless of the `spark.sql.legacy.sizeOfNull` config.
Why are the changes needed?
In #27834, we changed the result of `size(null)` to -1 to match the 2.4 behavior and avoid breaking changes.
However, the "return -1" behavior is error-prone when used with aggregate functions. The current ANSI mode controls several "better behaviors", such as failing on overflow. We don't enable these "better behaviors" by default because they are too disruptive. The "return null" behavior of `size(null)` is a good fit for ANSI mode.
Does this PR introduce any user-facing change?
No, as ANSI mode is off by default.
How was this patch tested?
New tests.
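The PR description says the -1 result is error-prone with aggregate functions but does not show why. The following sketch, in plain Scala rather than Spark and using hypothetical names (`SizeAggregateExample`, `rows`), assumes an array column with one NULL row and compares summing and averaging its sizes under each semantics. It relies on the standard SQL rule that SUM and AVG ignore NULL inputs.

```scala
// Hypothetical illustration (plain Scala, not Spark): why size(null) = -1
// is error-prone once the result feeds an aggregate.
object SizeAggregateExample {
  // Three rows of an array column; None stands for a NULL array.
  val rows: Seq[Option[Seq[Int]]] = Seq(Some(Seq(1, 2, 3)), None, Some(Seq(4)))

  // Legacy semantics: a NULL array has size -1, which then takes part in the aggregate.
  val legacySizes: Seq[Int] = rows.map(_.map(_.length).getOrElse(-1))

  // ANSI semantics: a NULL array has size NULL, which SUM and AVG skip.
  val ansiSizes: Seq[Int] = rows.flatMap(_.map(_.length))

  def sum(xs: Seq[Int]): Int = xs.sum

  // Average over the values that actually reach the aggregate.
  def avg(xs: Seq[Int]): Double = xs.sum.toDouble / xs.length
}
```

Under the -1 semantics the NULL row silently pulls the sum from 4 down to 3 and the average from 2.0 down to 1.0; with NULL, the row is simply left out of both aggregates.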